Figure 11.4. A graph of how overall peptide charge varies with pH. The estimated
overall charge for an example protein sequence is shown for various pH values, which
correspond to hydrogen ion concentrations. At different pH values the various acidic and
basic chemical groups on some of the amino acids release or accept hydrogen ions, thus
changing the average (considering many molecules over time) electric charge. The pI or
isoelectric point is the pH value where the individual charges balance to give zero overall
charge.
To calculate the pI we must find the pH where we think the positive and negative
charges in the protein balance, hence must first have a method for estimating the charge of
a protein chain at a given pH. We then use this method to test different values of pH, until
we home in on the value where the overall charge is zero. It is possible to use an
exhaustive method whereby we systematically test lots of pH values which differ by a
very small amount until we find one that gives the charge closest to zero. However, here
we will use a more intelligent method that will find a good answer in only a few steps.
Although the example problem is not so challenging and we could have used the
exhaustive method, for many other, larger problems using an intelligent optimisation is
essential to get a reasonable answer in a reasonable time.
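For comparison, the exhaustive method mentioned above might be sketched as follows. This is only an illustration, not the approach developed in this section: the names chargeAtPH and exhaustiveIsoelectric, the 0.01 step size and the 0 to 14 pH range are assumptions made here, and the charge calculation is a compact restatement of the estimate built later in this section.

```python
def chargeAtPH(groups, pH):
    """Compact restatement of the section's charge estimate (hypothetical helper)."""
    pKaDict = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0,
               'H': 6.5, 'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}
    acidic = set('-EDYC')
    total = 0.0
    for group in groups:
        pKa = pKaDict.get(group)
        if pKa is None:
            continue  # uncharged residue, contributes nothing
        r = 10.0 ** (pH - pKa)
        dissociated = r / (r + 1.0)
        total += -dissociated if group in acidic else 1.0 - dissociated
    return total


def exhaustiveIsoelectric(sequence, step=0.01):
    """Test every pH from 0 to 14 in fixed steps and keep the one
    whose estimated charge is closest to zero."""
    groups = '+' + sequence + '-'  # add the terminal charge groups
    nSteps = int(round(14.0 / step))
    bestPH = 0.0
    bestCharge = chargeAtPH(groups, bestPH)
    for i in range(1, nSteps + 1):
        pH = i * step
        charge = chargeAtPH(groups, pH)
        if abs(charge) < abs(bestCharge):
            bestPH, bestCharge = pH, charge
    return bestPH


examplePI = exhaustiveIsoelectric('ACDEFGHIKLMNPQRSTVWY')
print(round(examplePI, 2))
```

With a step of 0.01 this evaluates the charge 1400 times, and the answer is only as fine as the grid; halving the step doubles the work.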
The optimisation algorithm we will use employs a divide-and-conquer strategy. We test
various pH values by stepping between test points; for a given pH value, whether
the resulting charge is above or below zero (positive or negative) tells us in which
direction we must search next for a better answer. Also, if a step fails to give a better guess
for the pI (i.e. a pH that predicts a charge closer to zero) then we reduce the step size (how
far to go for the next guess) by half so that we get increasingly close to the optimum value
and don’t overshoot far. Note that we are only able to use this strategy because we know
how the problem works; it is well behaved because we know that there is only one
solution and we know how far ahead to look for a better answer. Not all problems will be
so simple, and for the more difficult situations we can employ the methods discussed in
Chapter 25.
The function estimateCharge is designed to estimate the charge of a given sequence at a
given pH. The basis of the calculation involves estimating the proportion of dissociated
acidic and H+-bound basic amino acids from reference pKa values. This procedure does
not take account of the effect that the sequence and folding of a protein has upon the
dissociation constants of its component residues, i.e. the pKa values in any real situation
vary according to how charged residues interact with each other. Also, this calculation
assumes we are in water at a standard temperature.
The function takes two input variables which are helpfully named and returns a single
charge value. We define a dictionary of pKa values for basic and acidic amino acids, keyed
by their code letters. Note that we also have values for ‘+’ and ‘-’, which are symbols that
will be used to represent the charge-carrying groups that arise from the free N- and C-
termini. We define another dictionary, isAcid, so that we can look up whether each charge
group acts as an acid or not.
For each amino acid letter in the sequence we find the pKa value of the amino acid from
the pKa dictionary. If an amino acid is neither acidic nor basic (uncharged), and thus not
present in this dictionary, the .get() function will helpfully give a value of None. If we do
get a pKa value we do the mathematics with the pKa and the input pH to calculate how
much of the group will be dissociated (free from H+).
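The mathematics referred to here can be written out explicitly. The following is the standard Henderson-Hasselbalch rearrangement, matching the variables r and dissociated in the code for estimateCharge; the symbols [dissociated], [associated] and f are notation introduced here for illustration.

```latex
\[
r = \frac{[\text{dissociated}]}{[\text{associated}]} = 10^{\,\mathrm{pH} - \mathrm{p}K_a},
\qquad
f = \frac{[\text{dissociated}]}{[\text{dissociated}] + [\text{associated}]} = \frac{r}{r + 1}
\]
```

When the pH equals the pKa we have r = 1 and f = 1/2, so half of the group is dissociated; each pH unit above or below the pKa changes r by a factor of ten.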
If the residue is acidic we add a negative charge for the dissociated proportion of the
residue. Otherwise, if the amino acid is basic, we have positive charge for the proportion
of the amino acid that remains associated with hydrogen ions. The associated proportion
is what remains after we subtract the dissociated proportion, hence 1-proportion, and we
do not bother to multiply by +1 for a positive charge. The estimated charge of the
individual amino acid is added to the running total. And at the end we return the total
charge from the function to be used elsewhere.
def estimateCharge(sequence, pH):
    """Using pKa values estimate the charge of a sequence of
    amino acids at a given pH"""

    # Reference pKa values; '+' and '-' stand for the free N- and C-termini
    pKaDict = {'+': 8.0,'-': 3.1,'K':10.0,'R':12.0,
               'H': 6.5,'E': 4.4,'D': 4.4,'Y':10.0,'C': 8.5}

    isAcid = {'+':False,'-':True,'K':False,'R':False,
              'H':False,'E':True,'D':True,'Y':True,'C':True}

    total = 0.0

    for aminoAcid in sequence:
        pKa = pKaDict.get(aminoAcid)  # None for uncharged residues

        if pKa is not None:
            r = 10.0 ** (pH-pKa)
            dissociated = r/(r+1.0)  # proportion free from H+

            if isAcid[aminoAcid]:
                charge = -1.0 * dissociated
            else:
                charge = 1.0 - dissociated

            total += charge

    return total
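As a quick sanity check on the arithmetic, the per-group part of the calculation can be isolated and tested at a pH equal to the pKa, where exactly half of the group should be dissociated. The helper name groupCharge is hypothetical and introduced only for this sketch; it repeats the same two lines of mathematics used inside estimateCharge.

```python
def groupCharge(pKa, pH, acidic):
    """Average charge of one ionisable group (hypothetical helper that
    repeats the per-residue arithmetic of estimateCharge)."""
    r = 10.0 ** (pH - pKa)
    dissociated = r / (r + 1.0)
    if acidic:
        return -1.0 * dissociated  # dissociated acid carries the negative charge
    return 1.0 - dissociated       # H+-bound base carries the positive charge


# At pH == pKa half of the group is dissociated
print(groupCharge(4.4, 4.4, acidic=True))     # -0.5
print(groupCharge(10.0, 10.0, acidic=False))  # 0.5

# Two pH units below its pKa an acidic group is almost entirely uncharged
print(round(groupCharge(4.4, 2.4, acidic=True), 4))  # -0.0099
```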
The estimateIsoelectric function uses the estimateCharge function defined above to
estimate the pH at which a protein sequence will be neutrally charged. To the input
sequence of letters we add the + and - symbols to represent the charge groups at the N and
C termini (strictly speaking these don’t have to be at the ends because order is
unimportant). We define an initial pI guess bestValue of zero before starting our search for
the point of neutrality, as we know that the pI is not going to be less than this. Also, the
charge at this starting pH is estimated from this initial value and an increment size of 7.0 is
defined (somewhat arbitrarily) to determine the next pH value along the scale that will be
tested.
Now we set up a while loop to search for the pH at neutrality, but we do not aim to
calculate the pH at which the charge is exactly zero, we just want to get close; the result,
just like the pKa values, will only be an estimate, so a very precise value is not necessary.
Thus, rather than performing the loop until the charge is exactly zero we only continue
until it is less than an acceptable small value (0.001 in this case). We test to see if the
absolute value of charge for the best pH found so far is greater than the threshold.
Otherwise the loop will stop and the last value of the best pH, close to the pI, will be
recorded.
If the test charge is smaller in absolute value than the smallest found so far we record
the tested value as the best pH and we record its charge as the smallest found thus far.
Otherwise, if the test pH gives no improvement to the smallest charge, we reduce the step
size variable increment to half its value, to narrow in on a better value. Also, at this point,
if the best charge found so far is less than zero we know that we should step in the reverse
direction (multiply by minus one) to get closer to zero. Finally, when the while loop exits
the last pH recorded will be one corresponding to neutrality, so we return this value.
def estimateIsoelectric(sequence):
    """Estimate the charge neutral pH of a protein sequence.
    This is just a guess as pKa values will vary according to
    protein sequence, conformation and conditions.
    """

    sequence = '+' + sequence + '-' # assumes seq is a string
    bestValue = 0.0
    minCharge = estimateCharge(sequence, bestValue)
    increment = 7.0

    while abs(minCharge) > 0.001:
        pHtest = bestValue + increment
        charge = estimateCharge(sequence, pHtest)

        if abs(charge) < abs(minCharge):
            # Improvement: accept the tested pH as the new best guess
            minCharge = charge
            bestValue = pHtest

        else:
            # No improvement: halve the step and point it towards zero charge
            increment = abs(increment)/2.0
            if minCharge < 0.0:
                increment *= -1

    return bestValue
To run this we simply call the function with a protein one-letter sequence. Also, to see
how quickly the test pH value homes in on the pI, you may like to insert print(pHtest)
inside the while loop.
pI = estimateIsoelectric(proteinSeq)
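To look at the convergence without editing the function, one option is a variant that collects every tested pH in a list. The sketch below is self-contained so that it can be run on its own: chargeAtPH and traceIsoelectric are hypothetical names introduced here, the charge calculation is a compact restatement of estimateCharge, and the 20-letter test sequence is simply an arbitrary example.

```python
def chargeAtPH(groups, pH):
    """Compact restatement of estimateCharge (hypothetical helper)."""
    pKaDict = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0,
               'H': 6.5, 'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}
    acidic = set('-EDYC')
    total = 0.0
    for group in groups:
        pKa = pKaDict.get(group)
        if pKa is not None:
            r = 10.0 ** (pH - pKa)
            dissociated = r / (r + 1.0)
            total += -dissociated if group in acidic else 1.0 - dissociated
    return total


def traceIsoelectric(sequence, threshold=0.001):
    """Same search as estimateIsoelectric, but also returns the
    list of pH values tested along the way."""
    groups = '+' + sequence + '-'
    bestValue = 0.0
    minCharge = chargeAtPH(groups, bestValue)
    increment = 7.0
    tested = []

    while abs(minCharge) > threshold:
        pHtest = bestValue + increment
        tested.append(pHtest)
        charge = chargeAtPH(groups, pHtest)

        if abs(charge) < abs(minCharge):
            minCharge = charge
            bestValue = pHtest
        else:
            increment = abs(increment) / 2.0
            if minCharge < 0.0:
                increment *= -1

    return bestValue, tested


pI, tested = traceIsoelectric('ACDEFGHIKLMNPQRSTVWY')
print(tested[:4])  # [7.0, 14.0, 10.5, 8.75]
print(len(tested), round(pI, 2))
```

The first few test points show the step being halved each time a guess fails to improve on the best pH so far.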