Python Programming for Biology: Bioinformatics and Beyond

Figure 11.4. A graph of how overall peptide charge varies with pH

Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 11.4. A graph of how overall peptide charge varies with pH. The estimated overall charge for an example protein sequence is shown for various pH values, which correspond to hydrogen ion concentrations. At different pH values the various acidic and basic chemical groups on some of the amino acids release or accept hydrogen ions, thus changing the average (considering many molecules over time) electric charge. The pI or isoelectric point is the pH value where the individual charges balance to give zero overall charge.

To calculate the pI we must find the pH where we think the positive and negative charges in the protein balance, hence must first have a method for estimating the charge of a protein chain at a given pH. We then use this method to test different values of pH, until we home in on the value where the overall charge is zero. It is possible to use an exhaustive method whereby we systematically test lots of pH values which differ by a very small amount until we find one that gives the charge closest to zero. However, here we will use a more intelligent method that will find a good answer in only a few steps. Although the example problem is not so challenging and we could have used the exhaustive method, for many other, larger problems using an intelligent optimisation is essential to get a reasonable answer in a reasonable time.
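
For comparison, the exhaustive method can be sketched as follows. This is only an illustration under stated assumptions: toy_charge is a hypothetical stand-in for a real charge estimate, describing a molecule with a single basic group (pKa 8.0) and a single acidic group (pKa 3.1), and exhaustive_search is a name invented here. For this toy molecule the charges balance at pH 5.55, halfway between the two pKa values.

```python
def toy_charge(pH):
    """Net charge of a hypothetical molecule with one basic group (pKa 8.0)
    and one acidic group (pKa 3.1). Illustrative only."""
    basic = 1.0 / (1.0 + 10.0 ** (pH - 8.0))    # H-bound fraction, charge +1
    acidic = 1.0 / (1.0 + 10.0 ** (3.1 - pH))   # dissociated fraction, charge -1
    return basic - acidic


def exhaustive_search(charge_func, start=0.0, stop=14.0, step=0.01):
    """Test every pH on a fixed grid and keep the one with charge nearest zero."""
    n_steps = int(round((stop - start) / step))
    best_pH = start
    best_charge = charge_func(start)
    for i in range(1, n_steps + 1):
        pH = start + i * step
        charge = charge_func(pH)
        if abs(charge) < abs(best_charge):
            best_pH, best_charge = pH, charge
    return best_pH, n_steps + 1   # best grid point and number of evaluations


best_pH, n_evaluations = exhaustive_search(toy_charge)
print(round(best_pH, 2), n_evaluations)
```

With a grid spacing of 0.01 the scan needs over a thousand charge evaluations to cover pH 0 to 14, and the answer can never be more precise than the grid spacing.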

The optimisation algorithm we will use employs a divide-and-conquer strategy. We test various pH values by stepping between test points; for a given pH value, whether the resulting charge is above or below zero (positive or negative) tells us in which direction we must search next for a better answer. Also, if a test point fails to improve on the best guess for the pI so far (i.e. it does not predict a charge closer to zero) then we reduce the step size (how far to go for the next guess) by half, so that we get increasingly close to the optimum value and don’t overshoot far. Note that we are only able to use this strategy because we know how the problem works; it is well behaved because we know that there is only one solution and we know how far ahead to look for a better answer. Not all problems will be so simple, and for the more difficult situations we can employ the methods discussed in Chapter 25.
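
The stepping strategy can be sketched in isolation, before any chemistry is involved. The version below is a minimal illustration, assuming a function that decreases steadily and crosses zero exactly once; step_halving_search and toy_curve are hypothetical names chosen for this sketch.

```python
def step_halving_search(func, start=0.0, step=7.0, tolerance=0.001):
    """Find x where a steadily decreasing func(x) is close to zero.

    Assumes func is positive at `start` and crosses zero exactly once.
    """
    best_x = start
    best_y = func(start)
    n_evaluations = 1
    while abs(best_y) > tolerance:
        test_x = best_x + step
        test_y = func(test_x)
        n_evaluations += 1
        if abs(test_y) < abs(best_y):
            best_x, best_y = test_x, test_y   # improvement: move to the new point
        else:
            step = abs(step) / 2.0            # no improvement: halve the step
            if best_y < 0.0:
                step = -step                  # best point is past the crossing: go back
    return best_x, n_evaluations


def toy_curve(x):
    """Hypothetical well-behaved curve with a single zero at x = 5.55."""
    return 5.55 - x


x_zero, n_evaluations = step_halving_search(toy_curve)
print(x_zero, n_evaluations)
```

The sign of the best value found so far decides the direction of the next step, and the step only shrinks when a test point fails to improve on that best value, so the search closes in on the crossing without needing a fixed grid.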

The function estimateCharge is designed to estimate the charge of a given sequence at a given pH. The basis of the calculation involves estimating the proportion of dissociated acidic and H+-bound basic amino acids from reference pKa values. This procedure does not take account of the effect that the sequence and folding of a protein has upon the dissociation constants of its component residues, i.e. the pKa values in any real situation vary according to how charged residues interact with each other. Also, this calculation assumes we are in water at a standard temperature.

The function takes two input variables which are helpfully named and returns a single charge value. We define a dictionary of pKa values for basic and acidic amino acids, keyed by their code letters. Note that we also have values for ‘+’ and ‘-’, which are symbols that will be used to represent the charge-carrying groups that arise from the free N- and C-termini. We define another dictionary, isAcid, so that we can look up whether each charge group acts as an acid or not.

For each amino acid letter in the sequence we find the pKa value of the amino acid from the pKa dictionary. If an amino acid is neither acidic nor basic (uncharged), and thus not present in this dictionary, the .get() method will helpfully give a value of None. If we do get a pKa value we do the mathematics with the pKa and the input pH to calculate how much of the group will be dissociated (free from H+).

If the residue is acidic we add a negative charge for the dissociated proportion of the residue. Otherwise, if the amino acid is basic, we have positive charge for the proportion of the amino acid that remains associated with hydrogen ions. The associated proportion is what remains after we subtract the dissociated proportion, hence 1-proportion, and we do not bother to multiply by +1 for a positive charge. The estimated charge of the individual amino acid is added to the running total. At the end we return the total charge from the function to be used elsewhere.
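
The arithmetic for a single charge group can be checked on its own. The sketch below assumes the standard Henderson-Hasselbalch relationship, in which the ratio of dissociated to H+-bound forms is 10 raised to the power (pH - pKa); the helper names dissociated_fraction and group_charge are hypothetical and used only for illustration.

```python
def dissociated_fraction(pH, pKa):
    """Fraction of a titratable group that has released its hydrogen ion."""
    ratio = 10.0 ** (pH - pKa)   # dissociated : H-bound ratio
    return ratio / (ratio + 1.0)


def group_charge(pH, pKa, is_acid):
    """Average charge of one group: acids are negative when dissociated,
    bases are positive while still H-bound."""
    fraction = dissociated_fraction(pH, pKa)
    return -fraction if is_acid else 1.0 - fraction


# Acidic side chain with pKa 4.4 (the value used for D and E in this section)
print(group_charge(4.4, 4.4, True))     # pH equal to pKa: half dissociated
print(group_charge(7.0, 4.4, True))     # well above pKa: almost fully negative

# Basic side chain with pKa 10.0 (the value used for K)
print(group_charge(7.0, 10.0, False))   # well below pKa: almost fully positive
```

When the pH equals the pKa exactly half of the groups are dissociated, which is why an acidic group contributes a charge of -0.5 at that point.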

def estimateCharge(sequence, pH):
    """Using pKa values estimate the charge of a sequence of
    amino acids at a given pH"""

    # Reference pKa values; '+' and '-' stand for the free N- and C-termini
    pKaDict = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0,
               'H': 6.5, 'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}

    isAcid = {'+': False, '-': True, 'K': False, 'R': False,
              'H': False, 'E': True, 'D': True, 'Y': True, 'C': True}

    total = 0.0

    for aminoAcid in sequence:
        pKa = pKaDict.get(aminoAcid)  # None for uncharged residues

        if pKa is not None:
            r = 10.0 ** (pH - pKa)
            dissociated = r / (r + 1.0)  # proportion free from H+

            if isAcid[aminoAcid]:
                charge = -1.0 * dissociated
            else:
                charge = 1.0 - dissociated

            total += charge

    return total

The estimateIsoelectric function uses the estimateCharge function defined above to estimate the pH at which a protein sequence will be neutrally charged. To the input sequence of letters we add the + and - symbols to represent the charge groups at the N and C termini (strictly speaking these don’t have to be at the ends because order is unimportant). We define an initial pI guess bestValue of zero before starting our search for the point of neutrality, as we know that the pI is not going to be less than this. Also, the charge at this starting pH is estimated from this initial value and an increment size of 7.0 is defined (somewhat arbitrarily) to determine the next pH value along the scale that will be tested.

Now we set up a while loop to search for the pH at neutrality. We do not aim to calculate the pH at which the charge is exactly zero; we just want to get close. The result, just like the pKa values, will only be an estimate, so a very precise value is not necessary. Thus, rather than performing the loop until the charge is exactly zero we only continue until it is less than an acceptable small value (0.001 in this case). The loop continues while the absolute value of the charge for the best pH found so far is greater than the threshold. Once it is not, the loop stops and the last value of the best pH, close to the pI, is kept.

If the absolute value of the test charge is smaller than the smallest found so far we record the tested pH as the best pH and record its charge as the smallest found thus far. Otherwise, if the test pH gives no improvement to the smallest charge, we reduce the step size variable increment to half its value, to narrow in on a better value. Also, if the smallest charge recorded so far is less than zero, the best pH lies above the pI, so we make the step negative (multiply by minus one) to move back towards zero charge. Finally, when the while loop exits the last pH recorded will be one close to neutrality, so we return this value.

def estimateIsoelectric(sequence):
    """Estimate the charge neutral pH of a protein sequence.
    This is just a guess as pKa values will vary according to
    protein sequence, conformation and conditions.
    """

    sequence = '+' + sequence + '-'  # assumes seq is a string

    bestValue = 0.0
    minCharge = estimateCharge(sequence, bestValue)
    increment = 7.0

    while abs(minCharge) > 0.001:
        pHtest = bestValue + increment
        charge = estimateCharge(sequence, pHtest)

        if abs(charge) < abs(minCharge):
            # Closer to neutral: accept the tested pH as the new best
            minCharge = charge
            bestValue = pHtest

        else:
            # No improvement: halve the step and point it towards zero charge
            increment = abs(increment) / 2.0
            if minCharge < 0.0:
                increment *= -1

    return bestValue

To run this we simply call the function with a one-letter protein sequence. Also, to see how quickly the test pH value homes in on the pI, you may like to insert print(pHtest) inside the while loop.

pI = estimateIsoelectric(proteinSeq)
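
One way to gain confidence in the result is to cross-check it with a different search technique. The sketch below uses plain interval bisection, which is not the method described in this section, together with a compact restatement of the charge estimate so that it runs on its own. The names net_charge, bisect_isoelectric and exampleSeq are hypothetical, and the sketch assumes the charge is positive at pH 0 and negative at pH 14.

```python
# Same reference pKa values as used in estimateCharge
PKA = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0, 'H': 6.5,
       'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}
ACIDS = set('-EDYC')


def net_charge(sequence, pH):
    """Compact restatement of the charge estimate for a sequence at a given pH."""
    total = 0.0
    for symbol in sequence:
        pKa = PKA.get(symbol)
        if pKa is None:
            continue  # uncharged residue
        dissociated = 1.0 / (1.0 + 10.0 ** (pKa - pH))
        total += -dissociated if symbol in ACIDS else 1.0 - dissociated
    return total


def bisect_isoelectric(sequence, low=0.0, high=14.0, tolerance=0.001):
    """Bisection: halve a bracketing pH interval until the charge is near zero.

    Assumes the charge is positive at `low` and negative at `high`.
    """
    sequence = '+' + sequence + '-'
    middle = (low + high) / 2.0
    for _ in range(60):  # more halvings than double precision can use
        middle = (low + high) / 2.0
        charge = net_charge(sequence, middle)
        if abs(charge) <= tolerance:
            break
        if charge > 0.0:
            low = middle   # still positive: the pI lies at higher pH
        else:
            high = middle  # negative: the pI lies at lower pH
    return middle


exampleSeq = 'ACDEFGHIKLMNPQRSTVWY'  # hypothetical test peptide
pI = bisect_isoelectric(exampleSeq)
print(pI, net_charge('+' + exampleSeq + '-', pI))
```

If both approaches are run on the same sequence, the two pH estimates should each give a charge within the 0.001 threshold of zero, so they should agree closely.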
