Python Programming for Biology: Bioinformatics and Beyond


Figure 11.4.  A graph of how overall peptide charge varies with pH



Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 11.4. A graph of how overall peptide charge varies with pH. The estimated overall charge for an example protein sequence is shown for various pH values, which correspond to hydrogen ion concentrations. At different pH values the various acidic and basic chemical groups on some of the amino acids release or accept hydrogen ions, thus changing the average (considering many molecules over time) electric charge. The pI or isoelectric point is the pH value where the individual charges balance to give zero overall charge.

To calculate the pI we must find the pH where we think the positive and negative charges in the protein balance, hence we must first have a method for estimating the charge of a protein chain at a given pH. We then use this method to test different values of pH, until we home in on the value where the overall charge is zero. It is possible to use an exhaustive method whereby we systematically test lots of pH values which differ by a very small amount until we find one that gives the charge closest to zero. However, here we will use a more intelligent method that will find a good answer in only a few steps. Although the example problem is not so challenging and we could have used the exhaustive method, for many other, larger problems using an intelligent optimisation is essential to get a reasonable answer in a reasonable time.
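As a point of comparison, the exhaustive approach can be sketched as below. This is only an illustration: toyCharge, exhaustiveZero and the round-number pKa values of 10 and 4 are hypothetical names and values chosen so that the zero-charge point falls at pH 7; they are not taken from the book.

```python
def toyCharge(pH):
    """Net charge of one basic group (pKa 10) plus one acidic group (pKa 4).
    Both pKa values are made-up round numbers for illustration."""
    base = 1.0 / (1.0 + 10.0 ** (pH - 10.0))  # fraction of base still holding H+
    acid = 1.0 / (1.0 + 10.0 ** (4.0 - pH))   # fraction of acid that has lost H+
    return base - acid


def exhaustiveZero(chargeFunc, low=0.0, high=14.0, step=0.01):
    """Test every pH on a fixed grid and keep the one with charge nearest zero."""
    nSteps = int(round((high - low) / step))
    bestPh, bestCharge = low, chargeFunc(low)

    for i in range(1, nSteps + 1):
        pH = low + i * step
        charge = chargeFunc(pH)
        if abs(charge) < abs(bestCharge):
            bestPh, bestCharge = pH, charge

    # Also report how many pH values had to be tested
    return bestPh, nSteps + 1


bestPh, nTested = exhaustiveZero(toyCharge)
print(round(bestPh, 2), nTested)
```

With a grid spacing of 0.01 the scan evaluates the charge 1401 times, which is harmless here but shows why a method needing only a few tests matters for larger problems.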

The optimisation algorithm we will use employs a divide-and-conquer strategy. We test various pH values by stepping between test points, and for a given pH value whether the resulting charge is above or below zero (positive or negative) tells us in which direction we must search next for a better answer. If a step gives a better guess for the pI (i.e. a pH that predicts a charge closer to zero) we accept it; if it does not, we reduce the step size (how far to go for the next guess) by half, so that we get increasingly close to the optimum value and don’t overshoot far. Note that we are only able to use this strategy because we know how the problem works; it is well behaved because we know that there is only one solution and we know how far ahead to look for a better answer. Not all problems will be so simple, and for the more difficult situations we can employ the methods discussed in Chapter 25.
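The strategy can be tried on a toy problem before applying it to proteins. The sketch below is one possible arrangement of the idea, not the authors' listing: findZero, falling and the tested list are hypothetical names, the step size is kept positive, and the direction is taken from the sign of the best value so far. It assumes the function decreases steadily and crosses zero once, as the charge does with rising pH.

```python
def findZero(func, start=0.0, size=7.0, tolerance=0.001):
    """Step-halving search for the zero of a steadily decreasing function."""
    best, bestValue = start, func(start)
    tested = [start]  # every point evaluated, kept for inspection

    while abs(bestValue) > tolerance:
        # Positive value: the zero lies ahead. Negative value: it lies behind.
        direction = 1.0 if bestValue > 0.0 else -1.0
        test = best + direction * size
        value = func(test)
        tested.append(test)

        if abs(value) < abs(bestValue):
            best, bestValue = test, value  # closer to zero: accept the step
        else:
            size /= 2.0                    # no improvement: halve the step

    return best, tested


def falling(x):
    """Toy decreasing function with its zero at x = 5."""
    return 5.0 - x


best, tested = findZero(falling)
print(best, len(tested))
```

Printing the tested list shows the guesses bouncing either side of 5 with ever smaller steps, which is the homing-in behaviour described above.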

The function estimateCharge is designed to estimate the charge of a given sequence at a given pH. The basis of the calculation involves estimating the proportion of dissociated acidic and H+-bound basic amino acids from reference pKa values. This procedure does not take account of the effect that the sequence and folding of a protein has upon the dissociation constants of its component residues, i.e. the pKa values in any real situation vary according to how charged residues interact with each other. Also, this calculation assumes we are in water at a standard temperature.
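The proportion of a group that is dissociated follows from the Henderson-Hasselbalch relationship: the ratio of dissociated to undissociated forms is 10 raised to the power (pH - pKa). A minimal sketch of that single step, using the hypothetical helper name dissociatedFraction:

```python
def dissociatedFraction(pH, pKa):
    """Fraction of a group that has released its hydrogen ion at this pH."""
    ratio = 10.0 ** (pH - pKa)    # dissociated : undissociated
    return ratio / (ratio + 1.0)  # convert the ratio to a proportion


atPKa = dissociatedFraction(4.4, 4.4)   # pH equal to pKa: half dissociated
above = dissociatedFraction(5.4, 4.4)   # one pH unit above: ratio 10 to 1
below = dissociatedFraction(3.4, 4.4)   # one pH unit below: ratio 1 to 10
print(atPKa, round(above, 3), round(below, 3))
```

When the pH equals the pKa exactly half of the group is dissociated, and each pH unit away from the pKa changes the ratio tenfold.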

The function takes two input variables, which are helpfully named, and returns a single charge value. We define a dictionary of pKa values for basic and acidic amino acids, keyed by their code letters. Note that we also have values for ‘+’ and ‘-’, which are symbols that will be used to represent the charge-carrying groups that arise from the free N- and C-termini. We define another dictionary, isAcid, so that we can look up whether each charge group acts as an acid or not.

For each amino acid letter in the sequence we look up the pKa value of the amino acid in the pKa dictionary. If an amino acid is neither acidic nor basic (uncharged), and thus not present in this dictionary, the dictionary's .get() method will helpfully give a value of None. If we do get a pKa value we do the mathematics with the pKa and the input pH to calculate how much of the group will be dissociated (free from H+).
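The lookup behaviour can be seen in isolation with a cut-down dictionary. In this small sketch the two entries are copied from the reference values used in this section, and the names charged and skipped are hypothetical.

```python
pKaDict = {'K': 10.0, 'D': 4.4}  # two entries only, to keep the example short

charged = []
skipped = []

for letter in 'KAD':
    pKa = pKaDict.get(letter)  # returns None rather than raising KeyError

    if pKa is not None:
        charged.append((letter, pKa))
    else:
        skipped.append(letter)

print(charged)
print(skipped)
```

Testing with "is not None" rather than plain truthiness is the safer habit here, because a legitimate pKa of 0.0 would otherwise be treated as missing.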



If the residue is acidic we add a negative charge for the dissociated proportion of the residue. Otherwise, if the amino acid is basic, we have a positive charge for the proportion of the amino acid that remains associated with hydrogen ions. The associated proportion is what remains after we subtract the dissociated proportion, hence 1-proportion, and we do not bother to multiply by +1 for a positive charge. The estimated charge of the individual amino acid is added to the running total, and at the end we return the total charge from the function to be used elsewhere.
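The sign convention just described can be checked on single groups. The helper groupCharge below is a hypothetical name used only for this illustration; the pKa values of 4.4 and 10.0 are the ones this section uses for aspartate (D) and lysine (K).

```python
def groupCharge(pH, pKa, acidic):
    """Average charge carried by one ionisable group at the given pH."""
    ratio = 10.0 ** (pH - pKa)
    dissociated = ratio / (ratio + 1.0)

    if acidic:
        return -dissociated    # an acid is charged once it has lost H+
    return 1.0 - dissociated   # a base is charged while it still holds H+


aspAtPKa = groupCharge(4.4, 4.4, True)    # half dissociated: -0.5
lysAtPKa = groupCharge(10.0, 10.0, False) # half associated: +0.5

# Midway between the two pKa values the two charges cancel
pair = groupCharge(7.2, 4.4, True) + groupCharge(7.2, 10.0, False)
print(aspAtPKa, lysAtPKa, abs(pair) < 1e-9)
```

For this one-acid, one-base pair the charges cancel at pH 7.2, the midpoint of the two pKa values, which is a small-scale version of the isoelectric point.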

def estimateCharge(sequence, pH):
    """Using pKa values estimate the charge of a sequence of
    amino acids at a given pH"""

    # Reference pKa values; '+' and '-' stand for the free N- and C-termini
    pKaDict = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0,
               'H': 6.5, 'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}

    # Whether each charge group acts as an acid
    isAcid = {'+': False, '-': True, 'K': False, 'R': False,
              'H': False, 'E': True, 'D': True, 'Y': True, 'C': True}

    total = 0.0

    for aminoAcid in sequence:
        pKa = pKaDict.get(aminoAcid)  # None for uncharged residues

        if pKa is not None:
            r = 10.0 ** (pH - pKa)
            dissociated = r / (r + 1.0)  # proportion free from H+

            if isAcid[aminoAcid]:
                charge = -1.0 * dissociated
            else:
                charge = 1.0 - dissociated

            total += charge

    return total

The estimateIsoelectric function uses the estimateCharge function defined above to estimate the pH at which a protein sequence will be neutrally charged. To the input sequence of letters we add the + and - symbols to represent the charge groups at the N and C termini (strictly speaking these don’t have to be at the ends because order is unimportant). We define an initial pI guess bestValue of zero before starting our search for the point of neutrality, as we know that the pI is not going to be less than this. Also, the charge at this starting pH is estimated from this initial value and an increment size of 7.0 is defined (somewhat arbitrarily) to determine the next pH value along the scale that will be tested.


Now we set up a while loop to search for the pH at neutrality, but we do not aim to calculate the pH at which the charge is exactly zero, we just want to get close; the result, just like the pKa values, will only be an estimate, so a very precise value is not necessary. Thus, rather than performing the loop until the charge is exactly zero we only continue until its size is less than an acceptably small value (0.001 in this case). At the start of each cycle we test to see if the absolute value of the charge for the best pH found so far is greater than the threshold. Otherwise the loop will stop and the last value of the best pH, close to the pI, will be kept.

Inside the loop we form a test pH by adding increment to the best pH so far and estimate the charge at that point. If the test charge is smaller in absolute value than the smallest found so far we record the tested value as the best pH and we record its charge as the smallest found thus far. Otherwise, if the test pH gives no improvement to the smallest charge, we reduce the step size variable increment to half its value, to narrow in on a better value. Also, if the smallest charge recorded so far is less than zero we know that the best pH is too high and that we should step in the reverse direction (multiply by minus one) to get closer to zero. Finally, when the while loop exits the last pH recorded will be one corresponding to neutrality, so we return this value.

def estimateIsoelectric(sequence):
    """Estimate the charge neutral pH of a protein sequence.
    This is just a guess as pKa values will vary according to
    protein sequence, conformation and conditions.
    """

    sequence = '+' + sequence + '-'  # assumes seq is a string
    bestValue = 0.0
    minCharge = estimateCharge(sequence, bestValue)
    increment = 7.0

    # Continue until the best charge found is acceptably close to zero
    while abs(minCharge) > 0.001:
        pHtest = bestValue + increment
        charge = estimateCharge(sequence, pHtest)

        if abs(charge) < abs(minCharge):
            # Closer to neutral: keep this pH as the best so far
            minCharge = charge
            bestValue = pHtest

        else:
            # No improvement: halve the step and point it towards zero charge
            increment = abs(increment) / 2.0
            if minCharge < 0.0:
                increment *= -1

    return bestValue

To run this we simply call the function with a protein one-letter sequence. Also, to see how quickly the test pH value homes in on the pI, you may like to insert print(pHtest) inside the while loop.

pI = estimateIsoelectric(proteinSeq)
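To try the whole procedure end to end, the sketch below restates both functions in compact form so that it runs on its own. The sequence assigned to proteinSeq is a made-up example, and the optional trace argument is an addition for illustration (it is not part of the function described above) that records each pHtest in place of the suggested print(pHtest).

```python
PKA = {'+': 8.0, '-': 3.1, 'K': 10.0, 'R': 12.0, 'H': 6.5,
       'E': 4.4, 'D': 4.4, 'Y': 10.0, 'C': 8.5}
ACIDS = set('-EDYC')  # groups that carry a negative charge once dissociated


def estimateCharge(sequence, pH):
    """Compact restatement of the charge estimate from this section."""
    total = 0.0
    for group in sequence:
        pKa = PKA.get(group)
        if pKa is not None:
            r = 10.0 ** (pH - pKa)
            dissociated = r / (r + 1.0)
            total += -dissociated if group in ACIDS else 1.0 - dissociated
    return total


def estimateIsoelectric(sequence, trace=None):
    """Step-halving search for the neutral pH; trace collects each test pH."""
    sequence = '+' + sequence + '-'
    bestValue = 0.0
    minCharge = estimateCharge(sequence, bestValue)
    increment = 7.0

    while abs(minCharge) > 0.001:
        pHtest = bestValue + increment
        if trace is not None:
            trace.append(pHtest)
        charge = estimateCharge(sequence, pHtest)

        if abs(charge) < abs(minCharge):
            minCharge = charge
            bestValue = pHtest
        else:
            increment = abs(increment) / 2.0
            if minCharge < 0.0:
                increment *= -1

    return bestValue


proteinSeq = 'MKVLDEHRACY'  # hypothetical sequence, for illustration only
trace = []
pI = estimateIsoelectric(proteinSeq, trace)

print('Estimated pI:', round(pI, 2))
print('Number of pH values tested:', len(trace))
print('First few test values:', trace[:5])
```

Because the step starts at 7.0 from a pH of zero, the first two values in the trace are always 7.0 and 14.0; after that the guesses close in on the pI from alternating sides.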

