Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Estimate molecular mass

This next script estimates the mass of a DNA, RNA or protein molecule (in units of daltons). This is only an estimate because various residues reversibly bind hydrogen ions under different conditions (i.e. pH affects whether H+ ions are joined to the acidic and basic sites) and we are assuming standard proportions of the various isotopes. Nonetheless this estimate will be useful enough to say where we expect DNA or protein to lie on an electrophoresis gel or mass spectrometer trace.



Firstly, we define a function with a descriptive name and specify that it takes two arguments: seq, which is the sequence, and molType, which states whether we are using a protein, DNA or RNA sequence. Note that we set the default value of molType to 'protein', so that we can work with protein sequences without having to specify the value explicitly.

Inside the function we define a dictionary that stores the average molecular masses of the different kinds of residue. This dictionary contains three inner sub-dictionaries, one for each of the molecule types. We access the correct inner dictionary using molType as the key. The one-letter residue codes then act as the keys to the inner dictionary to extract the appropriate molecular masses.
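The two-level lookup described above can be sketched in isolation. The fragment below is only an illustration: the dictionary is cut down to a few entries taken from the script, and the name exampleMasses is an assumption made here, not part of the original code.

```python
# Cut-down, illustrative version of the nested dictionary
exampleMasses = {
    "RNA": {"G": 345.21, "C": 305.18},
    "protein": {"G": 57.05, "A": 71.07},
}

# The outer key selects the sub-dictionary for one molecule type
massDict = exampleMasses["protein"]

# The inner key is the one-letter residue code
print(massDict["G"])              # glycine residue
print(exampleMasses["RNA"]["G"])  # guanine nucleotide residue
```

Note that the same letter, here G, maps to different masses depending on which inner dictionary is selected, which is why the molecule type has to be resolved first.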

Next we define a variable to hold the running total of the molecular mass. This is initialised to the molecular mass of water, because the average residue masses in the dictionary do not take account of the end residues, which carry extra atoms (OH at one end and H at the other) because they are linked on only one side rather than both.
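The role of the water term can be checked by hand for a very short chain. A minimal sketch for the dipeptide glycylglycine, using the glycine residue mass and the water mass that appear in the function below (the variable names are illustrative assumptions):

```python
glyResidue = 57.05   # average mass of one glycine residue, in daltons
water = 18.02        # extra H and OH carried by the two chain ends

# Two linked residues plus the end atoms
dipeptideMass = water + 2 * glyResidue
print(round(dipeptideMass, 2))
```

The result, about 132.12 daltons, agrees with the mass of free glycylglycine (C4H8N2O3). Summing the two residue masses alone would fall short by one water molecule.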

def estimateMolMass(seq, molType='protein'):
    """Calculate the molecular weight of a biological sequence assuming
    normal isotopic ratios and protonation/modification states
    """

    # Average residue masses in daltons, keyed by molecule type
    # and then by one-letter residue code
    residueMasses = {
        "DNA": {"G":329.21, "C":289.18, "A":313.21, "T":304.19},
        "RNA": {"G":345.21, "C":305.18, "A":329.21, "U":306.17},
        "protein": {"A": 71.07, "R":156.18, "N":114.08, "D":115.08,
                    "C":103.10, "Q":128.13, "E":129.11, "G": 57.05,
                    "H":137.14, "I":113.15, "L":113.15, "K":128.17,
                    "M":131.19, "F":147.17, "P": 97.11, "S": 87.07,
                    "T":101.10, "W":186.20, "Y":163.17, "V": 99.13}}

    massDict = residueMasses[molType]

    # Begin with mass of extra end atoms H + OH
    molMass = 18.02

    for letter in seq:
        # Letters missing from the dictionary contribute 0.0
        molMass += massDict.get(letter, 0.0)

    return molMass

The for loop extracts each element of the sequence in turn, which will be a single nucleotide or amino acid letter. This letter is then used to look up the appropriate molecular mass in the dictionary. The .get() method of the dictionary is used so that a default value for the mass can be specified, in case the sequence contains a letter that is not in the dictionary. Using a guess for the average mass of an unrecognised residue, rather than 0.0, may be appropriate under some circumstances. The molecular mass of the current residue is then added to the total, and the for loop moves on to the next letter in the sequence. Finally the return statement passes the value of the total molecular mass back to the point in the program from which the function was called. To test this function we could do something like:

proteinSeq = 'IRTNGTHMQPLLKLMKFQKFLLELFTLQKRKPEKGYNLPIISLNQ'
proteinMass = estimateMolMass(proteinSeq)

or for DNA, noting that we have to specify the molecule type:

dnaSeq = 'ATGAGATAG'  # Illustrative sequence; any DNA string will do
dnaMass = estimateMolMass(dnaSeq, molType='DNA')
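The text above notes that a guessed average mass may sometimes be a better fallback than 0.0 for an unrecognised residue. One way this could look is sketched below, for protein only. The function name estimateProteinMass, the unknownMass parameter and its default of 110.0 daltons (a commonly quoted rough average for an amino acid residue) are assumptions made for illustration and are not part of the original script.

```python
def estimateProteinMass(seq, unknownMass=110.0):
    """Hypothetical variant of estimateMolMass for protein sequences that
    substitutes a rough average mass for unrecognised residue letters
    """

    massDict = {"A": 71.07, "R":156.18, "N":114.08, "D":115.08,
                "C":103.10, "Q":128.13, "E":129.11, "G": 57.05,
                "H":137.14, "I":113.15, "L":113.15, "K":128.17,
                "M":131.19, "F":147.17, "P": 97.11, "S": 87.07,
                "T":101.10, "W":186.20, "Y":163.17, "V": 99.13}

    # Begin with mass of extra end atoms H + OH
    molMass = 18.02
    numUnknown = 0

    for letter in seq:
        if letter in massDict:
            molMass += massDict[letter]
        else:
            # Assumed fallback: count the letter and add the guessed mass
            molMass += unknownMass
            numUnknown += 1

    return molMass, numUnknown


# 'X' is not a key in the dictionary, so it receives the fallback mass
mass, numUnknown = estimateProteinMass('GXG')
print(round(mass, 2), numUnknown)
```

Returning the count of unrecognised letters alongside the mass lets the caller judge how far to trust the estimate. Whether this is preferable to silently adding 0.0 depends on the data.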

