Python Programming for Biology: Bioinformatics and Beyond

Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming

Estimate molecular mass

This next script estimates the mass of a DNA, RNA or protein molecule (in units of daltons). This is only an estimate because various residues reversibly bind hydrogen ions under different conditions (i.e. pH affects whether H+ ions are joined to the acidic and basic sites) and because we are assuming standard proportions of the various isotopes. Nonetheless, this estimate will be useful enough to say where we expect DNA or protein to lie on an electrophoresis gel or mass spectrometer trace.

Firstly, we define a function, hopefully with a sensible and informative name, and specify that it takes two arguments: seq, which is the sequence, and molType, which states whether we are using a protein, DNA or RNA sequence. Note that we set the default value of molType to 'protein', so that we can work with protein sequences without having to specify the value explicitly.

Inside the function we define a dictionary that stores the average molecular masses of the different kinds of residue. This dictionary contains three inner sub-dictionaries, one for each molecule type. We access the correct inner dictionary using molType as a key. The one-letter residue codes then act as the keys to the inner dictionary to extract the appropriate molecular masses.
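
As a minimal sketch of this two-level lookup, the snippet below uses a cut-down dictionary with only a few entries. The names exampleMasses, innerDict, glycineMass and guanineMass are illustrative assumptions and are not part of the script itself.

```python
# Cut-down, illustrative version of the nested dictionary: the outer keys
# are molecule types, the inner keys are one-letter residue codes.
exampleMasses = {
    "DNA": {"G": 329.21, "C": 289.18},
    "protein": {"G": 57.05, "C": 103.10},
}

# First level: select the inner dictionary for the molecule type
innerDict = exampleMasses["protein"]

# Second level: look up a residue mass by its one-letter code
glycineMass = innerDict["G"]
print(glycineMass)

# The same letter maps to a different mass for a different molecule type
guanineMass = exampleMasses["DNA"]["G"]
print(guanineMass)
```

Note that the letter G means glycine in the protein table but guanine in the DNA table, which is why the molecule type has to be chosen before the residue lookup.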

Next we define a variable to hold the running total for the molecular mass. This is initially set to the molecular mass of water, because the average residue masses in the dictionary do not account for the extra atoms on the end residues (OH at one end and H at the other), which are linked on only one side instead of both.
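
To see why the water mass is needed, it may help to work through the smallest possible case by hand. The sketch below does the arithmetic for a dipeptide of two glycine residues, using the glycine residue mass from the table; the variable names are illustrative assumptions.

```python
# Worked example: mass of the dipeptide Gly-Gly from residue masses.
glyResidue = 57.05   # average mass of one glycine residue (from the table)
waterMass = 18.02    # H at one end of the chain plus OH at the other

# Summing the residues alone misses the terminal H and OH atoms
residuesOnly = 2 * glyResidue

# Adding one water mass per chain accounts for both ends
dipeptideMass = residuesOnly + waterMass

print(round(residuesOnly, 2))
print(round(dipeptideMass, 2))
```

The total comes to about 132.12 daltons, which agrees with the molecular mass of glycylglycine (C4H8N2O3), whereas the residue sum alone falls short by one water.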

def estimateMolMass(seq, molType='protein'):
    """Calculate the molecular weight of a biological sequence assuming
    normal isotopic ratios and protonation/modification states
    """
    # Average residue masses in daltons, keyed first by molecule type
    # and then by one-letter residue code
    residueMasses = {
        "DNA": {"G":329.21, "C":289.18, "A":313.21, "T":304.19},
        "RNA": {"G":345.21, "C":305.18, "A":329.21, "U":306.17},
        "protein": {"A": 71.07, "R":156.18, "N":114.08, "D":115.08,
                    "C":103.10, "Q":128.13, "E":129.11, "G": 57.05,
                    "H":137.14, "I":113.15, "L":113.15, "K":128.17,
                    "M":131.19, "F":147.17, "P": 97.11, "S": 87.07,
                    "T":101.10, "W":186.20, "Y":163.17, "V": 99.13}}

    massDict = residueMasses[molType]

    # Begin with mass of extra end atoms H + OH
    molMass = 18.02

    for letter in seq:
        # Letters missing from the dictionary contribute 0.0
        molMass += massDict.get(letter, 0.0)

    return molMass

The for loop extracts each element of the sequence in turn, which will be a single nucleotide or amino acid letter. This letter is then used to look up the appropriate molecular mass in the dictionary. The .get() method of the dictionary is used so that a default value for the mass can be specified, just in case we have a letter in the sequence that is not in the dictionary. In some circumstances a guess at the average mass of an unrecognised residue, rather than 0.0, may be a more appropriate default.
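
One possible way to supply such a guess is sketched below, taking the unweighted mean of the known masses as the default. This is only an illustration under stated assumptions: the function name estimateMassWithFallback, its massDict and endMass parameters, and the cut-down DNA table are all hypothetical, and a mean weighted by typical residue frequencies might be a better guess in practice.

```python
def estimateMassWithFallback(seq, massDict, endMass=18.02):
    """Sum residue masses, using the mean of the known masses for any
    letter that is not in massDict (hypothetical variant, for illustration).
    """
    # Guess for unrecognised residues: unweighted mean of the known masses
    fallback = sum(massDict.values()) / len(massDict)

    molMass = endMass
    for letter in seq:
        molMass += massDict.get(letter, fallback)

    return molMass

# Cut-down DNA table, for illustration only
dnaMasses = {"G": 329.21, "C": 289.18}

# 'N' (any base) is not in the table, so it is given the mean mass
mass = estimateMassWithFallback("GCN", dnaMasses)
print(round(mass, 2))
```

With a default of 0.0 the unknown letter would simply be ignored, so the estimate would be low by roughly one residue for every unrecognised position.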

The molecular mass of the current residue is then added to the total, and the for loop moves on to the next letter in the sequence. Finally, the return statement is used so that the total molecular mass is passed back to the point in the program where the function was called. To test this function we could do something like:

proteinSeq = 'IRTNGTHMQPLLKLMKFQKFLLELFTLQKRKPEKGYNLPIISLNQ'
proteinMass = estimateMolMass(proteinSeq)

or for DNA, noting that we have to specify the molecule type (dnaSeq is assumed here to be a DNA sequence string that has already been defined):

dnaMass = estimateMolMass(dnaSeq, molType='DNA')
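
As a rough, self-contained check of these calling conventions, the sketch below uses a cut-down copy of the function. The name estimateMolMassDemo and its reduced tables are assumptions made for illustration. It shows the default molecule type, an explicit molType, and the KeyError raised when the molecule type is not a key of the outer dictionary.

```python
def estimateMolMassDemo(seq, molType='protein'):
    """Cut-down illustration of estimateMolMass with reduced tables."""
    residueMasses = {
        "DNA": {"G": 329.21, "C": 289.18},
        "protein": {"G": 57.05, "A": 71.07},
    }
    massDict = residueMasses[molType]  # KeyError if molType is unknown

    molMass = 18.02  # extra end atoms, H + OH
    for letter in seq:
        molMass += massDict.get(letter, 0.0)

    return molMass

# Default molType: 'GA' is read as the dipeptide Gly-Ala
peptideMass = estimateMolMassDemo('GA')

# Explicit molType: 'GC' is read as a two-nucleotide DNA strand
dnaMass = estimateMolMassDemo('GC', molType='DNA')

print(round(peptideMass, 2), round(dnaMass, 2))

# Dictionary keys are case-sensitive, so 'dna' is not recognised
try:
    estimateMolMassDemo('GC', molType='dna')
    unknownTypeRaised = False
except KeyError:
    unknownTypeRaised = True

print(unknownTypeRaised)
```

Because the molecule type is looked up with square brackets rather than .get(), a misspelt or differently capitalised molType stops the program with an error instead of silently returning a wrong mass.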
