Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet201/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   197   198   199   200   201   202   203   204   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Conservation display

In  the  next  example  we  will  use  the  conservation  values  to  make  a  very  simple  display,

consisting of symbols that can be placed under the alignment to indicate which positions

are conserved and which vary. Such a display, often involving ‘*’, ‘:’ and ‘.’ characters to

represent perfect, high and moderate conservation respectively, is very commonly seen in

the output of programs that generate multiple alignments. A more sophisticated graphical

display  might  involve  rendering  text  with  different  colours.  We  will  not  give  such  an

example here, but the same conservation values could be used.

The  function  is  defined  as  accepting  an  alignment,  a  similarity  matrix  and  a  list  of

threshold  values  as  input  arguments.  The  list  of  thresholds  will  define  the  values  that

separate  the  conservation  into  different  categories,  and  hence  dictate  which  symbol  is

used.  After  the  initial  function  definition  we  define  an  empty  string  which  will  be  filled

with  symbols  to  indicate  the  similarity  at  each  position.  We  use  the  previously  defined

getConservation()  function  to  extract  the  conservation  values  from  the  input  alignment,

and we extract the list of thresholds as three singular values.

def makeSimilarityString(align, simMatrix, thresholds):

simString = ''

conservation = getConservation(align, simMatrix)

t1, t2, t3 = thresholds

for score in conservation:

if score >= t1:

symbol = '*'

elif score >= t2:

symbol = ':'

elif score >= t3:

symbol = '.'

else:

symbol = ' '



simString += symbol

return simString

The  remainder  of  the  function  simply  involves  looping  through  each  score  and

comparing  it  with  the  threshold  values,  which  are  assumed  to  be  in  descending  order.  If

the score is at or above the first threshold, the symbol is defined as ‘*’. Otherwise if it’s at

or above the second it is a ‘:’, and at or above the third value a ‘.’. Finally, if the score was

below  the  smallest  threshold  the  symbol  is  set  to  a  space,  representing  little  or  no

conservation.  At  the  end  of  each  loop  the  symbol  is  added  to  the  growing  text  string,

which is then passed back at the return statement.

We  test  this  function  by  printing  out  the  sequences  in  the  alignment  with  the  line  of

newly generated conservation symbols underneath.

symbols = makeSimilarityString(align2, BLOSUM62, (1.0, 0.5, 0.3))




for seq in align2:

print(seq)

print(symbols)

And the result in this case is:

QPVHPFSRPAPVVIILIILCVMAGVIGTILLISYGIRLLIK-------------

QLVHRFTVPAPVVIILIILCVMAGIIGTILLISYTIRRLIK-------------

QLAHHFSEPE---ITLIIFGVMAGVIGTILLISYGIRRLIKKSPSDVKPLPSPD

QLVHEFSELV---IALIIFGVMAGVIGTILFISYGSRRLIKKSESDVQPLPPPD

MLEHEFSAPV---AILIILGVMAGIIGIILLISYSIGQIIKKRSVDIQPPEDED

PIQHDFPALV---MILIILGVMAGIIGTILLISYCISRMTKKSSVDIQSPEGGD

QLVHIFSEPV---IIGIIYAVMLGIIITILSIAFCIGQLTKKSSLPAQVASPED

-LAHDFSQPV---ITVIILGVMAGIIGIILLLAYVSRRLRKRP-----PADVP-

SYHQDFSHAE---ITGIIFAVMAGLLLIIFLIAYLIRRMIKKPLPVPKPQDSPD

.. : *: . :. **: **:*::..*::::: ...:.*: .

In  the  above  examples  we  have  been  thinking  of  conservation  as  being  an  amount  of

something.  However,  we  can  also  say  in  what  way  the  residues  are  being  conserved,

especially  for  amino  acids  that  have  chemical  characteristics  which  may  be  preserved,

even  if  the  exact  residue  types  are  not.  An  example  would  be  a  position  where  we  have

histidine, lysine and arginine residues (‘H’, ‘K’ and ‘R’), where although the amino acid

type does change, the position is one that can be categorised as having a positive electric

charge, a property of all three types.

Here  we  illustrate  a  Python  list  that  defines  some  amino  acid  categories,  based  upon

their properties. Each category is represented by a pair of items in a tuple. The first item is

the name of the category and the second is a string containing all of the residue codes for

that  category.  Note  that  we  have  simple  one-letter  categories  for  the  individual  amino

acids, so that if an alignment only has one residue type in a given position the pure amino

acid is stated, rather than giving a larger, less precise category.

AA_CATEGORIES = [('G','G'), ('A','A'), ('I','I'), ('V','V'),

('L','L'), ('M','M'), ('F','F'), ('Y','Y'),

('W','W'), ('H','H'), ('C','C'), ('P','P'),

('K','K'), ('R','R'), ('D','D'), ('E','E'),

('Q','Q'), ('N','N'), ('S','S'), ('T','T'),

('-','-'),

('acidic', 'DE'),

('hydroxyl', 'ST'),

('aliphatic', 'VIL'),

('basic', 'KHR'),

('tiny', 'GAS'),

('aromatic', 'FYWH'),

('charged', 'KHRDE'),

('small', 'AGSVTDNPC'),

('polar', 'KHRDEQNSTC'),

('hydrophobic','IVLFYWHAGMC'),

('turnlike', 'GASHKRDEQNSTC'),

('undef', 'AGVILMFYWHCPKRDEQNST-')]

The function example that employs amino acid categorisation naturally takes this list as

an argument, as well as an alignment to work on. Inside the function a list is defined that



will  be  filled  with  the  residue  categories,  one  for  each  position,  then  the  next  command

makes a profile from the alignment using the previously discussed profile() function.

from MultipleAlign import profile

def getAlignProperties(align, categories):

properties = []

prof = profile(align)

The  bulk  of  the  function  involves  looping  through  the  dictionaries  containing

composition  fractions,  defined  for  the  positions  of  profile,  and  getting  the  list  of  residue

letters. Here the actual composition values are not used, but the dictionary keys give us the

list of residue types found at that location.

for fracDict in prof:

letters = fracDict.keys()

Given the residue letters the task is now to go through each category, which is listed in

order  of  size  from  smallest  to  largest,  to  find  the  first,  and  hence  smallest,  category  that

contains all of the residue letters for this position. Accordingly for each category in turn

we make a loop through each letter and determine if it is within the string of letters for that

category  (group).  If  the  letter  is  not  in  the  category,  then  the  current  category  cannot

possibly be right, so we use the break keyword to quit the current loop through the letters

immediately, thus going on to test the next category. If there is no residue letter that causes

the break to be triggered, i.e. all letters were present for the group, then the program flow

will reach the else: statement, whereupon the name of the current category is added to the

list to represent the current alignment position. With that done we don’t need to test any

more categories for this position so a different break  is  issued,  this  time  to  quit  the  loop

through categories, thus going on to the next position. At the very end of the function the

list of properties is returned as you would expect.

for name, group in categories:

for letter in letters:

if letter not in group:

break # quit inner loop

else: # all letters are in group

properties.append(name)

break # quit outer loop

return properties

The function is tested with an alignment and the categories list for the amino acids. In

this  instance  we  print  the  results  by  looping  through  the  category  names  and  their  index

numbers simultaneously, courtesy of Python’s enumerate() function

catNames = getAlignProperties(align2, AA_CATEGORIES)

for i, category in enumerate(catNames):




print(i, category)


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   197   198   199   200   201   202   203   204   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish