Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet200/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   196   197   198   199   200   201   202   203   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Measuring conservation

As  the  first  example  we  define  the  getConservation  function,  which  expects  input

arguments consisting of an alignment and a substitution matrix, so that we can score the

similarity between sequence elements. Note how initially, after defining a conservation list

which  we  will  fill  and  pass  back  at  the  end,  we  use  the  profile()  function  defined  in  the

previous  chapter  to  convert  the  alignment  into  a  profile:  a  list  of  dictionaries  containing

the composition of each position. In order to use the profile-generating function we must



either  have  defined  it  in  the  same  Python  file  or  use  an  import  command  like  the  one

below  to  access  it  from  another  location.  The  module  import  we  illustrate  is  what  you

would  need  if  importing  from  MultipleAlign.py  which  has  been  placed  on  the  current

Python  import  path  (e.g.  current  working  directory).  This  file  is  downloadable  from  the

on-line

archive


which

accompanies

this

book


(see

http://www.cambridge.org/pythonforbiology

).

from MultipleAlign import profile



def getConservation(align, simMatrix):

conservation = []

prof = profile(align)

So armed with a compositional profile generated from the input alignment we next have

to  convert  the  compositional  data  for  each  alignment  position  into  a  single  value

representing  the  degree  of  residue  conservation,  according  to  some  input  matrix  of

similarity  scores.  Thus  we  loop  through  all  of  the  dictionaries  of  the  profile.  Within  the

loop,  for  each  composition  dictionary  we  call  the  .items()  function,  which  is  an  inbuilt

function  or  method  of  standard  Python  dictionaries.  In  Python  2  this  generates  a  list  of

(key, value) pairs which serves two purposes: firstly we will need both the residue letter

(key)  and  composition  (value)  when  we  look  through  all  the  residue  combinations,  and

secondly having a list allows us to sort the data. In Python 3, the .items() function returns

a view onto the items and a list() function is applied to this to get an actual list.

for compDict in prof:

items = list(compDict.items()) # list() not needed in Python 2

We sort the items list according to the composition value, the second element of each

pair,  so  that  we  get  a  ranking  from  which  we  can  find  the  highest-scoring  residue  code.

Note that if we sorted according to the first element of the pairs we would get the list in

alphabetical  order,  not  the  required  scoring  order.  The  sort  is  performed  using  the

specified key function, which takes as its one argument an element of the list, and returns

the desired comparison value determined by that element. Here that is the second element,

x[1], which is used for the comparison. See

Chapter 5

for a more detailed description of

lambda. If we don’t pass a key function then the list would be sorted according to x itself,

which means the first element, x[0].

items.sort( key=lambda x: x[1] )

Next  we  define  an  initial  score  of  zero  and  then  make  two  loops  through  the  items

containing  residue  codes  and  composition  values,  so  that  we  generate  all  possible

combinations  of  two  residue  types.  The  score  for  each  combination  is  simply  the

multiplication of the compositional fractions and the similarity score from the substitution

matrix. You can imagine the two composition values defining lengths on the side of a unit

square, and thus their product represents an area of the whole.

score = 0.0




for resA, compA in items:

for resB, compB in items:

score += compA * compB * simMatrix[resA][resB]

With  the  score  total  for  this  position  defined  we  now  scale  its  value  so  that  we  can

easily compare different positions. Accordingly, we find the maximum possible value for

the  score  at  this  position  and  divide  by  that,  so  that  all  scores  will  be  between  zero  and

one.  We  define  the  best  possible  score  as  being  the  diagonal  element  of  the  substitution

matrix for the residue with the highest composition at this location. Accordingly, we get

the best residue code from the end of our sorted list of items (index -1 to get the highest

value) and use it as both of the keys in the matrix dictionary look-up.

bestLetter = items[-1][0]

maxScore = simMatrix[bestLetter][bestLetter]

The score is scaled by dividing it by the maximum value and added to the conservation

list, which once filled is then passed back at the very end, outside the profile loop.

score /= maxScore

conservation.append(score)

return conservation

We  can  test  the  function  by  defining  some  alignments  and  calling  the  function  with  a

substitution matrix appropriate to DNA or protein. Note that we can import the previously

defined matrices, which are part of the file Alignments.py.

from Alignments import DNA_1, BLOSUM62

align1 = ['AAGCCGCACACAGACCCTGAG',

'AAGCTGCACGCAGACCCTGAG',

'AGGCTGCACGCAGACCCTGAG',

'AAGCTGCACGTGGACCCTGAG',

'AGGCTGCACGTGGACCCTGAG',

'AGGCTGCACGTGGACCCTGAG',

'AAGCTGCATGTGGACCCTGAG']

print(getConservation(align1, DNA_1))

# [1.0, 0.51, 1.0, 1.0, 0.76, 1.0, …]

align2 = ['QPVHPFSRPAPVVIILIILCVMAGVIGTILLISYGIRLLIK-------------',

'QLVHRFTVPAPVVIILIILCVMAGIIGTILLISYTIRRLIK-------------',

'QLAHHFSEPE---ITLIIFGVMAGVIGTILLISYGIRRLIKKSPSDVKPLPSPD',

'QLVHEFSELV---IALIIFGVMAGVIGTILFISYGSRRLIKKSESDVQPLPPPD',

'MLEHEFSAPV---AILIILGVMAGIIGIILLISYSIGQIIKKRSVDIQPPEDED',

'PIQHDFPALV---MILIILGVMAGIIGTILLISYCISRMTKKSSVDIQSPEGGD',

'QLVHIFSEPV---IIGIIYAVMLGIIITILSIAFCIGQLTKKSSLPAQVASPED',

'-LAHDFSQPV---ITVIILGVMAGIIGIILLLAYVSRRLRKRP-----PADVP-',

'SYHQDFSHAE---ITGIIFAVMAGLLLIIFLIAYLIRRMIKKPLPVPKPQDSPD']

print(getConservation(align2, BLOSUM62))

# [0.3, 0.38, 0.09, 0.8, 0.08, 1.0, 0.64, 0.1, 0.32, 0.27, …]




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   196   197   198   199   200   201   202   203   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish