Python Programming for Biology: Bioinformatics and Beyond


Profile-based multiple alignment



Download 7,75 Mb.
Pdf ko'rish
bet185/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   181   182   183   184   185   186   187   188   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Profile-based multiple alignment

Before  we  define  the  next  function  we  make  the  required  imports  to  get  a  substitution

matrix (used for testing) and the regular pairwise alignment function sequenceAlign(). The

module  we  are  importing  from  relates  to

Chapter  12

 and  is  available  in  the  on-line

material  for  this  book  (

http://www.cambridge.org/pythonforbiology

).  The  profile-based

multiple-alignment function takes a list of unaligned sequences and a substitution matrix

as input. Then we initialise a value for the number of input sequences.

from Alignments import BLOSUM62, sequenceAlign

def simpleProfileMultipleAlign(seqs, simMatrix):

n = len(seqs)

We  begin  the  multiple  alignment  by  aligning  the  first  two  sequences  and  placing  the



gapped output in the multipleAlign list. Obviously we would need to check that we had at

least two sequences before using this function. Note that if we were doing this properly we

would need to start with the edges of a genetic tree.

score, alignA, alignB = sequenceAlign(seqs[0], seqs[1], simMatrix)

multipleAlign = [alignA, alignB]

Next the function loops through an index, one for each input sequence, but starting at 2.

We  start  here  because  we  have  already  used  the  first  two  sequences  to  get  the  multiple

alignment  started.  Then  in  the  loop  we  generate  two  profiles,  one  for  the  existing

alignment and one for the sequence to be added, seqs[i], noting that this is placed inside a

list as required by the profile-generating function.

for i in range(2,n):

profA = profile(multipleAlign)

toAdd = [seqs[i],]

profB = profile(toAdd)

Now  we  do  the  alignment  of  the  two  profiles,  and  get  two  gapped  profile  alignments

back, as well as a score which in this instance we ignore.

score, alignA, alignB = profileAlign(profA, profB, simMatrix)

With the two gapped and aligned profiles, we have to use the positions of gap insertions

to  combine  the  new  sequence  into  the  existing  multiple  alignment.  Note  how  the  output

profiles  are  just  a  guide  to  place  insertions;  the  deepening  alignment  always  uses  the

original sequences. We repeat this operation twice, once to collect and insert gaps for the

starting  multiple  alignment,  and  once  for  the  newly  added  sequences.  Accordingly,  we

loop through the alignment using enumerate() so that we extract both an index number and

a fractions dictionary (which is what a profile is composed of). If the fractions dictionary

is None, i.e. missing, then we have a gap in that profile and we place the current index in a

list of gap positions.

gaps = []

for j, fractions in enumerate(alignA):

if fractions is None:

gaps.append(j)

Once the gap locations are collected, all the sequences in that part of the alignment are

looped through and are redefined with extra dashes, i.e. we add a column of ‘-’ at the gap

location. In this round we are putting dashes into the original multipleAlign list.

for j, seq in enumerate(multipleAlign):

for gap in gaps:

seq = seq[:gap] + '-' + seq[gap:]

multipleAlign[j] = seq

Then  a  second  round  of  gap  insertion  is  done,  this  time  collecting  locations  from  the

second align profile, alignB, and placing the gaps in the toAdd list.

gaps = []

for j, fractions in enumerate(alignB):

if fractions is None:




gaps.append(j)

for j, seq in enumerate(toAdd):

for gap in gaps:

seq = seq[:gap] + '-' + seq[gap:]

toAdd[j] = seq

With all the gaps placed, the toAdd list is then added to the multipleAlign list to form a

new, deeper alignment. The .extend() function is used because we are joining two lists.

multipleAlign.extend(toAdd)

Finally,  outside  the  loop,  having  combined  all  of  the  sequences,  we  pass  back  the

completed multiple alignment and test the function.

return multipleAlign

# To test and print out the result

seqs = ['SRPAPVVLIILCVMAGVIGTILLISYGIRLLIK',

'TVPAPVVIILIILCVMAGIIGTILLLIISYTIRRLIK',

'HHFSEPEITLIIFGVMAGVIGTILLLIISYGIRLIK',

'HFSELVIALIIFGVMAGVIGTILFISYGSRLIK']

align = simpleProfileMultipleAlign(seqs, BLOSUM62)

for k, seq in enumerate(align):

print(k, seq)

# Result:

# 0 -SRPAPVV--LIILCVMAGVIGTI--LLISYGIRLLIK

# 1 -TVPAPVVIILIILCVMAGIIGTILLLIISYTIRRLIK

# 2 HHFSEPEI-TLIIFGVMAGVIGTILLLIISYGIR-LIK

# 3 -HFSELVI-ALIIFGVMAGVIGTI--LFISYGSR-LIK

Note that in the above test, after we have obtained the alignment we loop through the

list of sequences of which it is composed using the inbuilt enumerate() function. This is so

we efficiently get back an index number at the same time as the one-letter sequences.


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   181   182   183   184   185   186   187   188   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish