Python Programming for Biology: Bioinformatics and Beyond


Calculating an alignment score



Download 7,75 Mb.
Pdf ko'rish
bet167/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   163   164   165   166   167   168   169   170   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Calculating an alignment score

Our  first  Python  examples  dealing  with  sequence  alignments  are  simple  subroutines  to

give a measure of how good an alignment is. This is important because although there are

many  possible  ways  of  arranging  letters  to  get  an  alignment  the  one  we  almost  always

want is the best one: the one with the highest score. The easiest way to calculate a score

for two aligned sequences is to calculate how many residue pairs are identical; this is to

say we can measure the sequence identity.

Sequence identity

The  Python  function  to  measure  sequence  identity  is  fairly  simple.  It  accepts  two

sequences and compares them under the assumption that these sequences are aligned, i.e.

that the first position in one is equivalent to the first position in the other. The procedure is

then  to  start  with  a  score  of  zero  and  each  time  we  have  a  position  in  both  sequences

where  the  letter  is  the  same  we  add  one  to  the  score.  At  the  end  we  give  back  the  score

divided by how many places we compared, so that we have an average. We also multiply

this by 100 so that we get a percentage figure, which is the conventional representation.

Note that in the function we define the number of places for comparison (numPlaces) as

the length of the shortest sequence, i.e. we take the minimum of the two sequence lengths

to guarantee that we won’t overshoot the smallest of the pair.

def calcSeqIdentity(seqA, seqB):

numPlaces = min(len(seqA), len(seqB))

score = 0.0

for i in range(numPlaces):

if seqA[i] == seqB[i]:

score += 1.0

return 100.0 * score/numPlaces

We  can  now  test  this  function  by  defining  some  short  protein  sequences,  and  then



applying the function to pairs of sequences

seq1 = 'ALIGNMENTS'

seq2 = 'ALIGDVENTS'

seq3 = 'ALIGDPVENTS'

seq4 = 'ALIGN-MENTS'

print(calcSeqIdentity(seq1, seq2)) # 80.0%

print(calcSeqIdentity(seq1, seq3)) # 40.0%

print(calcSeqIdentity(seq4, seq3)) # 72.7%

Note  that,  as  expected  when  seq1  is  compared  to  seq2  the  result  returned  is  80%.

However, for seq1 and seq3, which as you will note only differ by one extra ‘P’ residue,

the value falls to 40%. This is because although the first few residues align well, once the

extra residue is reached the sequences are no longer in step. The situation is resolved by

inserting a dash (representing a gap) in the shorter sequence, so that ‘P’ is paired with ‘-’

and  the  ends  of  the  sequences  align  nicely.  Accordingly  seq4  and  seq3  give  a  score  of

72.7%  (eight  out  of  eleven).  As  you  can  see  inserting  a  gap  in  the  right  place  can  be

crucial. We will illustrate later how gaps can be inserted automatically to give an optimal

alignment without human intervention.


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   163   164   165   166   167   168   169   170   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish