Python Programming for Biology: Bioinformatics and Beyond


Calculating sequence similarity



Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming


The next example of a Python function takes a substitution matrix like the ones discussed and uses it to calculate an overall similarity score for two aligned sequences. The inputs to the function are two strings of sequence letters and the similarity matrix.

def calcSeqSimilarity(seqA, seqB, simMatrix):

    # Compare only the positions present in both sequences
    numPlaces = min(len(seqA), len(seqB))
    totalScore = 0.0

    for i in range(numPlaces):
        residueA = seqA[i]
        residueB = seqB[i]

        # Look up the score for this pair in the nested matrix dictionary
        totalScore += simMatrix[residueA][residueB]

    return totalScore

# Test with pre-defined substitution matrices

# DNA example
print(calcSeqSimilarity('AGCATCGCTCT', 'AGCATCGTTTT', DNA_2))

# Protein example
print(calcSeqSimilarity('ALIGNMENT', 'AYIPNVENT', BLOSUM62))
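The test calls above rely on DNA_2 and BLOSUM62 already being defined as nested dictionaries. To try the function in isolation, the following minimal sketch uses a hypothetical toy matrix, TOY_DNA, that scores +1 for identical bases and -1 otherwise. The name and the values are assumptions chosen for illustration and are not the contents of DNA_2.

```python
# Hypothetical toy matrix (an assumption for illustration, not DNA_2):
# +1 when the two bases are identical, -1 when they differ.
BASES = 'ACGT'
TOY_DNA = {a: {b: (1 if a == b else -1) for b in BASES} for a in BASES}

def calcSeqSimilarity(seqA, seqB, simMatrix):
    # Same logic as the function in the text, repeated so the sketch runs alone
    numPlaces = min(len(seqA), len(seqB))
    totalScore = 0.0
    for i in range(numPlaces):
        totalScore += simMatrix[seqA[i]][seqB[i]]
    return totalScore

# Nine identical positions and two mismatches: 9 * 1 + 2 * (-1)
score = calcSeqSimilarity('AGCATCGCTCT', 'AGCATCGTTTT', TOY_DNA)
print(score)  # 7.0
```

Because the matrix is a dictionary of dictionaries, simMatrix[residueA][residueB] is a pair of key look-ups; any residue letter absent from the matrix (including a gap character) would raise a KeyError.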

The calcSeqSimilarity() function is very similar in construction to the previous calcSeqIdentity() function, except that this time, rather than seeing if two residue letters are equal, we use them as keys to look up a similarity score in the substitution matrix. Note that this function has one big deficiency: it cannot deal with gaps (‘-’). To address this problem we could put entries for gaps into the similarity matrix. However, a simpler solution is to introduce a separate gap penalty; gaps are generally undesirable but are tolerable if the subsequent alignment matches well.

If we complicate things slightly more, we can have different gap penalties depending on whether we are inserting a new gap or extending an existing one. Generally, extending an existing gap (i.e. putting one dash after another) has the smaller penalty. This is equivalent to saying that we score alignments more highly if they use fewer, longer gapped regions. The following modified function uses gap penalties insert and extend, which both carry default values. Note that we have a different name for the new function and its input sequences (alignA and alignB) to reinforce the fact that it is working on a pair of aligned sequences, including any gaps, rather than just plain sequences:
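A gap-aware scorer along the lines just described might be sketched as follows. The function name scoreGappedAlignment, the default penalties of 8 and 4, and the toy matrix TOY_DNA are all assumptions made for illustration; only the parameter names alignA, alignB, insert and extend come from the text.

```python
def scoreGappedAlignment(alignA, alignB, simMatrix, insert=8, extend=4):
    # Hypothetical name and default penalty values (assumptions for this sketch)
    numPlaces = min(len(alignA), len(alignB))
    totalScore = 0.0

    for i in range(numPlaces):
        residueA = alignA[i]
        residueB = alignB[i]

        if '-' not in (residueA, residueB):
            # No gap in this column: use the substitution matrix as before
            simScore = simMatrix[residueA][residueB]
        elif i > 0 and '-' in (alignA[i-1], alignB[i-1]):
            # The previous column also held a gap: cheaper extension penalty
            simScore = -extend
        else:
            # A gap that starts anew: full insertion penalty
            simScore = -insert

        totalScore += simScore

    return totalScore

# Hypothetical toy matrix: +1 for identical bases, -1 otherwise
BASES = 'ACGT'
TOY_DNA = {a: {b: (1 if a == b else -1) for b in BASES} for a in BASES}

# Both alignments have six matching columns and two gapped columns
oneLongGap = scoreGappedAlignment('AGCATCGC', 'AG--TCGC', TOY_DNA)
twoShortGaps = scoreGappedAlignment('AGCATCGC', 'AG-A-CGC', TOY_DNA)
print(oneLongGap, twoShortGaps)  # -6.0 -10.0
```

With these assumed penalties the single two-residue gap costs 8 + 4, whereas two separate one-residue gaps cost 8 + 8, so the alignment with fewer, longer gapped regions receives the higher score.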

Pay special attention to the logic above in the if/elif/else statement. If a gap is not amongst the two residue codes then the score for the pair is obtained as before from the similarity matrix dictionary. Otherwise, we do have a gap and thus carry on to check the other two conditions. If the position i is not at the very start of the sequence (i > 0) and either sequence had a gap at the previous position, we are extending an existing gap, so we subtract the extend penalty. And if all else fails we have a gap that starts anew, so we subtract the insert penalty.
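To see which branch handles each alignment column, the three-way test can be isolated in a small helper. This is a sketch only: labelAlignmentColumns is a hypothetical name, and it assumes that the check on the previous position looks at both sequences at index i-1.

```python
def labelAlignmentColumns(alignA, alignB):
    # Hypothetical helper: reports which branch of the if/elif/else
    # would score each column, without needing a substitution matrix
    labels = []
    for i in range(min(len(alignA), len(alignB))):
        if '-' not in (alignA[i], alignB[i]):
            labels.append('matrix')   # scored from the similarity matrix
        elif i > 0 and '-' in (alignA[i-1], alignB[i-1]):
            labels.append('extend')   # previous column also had a gap
        else:
            labels.append('insert')   # gap that starts anew
    return labels

labels = labelAlignmentColumns('-AC-GT', 'GA-GGT')
print(labels)
# ['insert', 'matrix', 'insert', 'extend', 'matrix', 'matrix']
```

Two details stand out in this trace. The gap in the first column is labelled insert because the i > 0 guard prevents a look-up before the start of the sequence. The gap in the fourth column is labelled extend even though it sits in the other sequence from the gap in the third column, since under the stated assumption the test only asks whether the previous column contained a gap at all.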



