Python Programming for Biology: Bioinformatics and Beyond


Generating an alignment profile



Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming



The second alignment analysis we will do is to generate a profile. A profile is a
per-position statistic recording what fraction of the sequences has each kind of
residue at a given location. Because a profile keeps positional information it can
still be aligned just like a simple sequence alignment, only this time we are dealing
with fractions of residue types rather than single residues. The Python function to
generate a profile from an alignment is given below. It is very similar to the
consensus generation, except that we don't have to choose a winning residue. The
profile that is generated at the end is represented as a list of dictionaries: we
locate an alignment position with a list index, and the sub-dictionary at that index
gives the fraction of each residue type present. Gaps ('-') are skipped, but the
counts are still divided by the total number of sequences, so the fractions in a
column that contains gaps sum to less than one. Note how the counts dictionary
initially stores counts, as the name suggests, but is later reused to store the
fractions; there is no reason to introduce another variable.

def profile(alignment):

    n = len(alignment[0])         # number of alignment positions (columns)
    nSeq = float(len(alignment))  # number of sequences, as a float for division
    prof = []

    for i in range(n):
        counts = {}

        # Count each residue type in column i, ignoring gaps
        for seq in alignment:
            letter = seq[i]
            if letter == '-':
                continue

            counts[letter] = counts.get(letter, 0) + 1

        # Convert the counts to fractions of the total number of sequences
        for letter in counts:
            counts[letter] /= nSeq

        prof.append(counts)

    return prof

alignment = ['SRPAPVVIILIILCVMAGVIGTILLISYGIRLLIK',
             'TVPAPVVIILIILCVMAGIIGTILLISYTIRRLIK',
             'HHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIK',
             'HEFSELVIALIIFGVMAGVIGTILFISYGSRRLIK']

print(profile(alignment))

# First sub-dict: {'S': 0.25, 'T': 0.25, 'H': 0.5}
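Since the profile is a list of dictionaries, reading it back is a matter of indexing the list and then looking up a residue. The following is a minimal sketch of that access pattern, not part of the original text: the helper names residue_fraction and most_common_residue are hypothetical, and example_profile is simply the first three columns of the example alignment above written out by hand so that the block runs on its own.

```python
# Hypothetical helpers for reading a profile stored as a list of dictionaries.
# example_profile holds the first three columns of the example alignment,
# written out by hand so that this block is self-contained.
example_profile = [
    {'S': 0.25, 'T': 0.25, 'H': 0.5},
    {'R': 0.25, 'V': 0.25, 'H': 0.25, 'E': 0.25},
    {'P': 0.5, 'F': 0.5},
]


def residue_fraction(prof, position, residue):
    """Return the fraction of `residue` at `position`, or 0.0 if it never occurs."""
    return prof[position].get(residue, 0.0)


def most_common_residue(prof, position):
    """Return the residue with the largest fraction, or None for an all-gap column."""
    column = prof[position]
    if not column:
        return None
    return max(column, key=column.get)


print(residue_fraction(example_profile, 0, 'H'))   # 0.5
print(residue_fraction(example_profile, 0, 'W'))   # 0.0, W is absent from column 0
print(most_common_residue(example_profile, 0))     # H
```

Using dict.get with a default of 0.0 avoids a KeyError for residues that never appear at a position, which is the common case because each sub-dictionary only stores the residue types actually observed.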

Profiles of this kind are often used as position-specific scoring matrices, for
example in programs like PSI-BLAST [2]. The idea here is that you build a profile
for a family of sequences which have something of interest in common, and then
search other sequences with the whole profile. This allows you to find sequences
that share the properties of your aligned family as a whole, rather than finding
ones that are similar to the individual members; this increases the sensitivity of
similarity searches. A family profile conveys family-specific information, like the
presence of a highly conserved (or invariant) site, which would not be recorded if
you looked for sequences with a general substitution table.
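To make the idea of searching with a whole profile concrete, here is a rough sketch that slides a profile along a query sequence and scores each window. It is an illustration only and is not how PSI-BLAST computes its scores: the function names profile_score and best_match, the uniform background frequency of 0.05 (one in twenty amino acids) and the pseudocount of 0.01 are all assumptions made for this example, and toy_profile is the first three columns of the example alignment written out by hand.

```python
import math


def profile_score(prof, segment, background=0.05, pseudo=0.01):
    """Sum per-position log-odds scores for `segment` against `prof`.

    background: assumed uniform residue frequency (1/20), an illustrative choice.
    pseudo: small value added to every fraction so that residues unseen in a
            column give a negative score rather than log(0).
    """
    score = 0.0
    for column, residue in zip(prof, segment):
        fraction = column.get(residue, 0.0) + pseudo
        score += math.log(fraction / background)
    return score


def best_match(prof, sequence):
    """Slide the profile along `sequence`; return (score, offset) of the best window.

    Returns None if the sequence is shorter than the profile.
    """
    width = len(prof)
    best = None
    for offset in range(len(sequence) - width + 1):
        score = profile_score(prof, sequence[offset:offset + width])
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# First three columns of the example alignment, written out by hand
toy_profile = [
    {'S': 0.25, 'T': 0.25, 'H': 0.5},
    {'R': 0.25, 'V': 0.25, 'H': 0.25, 'E': 0.25},
    {'P': 0.5, 'F': 0.5},
]

score, offset = best_match(toy_profile, 'GGHEFGG')
print(offset)   # 2, the window 'HEF' matches residues seen in every column
```

A window scores well only when each of its residues is common at the corresponding profile position, which is how a conserved site in the family ends up dominating the match in a way a general substitution table would not.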




