Python Programming for Biology: Bioinformatics and Beyond


Tim J. Stevens, Wayne Boucher

Sequence    Residue proportions                Relative entropy (bits)
            G        C        A        T
GGGGGGGG    1.000    0.000    0.000    0.000   2.00
TCTCTCTC    0.000    0.500    0.000    0.500   1.00
GGGGCCCC    0.500    0.500    0.000    0.000   1.00
GAAGACGA    0.375    0.125    0.500    0.000   0.59
GCATTACG    0.250    0.250    0.250    0.250   0.00
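The values in the table can be checked with a few lines of Python. This is only a minimal sketch: it assumes the relative entropy is measured in bits against a uniform background (each of the four nucleotides expected with proportion 0.25), which is consistent with the numbers shown, and the function name sketchRelativeEntropy is invented for this illustration.

```python
from math import log2

def sketchRelativeEntropy(seq, resCodes='GCAT'):
  """Relative entropy (bits) of the residue proportions in seq,
  measured against a uniform background (an assumption)."""
  background = 1.0 / len(resCodes)
  total = float(len(seq))
  entropy = 0.0

  for code in resCodes:
    prop = seq.count(code) / total
    if prop > 0.0:  # absent residues contribute nothing to the sum
      entropy += prop * log2(prop / background)

  return entropy

for seq in ('GGGGGGGG', 'TCTCTCTC', 'GGGGCCCC', 'GAAGACGA', 'GCATTACG'):
  print(seq, '%.2f' % sketchRelativeEntropy(seq))
```

A sequence made of a single residue gives the maximum of 2 bits for DNA, while an even mix of all four residues gives zero.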


The makeup of the function is quite simple: it takes the input sequence and the search window size, and also an optional variable to flip between using DNA and protein sequences. Note that the length of the input sequence, lenSeq, is recorded before we perform our cheat of copying residue codes from the beginning of the sequence to make an artificial extension. Before entering the loop we define the residue codes that we will use for the relative entropy calculation, i.e. the four nucleotides for DNA or the 20 amino acids for protein. Rather than using the .append() method within the loop to add each score to the list, we first make a list of the right size by repeating zeros ([0.0] * lenSeq), and each score is then assigned to its position by index. This technique can often be used to make your code execute more quickly. The index i in the for loop covers every position of the original sequence, from 0 to lenSeq-1. This is possible because we have added the extra sequence: each entropy score is calculated from the sub-sequence of length winSize that is sliced out of the extended sequence, so even the windows that start near the end are full length.

def relativeEntropySearch(seq, winSize, isProtein=False):
  """Scan a sequence for repetitiveness by calculating relative
     information entropy.
  """

  lenSeq = len(seq)          # original length, recorded before extending
  scores = [0.0] * lenSeq    # pre-sized list, one score per start position

  extraSeq = seq[:winSize]   # wrap around: copy the start onto the end
  seq += extraSeq

  if isProtein:
    resCodes = 'ACDEFGHIKLMNPQRSTVWY'
  else:
    resCodes = 'GCAT'

  for i in range(lenSeq):
    subSeq = seq[i:i+winSize]
    scores[i] = calcRelativeEntropy(subSeq, resCodes)

  return scores
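To see what the artificial extension does, it can help to list the windows on their own. The following is a small illustration that uses an invented toy sequence and window size. It repeats only the slicing logic of the function, not the entropy calculation.

```python
seq = 'GCATTA'   # invented toy sequence, for illustration only
winSize = 3

lenSeq = len(seq)                # recorded before the extension
extended = seq + seq[:winSize]   # 'GCATTA' + 'GCA'

# One full-length window per position of the original sequence
windows = [extended[i:i+winSize] for i in range(lenSeq)]

print(windows)
# ['GCA', 'CAT', 'ATT', 'TTA', 'TAG', 'AGC']
```

The last two windows wrap around to the start of the sequence. Strictly, copying winSize-1 residues would be enough for this, so the final copied residue is never read.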

We can then test the function in the usual manner, and make a graph of the results:

from matplotlib import pyplot

# dnaSeq and proteinSeq are assumed to have been defined beforehand
dnaScores = relativeEntropySearch(dnaSeq, 6)
proteinScores = relativeEntropySearch(proteinSeq, 10, isProtein=True)

pyplot.plot(dnaScores)
pyplot.plot(proteinScores)
pyplot.show()
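The test relies on dnaSeq, proteinSeq and calcRelativeEntropy, none of which are defined in this section. As a self-contained check, the sketch below supplies a stand-in for calcRelativeEntropy (an assumption: relative entropy in bits against a uniform background), a condensed copy of the search function and an invented DNA sequence with a run of eight A residues. It prints the results rather than plotting them.

```python
from math import log2

def calcRelativeEntropy(seq, resCodes):
  # Stand-in (assumption): relative entropy in bits versus a uniform
  # background; the original definition is not shown in this section.
  base = 1.0 / len(resCodes)
  props = [seq.count(r) / len(seq) for r in resCodes]
  return sum(p * log2(p / base) for p in props if p > 0.0)

def relativeEntropySearch(seq, winSize, isProtein=False):
  # Condensed copy of the function described in the text
  lenSeq = len(seq)
  scores = [0.0] * lenSeq
  seq += seq[:winSize]
  resCodes = 'ACDEFGHIKLMNPQRSTVWY' if isProtein else 'GCAT'

  for i in range(lenSeq):
    scores[i] = calcRelativeEntropy(seq[i:i+winSize], resCodes)

  return scores

dnaSeq = 'GCATGCATAAAAAAAAGCAT'   # invented test sequence
dnaScores = relativeEntropySearch(dnaSeq, 6)

peak = max(dnaScores)
print(len(dnaScores), peak, dnaScores.index(peak))
```

The highest score should appear where a window lies entirely within the run of A residues, which is the kind of repetitive region the scan is meant to highlight.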

