Python Programming for Biology: Bioinformatics and Beyond


Protein hydrophobicity plot



Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming


Now we will move on to another example that produces data we can display as a graph, but this time for a protein sequence. The task here is to generate a plot of how water-hating, or to use the proper term hydrophobic, a given stretch of residues is. An amino acid may be hydrophobic if the atoms in its side chain⁹ represent an arrangement that does not favour interactions with water molecules; typically this means they don't carry a charge or chemical groups that can form hydrogen bonds. It is often useful to find such hydrophobic regions because, by shunning water, they make important interactions inside the folded core of proteins or allow a protein to be inserted into a cellular membrane. This example is set in the context of cell membranes.

A cellular membrane is a double layer (bilayer) of hydrophobic lipid¹⁰ molecules into which specific proteins are embedded by virtue of a hydrophobic anchor. A membrane defines the outer extent of each cell, and various internal compartments, with special functions, inside it. Biologically the lipid component of a membrane creates a barrier to most molecules and the protein component allows selective passage for some molecules, in line with the requirements of the cell.

The next example function aims to predict whether a protein possesses a sufficiently hydrophobic segment of residues (which will fold into a helix) that will allow it to be inserted into a cell's system of membranes. This is a simplistic prediction, as in reality there are other factors that govern whether a segment is used, but nonetheless it is sufficiently accurate to find over 70% of membrane spans.

Initially we define a hydrophobicity scale: a number associated with each amino acid letter that says how water-hating it is. For this example we will use the GES scale,¹¹ but there are several others to choose from.

GES_SCALE = {'F':-3.7,'M':-3.4,'I':-3.1,'L':-2.8,'V':-2.6,
             'C':-2.0,'W':-1.9,'A':-1.6,'T':-1.2,'G':-1.0,
             'S':-0.6,'P': 0.2,'Y': 0.7,'H': 3.0,'Q': 4.1,
             'N': 4.8,'E': 8.2,'K': 8.8,'D': 9.2,'R':12.3}
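As a small illustration of how such a scale is used, the sketch below sums the GES values over a short stretch of residues. The helper name segmentScore and the two example peptides are made up for illustration and do not come from the text. With the sign convention of the dictionary above, a more negative total indicates a more hydrophobic segment.

```python
# GES scale as given above: negative values are the hydrophobic residues
GES_SCALE = {'F':-3.7,'M':-3.4,'I':-3.1,'L':-2.8,'V':-2.6,
             'C':-2.0,'W':-1.9,'A':-1.6,'T':-1.2,'G':-1.0,
             'S':-0.6,'P': 0.2,'Y': 0.7,'H': 3.0,'Q': 4.1,
             'N': 4.8,'E': 8.2,'K': 8.8,'D': 9.2,'R':12.3}

def segmentScore(segment, scale=GES_SCALE):
    """Sum the per-residue scale values for a stretch of residues.

    Hypothetical helper, for illustration only.
    """
    return sum(scale[aa] for aa in segment)

# Made-up peptides, chosen only to contrast the two ends of the scale
greasy = 'LLVIFAL'    # mostly hydrophobic side chains
charged = 'KDERKDE'   # charged side chains

print(segmentScore(greasy))   # a negative total
print(segmentScore(charged))  # a positive total
```

The same kind of sum, taken over every window of a fixed width along a sequence, is what the search function in this section computes.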

We define the function that will perform the search so that it accepts a protein sequence and a hydrophobicity scale dictionary as mandatory inputs, and an optional input to specify the search window size. The philosophy of this function differs a little from those above because it includes an optimisation for speed, i.e. it minimises the number of operations performed.

An index i is defined to loop through the sequence and, because it is useful in several spots, we define j to be i plus the search width. The adding up of the hydrophobicity score for each segment takes place inside one of two separate sections, depending on the result of an if statement. This statement is set up such that the first time we add up scores (detected by the score being at its start value of None) we consider all of the positions from i up to j. After this first summation, rather than repeating the summation for the whole of the next section, we use the fact that the next section differs from the previous one only at its first and last positions. Accordingly, to get the score for the next section we take the existing score, take away the score of the residue we have just left behind (i-1) and add the score of the new end residue (j-1: we go up to but do not include position j). This is a speed optimisation because overall fewer operations are performed, but it is prone to the accumulation of small floating point errors. However, such errors will not grow to anything significant for something as short as a protein sequence.

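def hydrophobicitySearch(seq, scale, winSize=15):
    """Scan a protein sequence for hydrophobic regions using a
    hydrophobicity scale dictionary (e.g. GES_SCALE).
    """

    score = None
    scoreList = []

    # The +1 makes sure the final full-width window is included
    for i in range(len(seq) - winSize + 1):
        j = i + winSize

        if score is None:
            # First window: add up every position from i up to (not including) j
            score = 0
            for k in range(i, j):
                score += scale[seq[k]]

        else:
            # Later windows: add the new end residue, remove the one left behind
            score += scale[seq[j-1]]
            score -= scale[seq[i-1]]

        scoreList.append(score)

    return scoreList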
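One way to convince ourselves that the running-total shortcut gives the same answer as summing each window afresh is to compare the two directly. The following is only a sketch: naiveWindowScores and runningWindowScores are illustrative names, the test sequence is arbitrary rather than a real protein, and the running version is a compact restatement of the idea rather than the function above. Both versions here visit every full-width window, including the last one.

```python
GES_SCALE = {'F':-3.7,'M':-3.4,'I':-3.1,'L':-2.8,'V':-2.6,
             'C':-2.0,'W':-1.9,'A':-1.6,'T':-1.2,'G':-1.0,
             'S':-0.6,'P': 0.2,'Y': 0.7,'H': 3.0,'Q': 4.1,
             'N': 4.8,'E': 8.2,'K': 8.8,'D': 9.2,'R':12.3}

def naiveWindowScores(seq, scale, winSize=15):
    """Re-sum every window from scratch: about winSize additions per window."""
    return [sum(scale[aa] for aa in seq[i:i+winSize])
            for i in range(len(seq) - winSize + 1)]

def runningWindowScores(seq, scale, winSize=15):
    """Update a running total: one subtraction and one addition per window."""
    if len(seq) < winSize:
        return []

    score = sum(scale[aa] for aa in seq[:winSize])
    scores = [score]

    for i in range(1, len(seq) - winSize + 1):
        score -= scale[seq[i-1]]          # residue that left the window
        score += scale[seq[i+winSize-1]]  # residue that entered the window
        scores.append(score)

    return scores

# Arbitrary test sequence (not a real protein), repeated to make it longer
testSeq = 'MKTLLVAGIFWCDERKNQHYSPTG' * 20

naive = naiveWindowScores(testSeq, GES_SCALE)
running = runningWindowScores(testSeq, GES_SCALE)

# Largest disagreement between the two methods; any difference comes only
# from floating point rounding in the running total
maxDiff = max(abs(a - b) for a, b in zip(naive, running))
print(len(naive), len(running), maxDiff)
```

The naive version performs roughly winSize additions for every window, while the running version performs two operations per window after the first, which is where the saving comes from.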

As before, we can execute the function with an example sequence and plot the results with Matplotlib.

from matplotlib import pyplot

# proteinSeq is assumed to hold the example protein sequence defined earlier
scores = hydrophobicitySearch(proteinSeq, GES_SCALE)
pyplot.plot(scores)
pyplot.show()
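The plot is read by eye, but the list of scores can also be scanned for the most hydrophobic windows. The sketch below is an illustration under stated assumptions: the function name findHydrophobicWindows and, in particular, the cutoff value are hypothetical and are not taken from the text, and a realistic cutoff would need to be calibrated for the chosen scale and window size. A small made-up list of scores stands in for the output of the search function.

```python
def findHydrophobicWindows(scores, cutoff):
    """Return (startIndex, score) pairs for windows at or below the cutoff.

    With the GES values used in this section, lower (more negative)
    totals are more hydrophobic, so we look for scores beneath the cutoff.
    Hypothetical helper, for illustration only.
    """
    return [(i, s) for i, s in enumerate(scores) if s <= cutoff]

# Made-up window scores standing in for the output of the search function
exampleScores = [12.5, 3.0, -8.4, -25.1, -31.7, -27.9, -4.2, 9.6]

# Hypothetical cutoff, for illustration only
CUTOFF = -25.0

hits = findHydrophobicWindows(exampleScores, CUTOFF)
print(hits)

# The single most hydrophobic window is the one with the minimum score
bestStart = min(range(len(exampleScores)), key=exampleScores.__getitem__)
print(bestStart, exampleScores[bestStart])
```

Each start index refers to the first residue of a window, so a hit at index i covers residues i up to, but not including, i plus the window size.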

