Python Programming for Biology: Bioinformatics and Beyond
Tim J. Stevens and Wayne Boucher

Designing a molecular structure data model

In this chapter we construct and implement an example data model which represents the three-dimensional structures of large biological molecules. If you are unfamiliar with the basic principles of biological molecules and their structures, see the introductions to Chapters 11 and 15, which aim to be suitable for non-biologists. Specifically, the data model will be for linear polymers, such as DNA, RNA and protein, where a longer molecule is built of smaller components linked together into a chain. It is a relatively simple data model, and it could certainly be extended, but we will avoid adding complications and keep things as clear as possible for this book. As such, we will make various simplifying assumptions about molecules and biology, but that is the case with all data models; it is all just a matter of degree. In particular, we will ignore two kinds of detail. First, molecules might have a few extra or a few absent atoms (mostly hydrogen ions and small modifications). Second, molecules might have extra links that are not part of the main linear chain, like the disulphide links found in some proteins. We will not use any formal computer methods to describe the construction of the data model; instead, we will rely upon relatively plain English. There are formal modelling techniques, such as UML (Unified Modeling Language), but these are well beyond the scope of this book.
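A minimal Python sketch can make the idea of a linear polymer concrete. The `Residue` and `Chain` classes and their attributes are our own hypothetical names, not the book's implementation. In this sketch a chain is simply an ordered list of residues, and the only links it records are between consecutive residues. Extra links such as disulphide bridges are therefore left out, in line with the simplification described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


# Illustrative sketch: 'Residue' and 'Chain' are hypothetical names, not the book's classes
@dataclass
class Residue:
    code: str    # residue label, e.g. 'ALA' for alanine
    seq_id: int  # 1-based position along the chain


@dataclass
class Chain:
    residues: List[Residue] = field(default_factory=list)

    def add_residue(self, code: str) -> Residue:
        # Residues are appended in order, so list order is chain order
        residue = Residue(code, len(self.residues) + 1)
        self.residues.append(residue)
        return residue

    def links(self) -> Iterator[Tuple[Residue, Residue]]:
        # A strictly linear model links each residue only to its neighbours;
        # side links such as disulphide bonds have no place in this structure
        return zip(self.residues, self.residues[1:])


chain = Chain()
for code in ('MET', 'LYS', 'CYS', 'GLY'):
    chain.add_residue(code)

print([(a.code, b.code) for a, b in chain.links()])
# [('MET', 'LYS'), ('LYS', 'CYS'), ('CYS', 'GLY')]
```

Four residues give three links. Even though this chain contains a cysteine, the model has no way to record a bond from it to anything other than its two sequence neighbours.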

Our model will describe the identities and the relative three-dimensional positions of all of the atoms which collectively can be considered a macromolecular structure: the precise shape of large biological molecules. This structure may be composed of any number of polymer molecules that come together, but it is frequently used to describe just one molecule. Each molecular chain will have a single biological type (DNA, RNA or protein), and we can mix polymer types however we like. For example, we might want to consider the structure of a protein bound to a section of DNA.
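The rule that each chain has exactly one type, while a structure may freely mix chain types, can be sketched as follows. This is a minimal illustration under our own assumptions. The names `MOL_TYPES`, `Chain`, `Structure`, `add_chain` and `mol_types` are hypothetical, and a type is checked only when its chain is created.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative sketch; class names and the MOL_TYPES constant are hypothetical
MOL_TYPES = {'DNA', 'RNA', 'protein'}


@dataclass
class Chain:
    code: str      # short chain label, e.g. 'A'
    mol_type: str  # one biological type per chain

    def __post_init__(self) -> None:
        # Reject anything other than the three polymer types the model allows
        if self.mol_type not in MOL_TYPES:
            raise ValueError(f'Unknown molecule type: {self.mol_type!r}')


@dataclass
class Structure:
    name: str
    chains: List[Chain] = field(default_factory=list)

    def add_chain(self, code: str, mol_type: str) -> Chain:
        chain = Chain(code, mol_type)
        self.chains.append(chain)
        return chain

    def mol_types(self) -> Set[str]:
        # The set of polymer types present; chains of different types can coexist
        return {chain.mol_type for chain in self.chains}


# A protein bound to a double-stranded section of DNA: three chains, two types
assembly = Structure('protein-DNA complex')
assembly.add_chain('A', 'protein')
assembly.add_chain('B', 'DNA')
assembly.add_chain('C', 'DNA')
print(sorted(assembly.mol_types()))  # ['DNA', 'protein']
```

Keeping the type on the chain rather than on the structure is what lets one structure hold a protein and DNA side by side.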

We will sometimes expect more than one set of three-dimensional coordinates for a given molecule, which means that for the same set of atoms we can describe alternative arrangements, or conformations. Describing multiple conformations is useful to indicate situations where the precise structure is uncertain, and to describe the outcome of dynamical simulations of the molecule, where each set of coordinates could represent a different point in time or a different outcome. By allowing discrete collections of coordinates for a given molecule, we generate what is sometimes referred to as a structural ensemble. This term is used to emphasise the ‘togetherness’ of a bundle of related conformations.
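The key constraint of an ensemble is that every conformation covers the same atoms and differs only in their coordinates. A minimal sketch shows this constraint in code. The `Ensemble` class, its methods, and the short atom labels used as dictionary keys are hypothetical. In a real structure each atom would need a label that is unique across the whole molecule.

```python
from typing import Dict, List, Tuple

# Illustrative sketch; 'Ensemble' and the atom labels are hypothetical
Coord = Tuple[float, float, float]


class Ensemble:
    """Alternative conformations (coordinate sets) for one fixed set of atoms."""

    def __init__(self, atom_labels: List[str]) -> None:
        self.atom_labels = set(atom_labels)
        self.conformations: List[Dict[str, Coord]] = []

    def add_conformation(self, coords: Dict[str, Coord]) -> None:
        # Every conformation must position exactly the same atoms
        if set(coords) != self.atom_labels:
            raise ValueError('conformation does not cover the same atoms')
        self.conformations.append(dict(coords))

    def positions(self, label: str) -> List[Coord]:
        # One position per conformation for the given atom
        return [conf[label] for conf in self.conformations]

    def mean_position(self, label: str) -> Coord:
        # Average position of an atom over the whole ensemble
        points = self.positions(label)
        n = len(points)
        x, y, z = (sum(p[i] for p in points) / n for i in range(3))
        return (x, y, z)


ensemble = Ensemble(['N', 'CA'])
ensemble.add_conformation({'N': (0.0, 0.0, 0.0), 'CA': (1.5, 0.0, 0.0)})
ensemble.add_conformation({'N': (0.0, 0.5, 0.0), 'CA': (1.25, 0.5, 0.0)})
print(len(ensemble.conformations))   # 2
print(ensemble.mean_position('CA'))  # (1.375, 0.25, 0.0)
```

The spread of an atom's positions across conformations is one simple way to express structural uncertainty. An average position is only meaningful when the conformations share a common frame of reference, for example after they have been superimposed.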

In our model we will identify a given structure by a name, which will be a textual identifier. We will also include a non-mandatory property, the Protein Data Bank (PDB) identifier, to indicate when the data has come from an entry in the main biological coordinate database. The Worldwide Protein Data Bank¹ is a publicly available database that stores the structures of molecules. Most of these structures were determined by X-ray crystallography, but many have been determined by other techniques, such as nuclear magnetic resonance (NMR). Despite the name suggesting that the PDB is only for proteins, these days it contains coordinate data for DNA and RNA too, although protein structures vastly outnumber the other types. The structures that we are modelling might have been entered into this database, and we want to keep track of that. Accordingly, we use the textual PDB identifier that is unique to each entry in the PDB. Naturally, the PDB has its own data model to describe biological structures and their associated data, and it is far more extensive and complicated than the one we are using here. In their data model the PDB identifier is mandatory, but in our data model we will make it optional; the data doesn’t have to come from this database in every case.
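A mandatory name alongside an optional PDB identifier maps naturally onto a field that defaults to `None`. The sketch below is illustrative only. The `Structure` class and `PDB_ID_PATTERN` are hypothetical, and the format check assumes the classic four-character PDB code (a digit from 1 to 9 followed by three letters or digits), so longer identifier formats would be rejected.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: classic four-character PDB codes, a digit 1-9 then three
# letters or digits (e.g. '1ABC'); longer ID formats are not handled here
PDB_ID_PATTERN = re.compile(r'[1-9][A-Za-z0-9]{3}')


@dataclass
class Structure:
    name: str                     # mandatory textual name
    pdb_id: Optional[str] = None  # optional: set only when the data came from the PDB

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError('a structure must have a name')
        if self.pdb_id is not None:
            if not PDB_ID_PATTERN.fullmatch(self.pdb_id):
                raise ValueError(f'not a four-character PDB ID: {self.pdb_id!r}')
            # PDB codes are case-insensitive; store them in upper case
            self.pdb_id = self.pdb_id.upper()


# '1abc' is a placeholder code, not a reference to any particular entry
from_pdb = Structure('example protein', pdb_id='1abc')
model = Structure('my homology model')  # no PDB entry, so no identifier
print(from_pdb.pdb_id, model.pdb_id)    # 1ABC None
```

Leaving `pdb_id` as `None` is what makes the identifier optional. A structure that never came from the PDB, such as a computed model, is still fully valid.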

There are many design decisions in our example data model about which things to describe, which things to ignore and what rules to apply. We will discuss the aspects of our particular model as we go through the example. However, the precise details we have chosen are not the most important thing; the idea is to empower you to create your own data models that do exactly what you want.



