Python Programming for Biology: Bioinformatics and Beyond


Designing a molecular structure database



Download 7,75 Mb.
Pdf ko'rish
bet306/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   302   303   304   305   306   307   308   309   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Designing a molecular structure database

In this section we use the model for molecular structures described in

Chapter 8

to create a

database for structures. Recall that we had four classes (object specifications): Structure,

Chain, Residue and Atom. We will create a table for each class, with the name of the table

being the same as that of the corresponding class.

For  each  of  the  tables  (classes)  there  will  be  columns  (attributes)  that  could  act  as

natural  keys  with  real  meaning,  to  identify  each  item  of  data.  However,  instead  of  using

these we will use serial number keys (integers) to identify each item, and let the database

automatically  generate  them.  Hence  they  will  be  unique  by  construction.  Both  MySQL

and SQLite have the ability to automatically generate serials. Note that these automatically

generated  serial  numbers  are  unique  across  the  whole  table,  not  just  unique  relative  to  a

local  context  (e.g.  a  parent  container  item).  Although  it  might  seem  better  to  use

meaningful natural keys, like the Atom.name that we set up in the structure data model, a

serial number key has many advantages, not least of which is that if it is unique across the

table  (as  it  is  here)  then  it  only  requires  this  one  column  to  specify  the  key.  If  we  used

natural keys, or a serial key that was only unique relative to the parent, then child tables

would have to include the full key of the parent table, and so that includes the grandparent

key, the great-grandparent key etc., and that gets rather long when you are deep down in

the  parent-child  containment  hierarchy.  Take,  for  example,  the  Atom:  again  the  full  key,

considering  all  the  parent  links,  would  need  to  include  Structure.pdbId,

Structure.conformation,  Chain.code,  Residue.seqId  and  Atom.name.  We  do  need  another

column  in  the  child  table,  for  the  serial  key  of  the  parent,  in  order  that  we  know  which

child goes with which parent, but that is a total of two columns only. That second column



is called a foreign key because it refers to a key in a different table.

The  Structure  class  has  three  attributes,  name,  conformation  and  pdbId.  In  the  class

only name was mandatory, but here we will assume that all three are now mandatory. And

then  the  conformation  and  pdbId  together  provide  a  natural  key  for  the  table.  When  we

add a new Structure  to  the  database  we  want  to  make  sure  that  (conformation,  pdbId)  is

not already used, and the database will not automatically do that because we are using a

serial key, not the natural key. Note that the Structure constructor code cannot check that

these are unique because there is no parent class (containing other structures), so there is

no way for the constructor to check what other (conformation, pdbId) values have already

been used. We could have modelled a parent class for Structure, and called it Database, in

which case this check could have been put in the Structure constructor, but we did not do

it  this  way.  So  instead  we  have  to  check  the  uniqueness  of  (conformation,  pdbId)  in  our

own code that acts as a bridge between the class and the table. In summary, the Structure

table will have four columns: id (for the serial key), name, conformation and pdbId, which

are all mandatory.

For  all  the  other  classes,  we  do  not  have  this  issue  with  the  natural  key,  because  the

uniqueness  is  explicitly  checked  in  the  corresponding  class  constructor.  The  Chain  class

has two attributes, code and molType, and a link to the parent Structure. As discussed, the

key for the Chain table will be an auto-generated serial, and the link to the parent structure

will be modelled as a foreign key. So the Chain table will have four columns: id (for the

serial  key),  structureId  (for  the  parent  serial  key),  code  and  molType.  The  Residue  class

has two attributes, seqId and code, and a link to the parent. So the Residue table will have

four  columns:  id  (for  the  serial  key),  chainId  (for  the  parent  serial  key)  and  seqId  and

code.


The Atom class has three attributes, name, coords and element, and a link to the parent.

Here we face an issue, because coords has three values (x, y and z), not one, and not all

SQL implementations allow many-valued columns (as the ARRAY data type). There are

two  possible  solutions  here.  We  could  introduce  a  new  table,  just  for  the  coordinates.

Having a new table is a possible approach for many-valued attributes. In this approach we

could call the new table AtomCoord,  with  columns  id  (for  a  serial  key),  atomId  (for  the

relevant atom serial key), dim (for the dimension being considered, 1, 2 or 3) and coord

(for the coordinate for that dimension). This works pretty well in the SQL context. But we

will take another approach here, relying on the fact that coords is always of length 3. So

we  will  stay  with  one  table  and  split  the  attribute  coords into three columns, x,  y  and  z.

The code that bridges between the Atom class and the Atom table will have to deal with

translating from coords to (x, y, z) and back again. If we had introduced a new table then

the bridging code would have had to deal with that, which is more complicated. The Atom

table will thus have seven columns, id (for the serial key), residueId (for the parent’s serial

key),  name  (for  the  atom  name),  x,  y,  z  (for  the  coordinates)  and  element  (for  the  atom

element type, e.g. ‘C’ or ‘N’).




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   302   303   304   305   306   307   308   309   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish