Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Structure

We start with the construction of the Structure class. As mentioned, we require a name and optionally provide a PDB identifier code. More than one Structure object with the same name will be allowed, each with its own set of coordinates. Hence, we introduce another mandatory attribute called conformation, which is a number that specifies which set of coordinates within an ensemble we are considering. In many circumstances we will only have one conformation, so we issue a default value of 0, even though it is mandatory.




This naturally leads to the following first attempt at the class definition and constructor code:

class Structure:

    def __init__(self, name, conformation=0, pdbId=None):

        if not name:
            raise Exception('name must be set to non-empty string')

        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId

Remembering that the __init__ function, the constructor, is called each time an instance of this class of object is made, we store the name, conformation and pdbId as attributes by binding their values onto variables that are linked to self, which provides a handle to any actual object instance made using this class. Note that we have used a convention whereby attribute names are lower case except when a new ‘word’ starts, and then the first character of that is capitalised, thus here giving pdbId. A popular alternative is to keep attribute names all lower case but use underscores to separate the words, which would give pdb_id. No doubt there are other conventions, and it mostly doesn’t matter which you choose, as long as you are consistent, because consistency aids readability.

In the class constructor we check whether name is defined. This is done using the clause ‘if not name’ to check if the value is logically false, e.g. None or an empty string, and in these cases we deem the name to be undefined. An undefined name means something is wrong, so we cause an error by raising an exception object. However, we have not checked that name is actually a text string. Someone could try to create a Structure by passing in any Python object that evaluates as true (a non-zero number, for example) and it would pass the check but violate our intention about what the name should be. Hence, if you were being cautious you would check the type of name before using it. Similarly, checks can be made for the other input arguments, and so in effect introduce run-time type checking into the constructor. Here, for the sake of brevity, we will avoid such caution.
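For readers who do want the cautious version, the run-time type checks described above might be sketched as follows. This is only an illustration, not the code used in the rest of the chapter: the choice of TypeError and ValueError, and the rule that conformation must be a non-negative whole number, are assumptions made here.

```python
class Structure:

    def __init__(self, name, conformation=0, pdbId=None):

        # name must be a text string, and not an empty one
        if not isinstance(name, str):
            raise TypeError('name must be a string')
        if not name:
            raise ValueError('name must be set to non-empty string')

        # Assumption: conformation is an index, so we insist on a
        # non-negative whole number (bool is a subclass of int, so it
        # is excluded explicitly)
        if isinstance(conformation, bool) or not isinstance(conformation, int):
            raise TypeError('conformation must be an integer')
        if conformation < 0:
            raise ValueError('conformation must not be negative')

        # pdbId is optional: either None or a string
        if pdbId is not None and not isinstance(pdbId, str):
            raise TypeError('pdbId must be a string or None')

        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
```

With these checks in place, a call such as Structure(7) fails immediately with a TypeError rather than silently creating an object with a numeric name.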

Another thing we have not checked here is whether values are meaningful, even if they are of the correct Python data type. For example, we do not know whether the pdbId, if set, is actually a valid identifier. Determining whether the pdbId is a valid PDB identifier code is not trivial, but the example at the start of Chapter 15 will give you a hint at a solution if you are really keen. We ignore such issues here, but it illustrates that no matter how many checks you make, there are almost certainly some checks that you have not made. Also, part of the solution is to not pass junk into your data model in the first place (despite the fact that users may try).

For the pdbId we have set the default to None rather than an empty string. This is a matter of taste, but generally in such situations we use None because this pretty much always means ‘not set’. For pdbId an empty string could be taken to mean the same thing, since real PDB identifiers are never empty strings, but in other situations an empty string might be a legitimate setting.

In data modelling there is the notion of an object’s key; this is something that uniquely identifies an object amongst other objects of the same class. Here we intend that the name and the conformation uniquely identify Structure objects, so these two attributes taken together are a natural key for this class. If we were diligent, and really wanted to enforce this to be a key, then we should add a check in the constructor that (name, conformation) has not already been used by an existing Structure. Again, for reasons of simplicity we ignore that issue here, but if we wanted to worry about it then we would have to keep track either of all the Structure objects that we created or of all the associated names and conformations, for example, using a set or list of (name, conformation) pairs.
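A minimal sketch of such a key check, using a set of (name, conformation) pairs, is given below. The class-level attribute _usedKeys is a name invented for this illustration, and the sketch is not used in the rest of the chapter.

```python
class Structure:

    # Hypothetical class-level record of every (name, conformation)
    # key that has been used so far; shared by all instances
    _usedKeys = set()

    def __init__(self, name, conformation=0, pdbId=None):

        if not name:
            raise Exception('name must be set to non-empty string')

        # Refuse to create a second Structure with the same key
        key = (name, conformation)
        if key in Structure._usedKeys:
            raise Exception('key %r is already in use' % (key,))
        Structure._usedKeys.add(key)

        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId


first = Structure('Chromosome Regulator', 0)
second = Structure('Chromosome Regulator', 1)  # same name, new conformation
# Structure('Chromosome Regulator', 0) would now raise an Exception
```

Note that this simple record goes stale if name or conformation is changed after construction, and a key is never released when an object is discarded, so a real implementation would need to handle both cases.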

This brings up another design decision: a Structure object has a name and conformation, but we have not stated whether we are allowed to change them. This depends on how we intend to use them. For example, if we have an application where the name is intended to be a friendly way of identifying a Structure to the user then we might want to allow the user to change it to something they prefer. In contrast, the conformation is effectively just an index number into the coordinate elements of a structural ensemble, and so there is no reason to allow that to be changed. Indeed, if it could be modified then that might create more trouble than it was worth. If we allow an attribute to change we call it changeable and otherwise we call it frozen. When an attribute is frozen it can only ever be set once, and normally that would mean in the constructor (when the object is made). In Python you have to take some extra steps to make attributes frozen, and we will discuss this later. For now we will in effect assume that everything is changeable.
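As a preview of what those extra steps can look like, one common Python idiom is a read-only property: the value is stored under a private name in the constructor and exposed through a property that has no setter. This is only a sketch of one possible approach, and the private name _conformation is an assumption made here; the mechanism discussed later may differ.

```python
class Structure:

    def __init__(self, name, conformation=0, pdbId=None):

        if not name:
            raise Exception('name must be set to non-empty string')

        self.name = name                   # changeable
        self._conformation = conformation  # set once, stored privately
        self.pdbId = pdbId                 # changeable

    @property
    def conformation(self):
        # No setter is defined, so assigning to structure.conformation
        # raises an AttributeError
        return self._conformation


structure = Structure('Chromosome Regulator')
structure.name = 'My favourite protein'  # allowed: name is changeable
# structure.conformation = 1             # would raise AttributeError
```

This is frozen only by convention: code that deliberately assigns to _conformation can still change the value.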

Another issue with attributes is the matter of how many items they are allowed to represent, according to the data model, which is termed their cardinality. Specifically, the cardinality is represented with whole numbers where the low cardinality represents the minimum number of items that can be represented, while the high cardinality represents the maximum number. Because we have stated that the name is mandatory it always represents exactly one thing, thus the low cardinality is 1 and the high cardinality is also 1. We can write the overall cardinality of this attribute, minimum to maximum, as being ‘1..1’. Similarly, the cardinality of conformation is also ‘1..1’. Conversely, because the pdbId is optional there might be none or one, so for this the cardinality is ‘0..1’.

Perhaps at some point we decide that we are going to allow references to more than one PDB identifier in a given Structure object. This would fundamentally change the data model, and the constructor then might become, noting the plural name for the last attribute:

class Structure:

    def __init__(self, name, conformation=0, pdbIds=None):
        # etc.

Here we might intend that pdbIds is specified as a list or tuple, containing strings representing PDB identifier codes, or otherwise left undefined as None. The low cardinality is still 0, because there might be no PDB identifiers, but we now have no upper limit, so the high cardinality is effectively unbounded, which we label as ‘*’. This case gives an overall cardinality of ‘0..*’. Obviously the high cardinality for any attribute has to be greater than 0, otherwise it can never exist. If it is 1 then the attribute is normally spelled in the singular (pdbId) and if it is greater than 1 then the attribute is normally spelled as a plural (pdbIds).
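To make the ‘0..*’ case concrete, the body of such a constructor could be filled in along the following lines. This is a sketch under stated assumptions rather than a definitive version: storing the identifiers as a tuple, turning None into an empty tuple, and rejecting a bare string are all choices made for this illustration.

```python
class Structure:

    def __init__(self, name, conformation=0, pdbIds=None):

        if not name:
            raise Exception('name must be set to non-empty string')

        # Assumption: None means 'no identifiers', stored as an empty tuple
        if pdbIds is None:
            pdbIds = ()

        # A single string is itself a sequence of characters, so guard
        # against '1A12' being silently split into ('1', 'A', '1', '2')
        if isinstance(pdbIds, str):
            raise TypeError('pdbIds must be a list or tuple of strings')

        self.name = name
        self.conformation = conformation
        self.pdbIds = tuple(pdbIds)  # zero or more identifiers, in order


structure = Structure('Chromosome Regulator', pdbIds=['1A12'])
print(len(structure.pdbIds))  # prints 1
```

Storing an empty tuple rather than None means that code using pdbIds can always loop over it without first checking whether it was set.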



When the high cardinality is greater than 1 another issue comes into play. In this case we have a collection and there is the question of whether the items in the collection are in any particular order, or not. For pdbIds we have stated that we intended it to be defined by a list or tuple, collections that do have ordered items. Consequently, it is natural to assume that the attribute is also ordered. Alternatively, we might have allowed it to be defined by a set, in which case it is natural to assume it to be unordered. Deciding whether something is ordered or unordered can be critical in some contexts. In any case, from here on we stick with the singular pdbId attribute, rather than pdbIds.
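The practical difference between the ordered and unordered choices can be seen in a short sketch. The identifier '2XYZ' is made up for illustration, and the variable names are hypothetical.

```python
# The same identifiers held as an ordered and as an unordered collection
pdbIdList = ['2XYZ', '1A12', '2XYZ']

ordered = tuple(pdbIdList)  # keeps the given order, and any duplicates
unordered = set(pdbIdList)  # no defined order, duplicates are collapsed

print(ordered[0])           # prints 2XYZ: 'the first item' is well defined
print(len(ordered))         # prints 3
print(len(unordered))       # prints 2
print('1A12' in unordered)  # prints True: membership tests still work

# unordered[0] would raise a TypeError, because a set has no positions
```

So an ordered attribute can answer questions such as ‘which identifier comes first?’, whereas an unordered one can only say whether a given identifier is present.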

Changing the high cardinality of an attribute changes the data model fairly dramatically (it has to be specifically coded in the classes) so it is a good idea to think carefully about the situation being modelled. It might be tempting to always assume that the high cardinality is unbounded (‘*’) because it is more general, but this is a bad idea if it really ought to be 1. For one thing it means dealing with a collection containing a single object instead of just the single object itself, which can make for confusing and error-prone code.

Finally, we create a Structure object in the usual way, by using the name of the class and passing in values for the attributes:

structure = Structure('Chromosome Regulator', 0, "1A12")

or we could write it using named input attributes:

structure = Structure(name='Chromosome Regulator', pdbId="1A12")

As another example we could avoid passing in a PDB identifier, given that this attribute is not mandatory and will take the default value of None.

structure = Structure(name='Chromosome Regulator')



Chain

As we mentioned previously, a structure may comprise more than one molecule. Each molecule, because it is a chain of linked amino acids or nucleotides, will be described using the Chain class. Our data model is made with the assumption that each chain belongs to a unique structure, so is effectively contained by that structure. This is an important design decision and has all kinds of ramifications. What we are describing here, when one kind of object is said to contain another, is what is known in data modelling as a
