parent-child relationship. The Structure object is the parent and the Chain object is the
child. Also, we often talk in terms of classes rather than objects and say that Structure is
the parent class of Chain.
If a parent object contains children, then each child object must belong to that parent; a
child object is only meaningful within the context of its parent. A consequence of this is
that the parent object must be created before its children. Also, if a parent object is deleted
then all of its children must also be deleted. An alternative here would be to make a Chain
free-standing, meaning it could appear in more than one Structure. That would be a
perfectly plausible scenario, but it is not especially helpful here and not the way we will
model it. A parent class can have many different kinds of child classes. Thus, we could
have a Technique class, which represents how the structure was determined. This could
also be a child of Structure. When a Structure is deleted both Technique children and
Chain children would need to be deleted. To keep things simple, here we just consider
Chain. Note that Structure itself has no parent. Had we decided to model things
differently, we might have introduced a class Ensemble, as the parent of Structure, and
then Ensemble would have no parent.
A parent-child relationship is an example of a ‘link’ between objects. Links are
generally harder to manage than simple attributes (like name and pdbId), because there are
two ends to consider, one for each object, and they need to be made consistent with each
other. One way to manage links is to have one of the classes keep track of everything. For
example, in the previous chapter we had a parent class, Protein, and a child class,
AminoAcid, and the parent class managed everything. So a Protein object had a list
self.aminoAcids, but an AminoAcid object had no reference back to its parent Protein.
This need not be a problem; for example, an AminoAcid might
only appear in contexts where the Protein also appears. Having all the information at one
end of a link makes it easier to manage. On the other hand, it usually makes it harder to
use.
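To make this trade-off concrete, here is a minimal sketch of a link managed from the parent end only. The Protein and AminoAcid classes below are simplified stand-ins rather than the previous chapter's code, and addAminoAcid() and findProtein() are hypothetical helpers introduced purely for illustration.

```python
class AminoAcid:
    def __init__(self, code):
        self.code = code  # no reference back to the owning Protein


class Protein:
    def __init__(self, name):
        self.name = name
        self.aminoAcids = []  # the only record of the link

    def addAminoAcid(self, code):
        # Hypothetical helper: create a child and record it on the parent.
        aminoAcid = AminoAcid(code)
        self.aminoAcids.append(aminoAcid)
        return aminoAcid


def findProtein(proteins, aminoAcid):
    # Hypothetical helper: the child cannot name its parent, so we
    # have to search every candidate parent for it.
    for protein in proteins:
        for candidate in protein.aminoAcids:
            if candidate is aminoAcid:
                return protein
    return None


proteinA = Protein('A')
proteinB = Protein('B')
ala = proteinA.addAminoAcid('ALA')
gly = proteinB.addAminoAcid('GLY')

owner = findProtein([proteinA, proteinB], gly)
print(owner.name)  # B
```

Creating and deleting children only ever touches one list, which is the easy part. The cost shows up in use: going from a child to its parent needs a search over all possible parents.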
Here we choose to manage object-to-object links from both ends. Accordingly, a
Structure object will have the attribute self.chains to access its children and a Chain
object will have self.structure to access its parent. We will need to keep both of these
attributes synchronised at creation and deletion of a Chain object. Each end of this link
has a cardinality. A Chain object has to have a Structure parent and there can be only one
of them, so that cardinality is ‘1..1’. A Structure object can have any number of Chain
children (zero or more), and so that cardinality is ‘0..*’. We will assume that Chain
children are ordered for a given parent, and that this order is the order in which they were
created.
We will model the Chain class as having an identifying code, which is a string, and a
descriptive molType, where the latter can be ‘protein’ or ‘DNA’ or ‘RNA’. Relative to the
parent Structure, we will assume that the code uniquely identifies the Chain. Or to put it
another way, a given Structure object has only one Chain child with a given code. Thus,
the object key for a Chain, relative to its parent, is code and the full key is (structure,
code). This is a typical situation for a child class; it has a key that identifies it relative to
its parent and then together with the parent itself this is the full key for the child. This then
leads to the following proposals for the implementation of Structure and Chain
respectively:
class Structure:

    def __init__(self, name, conformation=0, pdbId=None):
        if not name:
            raise Exception('name must be set to non-empty string')
        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
        self.chains = []  # For the children

    def delete(self):
        # Loop over a copy, because chain.delete() removes the chain
        # from self.chains while we are iterating
        for chain in list(self.chains):
            chain.delete()

    def getChain(self, code):
        for chain in self.chains:
            if chain.code == code:
                return chain
        return None
The attribute that links the Structure parent to children is self.chains, and this is
initialised as an empty list, to be filled in as the child objects are made. The delete()
function is fairly straightforward: when the structure is deleted, each of its chains is
deleted too. Note that there is no explicit deletion of the structure object itself, i.e. we
don’t do del self (which would in any case only unbind the local name self). Because the
structure is at the top of the hierarchy, it will simply disappear when it is no longer
referenced by any Python variables (it will be garbage collected). We have also included
a function to help get hold of a specific Chain object, whereby a code string is accepted as
an argument. Here the function loops through the list of children (self.chains) in order to
find a Chain with a matching attribute, and, because this is the unique key to identify the
Chain within its parent, there will never be more than one possible match.
A Chain object is constructed from the parent structure, the code value that acts as its
key, and a molecule type. The __init__() performs some checks to make sure the
arguments are reasonably sensible, and this includes determining whether the parent
structure already contains a Chain with the input code, which needs to be unique. If there
are no errors, the arguments are stored as attributes of self. Lastly, the constructor
adds the Chain (represented here by self, but filled in with an actual object at run time)
onto the structure.chains list, the link from the structure parent to its children. Note that we
do not modify the parent’s link to its child until after we have checked that everything is
ok.
class Chain:

    allowedMolTypes = ('protein', 'DNA', 'RNA')

    def __init__(self, structure, code, molType='protein'):
        if not code:
            raise Exception('code must be set to non-empty string')
        if molType not in self.allowedMolTypes:
            raise Exception('molType="%s" must be one of %s' %
                            (molType, self.allowedMolTypes))
        # check that key code is not already used
        chain = structure.getChain(code)
        if chain:
            raise Exception('code="%s" already used' % code)
        self.structure = structure
        self.code = code
        self.molType = molType
        structure.chains.append(self)

    def delete(self):
        self.structure.chains.remove(self)
The delete() function for the Chain is a simple matter of removing the object (again
represented by self in the class definition) from its parent’s list of children. Unless
some other variable still refers to the object, it is then unreachable and will eventually
be removed from memory when Python performs garbage collection.
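The behaviour described so far can be exercised with a short, self-contained sketch. The classes below are cut-down restatements of Structure and Chain (argument checks other than the unique-code test are omitted), and the structure name 'example' is arbitrary. One detail is our own assumption rather than something stated in the text: Structure.delete() loops over a copy of the list, so that removing chains during the loop cannot cause any child to be skipped.

```python
class Structure:
    def __init__(self, name):
        self.name = name
        self.chains = []

    def delete(self):
        # Iterate over a copy: each chain.delete() shrinks self.chains.
        for chain in list(self.chains):
            chain.delete()

    def getChain(self, code):
        for chain in self.chains:
            if chain.code == code:
                return chain
        return None


class Chain:
    def __init__(self, structure, code, molType='protein'):
        if structure.getChain(code):
            raise Exception('code="%s" already used' % code)
        self.structure = structure
        self.code = code
        self.molType = molType
        structure.chains.append(self)

    def delete(self):
        self.structure.chains.remove(self)


structure = Structure('example')
chainA = Chain(structure, 'A')
chainB = Chain(structure, 'B', 'DNA')

# Both ends of the link agree.
print(structure.getChain('B') is chainB)  # True
print(chainA.structure is structure)      # True

# A duplicate code is rejected before the parent's list is touched.
try:
    Chain(structure, 'A')
    duplicateRejected = False
except Exception:
    duplicateRejected = True
print(duplicateRejected, len(structure.chains))  # True 2

chainA.delete()
print([chain.code for chain in structure.chains])  # ['B']

structure.delete()
print(structure.chains)  # []
```

Note that after chainA.delete() the variable chainA still refers to the object, and chainA.structure still points at its former parent. Only the parent's end of the link has been cleared.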
Note that in the above example we have chosen to store the chains of a structure using a
list. That means that the structure.getChain() function is not particularly efficient, because
it has to loop over a list looking for a matching item. If the chains had been unordered
(relative to the structure) then an alternative would have been to have a dictionary,
chainDict, where the key is code and the value is the chain. The chains could be obtained
from chainDict.values(). If we desire both efficiency and ordered children we could have
both the dictionary, chainDict, and the list, chains, although we would have to keep them
synchronised:
# Alternative Chain implementation
class Structure:

    def __init__(self, name, conformation=0, pdbId=None):
        # … initial part as before
        self.chainDict = {}
        self.chains = []

    def getChain(self, code):
        return self.chainDict.get(code)

class Chain:

    def __init__(self, structure, code, molType='protein'):
        # … initial part as before
        structure.chainDict[code] = self
        structure.chains.append(self)

    def delete(self):
        del self.structure.chainDict[self.code]
        self.structure.chains.remove(self)
Practically, the inefficiency with lists isn’t an issue here because we expect that a given
structure will at most have only a few chains, so we will stick with the simpler list
implementation.
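For completeness, there is a further option worth sketching. From Python 3.7 onwards a plain dict is guaranteed to preserve insertion order, so a single dictionary could in principle provide both fast lookup and creation-order children, with nothing to keep synchronised. The version below is a hedged illustration of that idea, not the implementation used in this chapter; the chains property is our own addition and the argument checks are cut down to the unique-code test.

```python
class Structure:
    def __init__(self, name):
        self.name = name
        self.chainDict = {}  # code -> Chain, kept in creation order

    @property
    def chains(self):
        # Ordered list of children, rebuilt on demand from the dict.
        return list(self.chainDict.values())

    def getChain(self, code):
        return self.chainDict.get(code)


class Chain:
    def __init__(self, structure, code, molType='protein'):
        if code in structure.chainDict:
            raise Exception('code="%s" already used' % code)
        self.structure = structure
        self.code = code
        self.molType = molType
        structure.chainDict[code] = self

    def delete(self):
        del self.structure.chainDict[self.code]


structure = Structure('example')
for code in ('C', 'A', 'B'):
    Chain(structure, code)

print([chain.code for chain in structure.chains])  # ['C', 'A', 'B']
structure.getChain('A').delete()
print([chain.code for chain in structure.chains])  # ['C', 'B']
```

Because structure.chains returns a fresh list each time, looping over it while deleting chains is safe in this variant.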
As a final point on the Chain class, although we have only used a single value as the
identifying key, it would also be possible to have a key consisting of two values, e.g.
(code, molType), where the key is passed around as a tuple.
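A two-value key of this kind works directly as a dictionary key, because tuples are hashable. The following is a minimal sketch under stated assumptions: addChain() is a hypothetical helper, and plain dictionaries stand in for Chain objects to keep the example short.

```python
chainDict = {}


def addChain(code, molType):
    # Hypothetical helper: the key is the tuple (code, molType).
    key = (code, molType)
    if key in chainDict:
        raise Exception('key=%s already used' % (key,))
    chainDict[key] = {'code': code, 'molType': molType}
    return key


keyA = addChain('A', 'protein')
keyB = addChain('A', 'DNA')  # same code, different molType: a distinct key

print(len(chainDict))                      # 2
print(chainDict[('A', 'DNA')]['molType'])  # DNA
```

With such a key, two chains may share a code as long as their molecule types differ, which is a looser uniqueness rule than the single-value key used above.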