Structure
We start with the construction of the Structure class. As mentioned, we require a name and
optionally provide a PDB identifier code. More than one Structure object with the same
name will be allowed, each with its own set of coordinates. Hence, we introduce another
mandatory attribute called conformation,
which is a number that specifies which set of
coordinates within an ensemble we are considering. In many circumstances we will only
have one conformation, so we give it a default value of 0, even though it is mandatory.
This naturally leads to the following first attempt at the class definition and constructor
code:
class Structure:
    def __init__(self, name, conformation=0, pdbId=None):
        if not name:
            raise Exception('name must be set to non-empty string')
        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
Remembering that the __init__ function, the constructor, is called each time an instance
of this class of object is made, we store the name, conformation and pdbId as attributes by
binding their values onto variables that are linked to self, which provides a handle to any
actual object instance made using this class. Note that we have used a convention whereby
attribute names are lower case except when a new ‘word’ starts, and then the first
character of that is capitalised, thus here giving pdbId. A popular alternative is to keep
attribute names all lower case but use underscores to separate the words, which would
give pdb_id. No doubt there are other conventions, and it mostly doesn’t matter which one
you choose, as long as you are consistent, since that aids readability.
In the class constructor we check whether name is defined. This is done using the
clause ‘if not name’ to check if the value is logically false, e.g. None or an empty string,
and in these cases we deem the name to be undefined. An undefined name means
something is wrong, so we cause an error by raising an exception object. However, we
have not checked that name is actually a text string. Someone could try to create a
Structure by passing in any Python object that evaluates as true (a non-zero number, for
example) and it would pass the check but violate our intention about what the name should
be. Hence, if you were being cautious you would check the type of name before using it.
Similarly, checks can be made for the other input arguments, which in effect introduces
run-time type checking into the constructor. Here, for the sake of brevity, we will avoid
such caution.
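If you did want to be cautious, a minimal sketch of a constructor with run-time type checks might look like the following. The use of TypeError and ValueError, rather than the generic Exception used above, and the decision to reject True and False as conformation values (Python treats bool as a subclass of int) are our own assumptions, not requirements of the data model.

```python
class Structure:
    # A cautious variant of the constructor that checks argument types at run time.
    def __init__(self, name, conformation=0, pdbId=None):
        if not isinstance(name, str):
            raise TypeError('name must be a string')
        if not name:
            raise ValueError('name must be set to non-empty string')
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(conformation, int) or isinstance(conformation, bool):
            raise TypeError('conformation must be an integer')
        if pdbId is not None and not isinstance(pdbId, str):
            raise TypeError('pdbId must be a string or None')
        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
```

Note that a check like this only confirms the Python types; it says nothing about whether the values are meaningful.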
Another thing we have not checked here is whether values are meaningful, even if they
are of the correct Python data type. For example, we do not know whether the pdbId, if
set, is actually a valid identifier. Determining whether the pdbId is a valid PDB identifier
code is not trivial, but the example at the start of Chapter 15 will give you a hint at a
solution if you are really keen.
We ignore such issues here, but it illustrates that no matter how many
checks you make, there are almost certainly some checks that you have not made. Also,
part of the solution is to not pass junk into your data model in the first place (despite the
fact that users may try).
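As a small illustration of a meaningfulness check, as opposed to a type check, the sketch below tests only the format of an identifier. It assumes the classic four-character PDB code (a digit from 1 to 9 followed by three letters or digits) and ignores newer, longer identifier schemes; the names PDB_ID_PATTERN and isValidPdbIdFormat are hypothetical, and this is not necessarily the approach taken in Chapter 15. It cannot tell whether an entry with that code actually exists.

```python
import re

# Assumption: classic PDB identifier format, a digit 1-9 followed by
# three letters or digits (case-insensitive). Format check only.
PDB_ID_PATTERN = re.compile(r'[1-9][A-Za-z0-9]{3}')

def isValidPdbIdFormat(pdbId):
    # fullmatch requires the whole string to match, so trailing
    # characters (including a newline) cause the check to fail.
    return isinstance(pdbId, str) and PDB_ID_PATTERN.fullmatch(pdbId) is not None
```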
For the pdbId we have set the default to None rather than an empty string. This is a
matter of taste, but generally in such situations we use None because this pretty much
always means ‘not set’. For pdbId an empty string could be taken to mean the same thing,
since real PDB identifiers are never empty strings, but in other situations an empty string
might be a legitimate setting.
In data modelling there is the notion of an object’s key; this is something that uniquely
identifies an object amongst other objects of the same class. Here we intend that the name
and the conformation uniquely identify Structure objects, so these two attributes taken
together are a natural key for this class. If we were diligent, and really wanted to enforce
this as a key, then we would add a check in the constructor that (name, conformation)
has not already been used by an existing Structure. Again, for reasons of simplicity we
ignore that issue here, but if we wanted to worry about it then we would have to keep track
either of all the Structure objects that we created or of all the associated names and
conformations, for example, using a set or list of (name, conformation) pairs.
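A minimal sketch of the second option, assuming we keep a set of (name, conformation) pairs as a class attribute shared by all instances, might look like this. The attribute name _usedKeys is hypothetical.

```python
class Structure:
    # Class-level set of (name, conformation) pairs already in use,
    # shared by all instances; enforces that the pair acts as a key.
    _usedKeys = set()

    def __init__(self, name, conformation=0, pdbId=None):
        if not name:
            raise Exception('name must be set to non-empty string')
        key = (name, conformation)
        if key in Structure._usedKeys:
            raise Exception('Structure %s with conformation %s already exists' % key)
        Structure._usedKeys.add(key)
        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
```

This simple version never releases a key, even if the Structure is later discarded, and the set would become out of date if the name of an existing object were changed; handling those cases needs more bookkeeping.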
This brings up another design decision: a Structure object has a name and
conformation, but we have not stated whether we are allowed to change them. This
depends on how we intend to use them. For example, if we have an application where the
name is intended to be a friendly way of identifying a Structure to the user then we might
want to allow the user to change it to something they prefer. In contrast, the conformation
is effectively just an index number into the coordinate elements of a structural ensemble,
and so there is no reason to allow that to be changed. Indeed if it could be modified then
that might create more trouble than it was worth. If we allow an attribute to change we call
it changeable and otherwise we call it frozen. When an attribute is frozen it can only ever
be set once, and normally that would mean in the constructor (when the object is made). In
Python you have to take some extra steps to make attributes frozen, and we will discuss
this later. For now we will in effect assume that everything is changeable.
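As a preview, one common Python idiom, which may differ from what is discussed later, is to store the value under a leading-underscore name and expose it through a read-only property. In this sketch name stays changeable while conformation is frozen; note that the underscore is only a convention, so determined code can still assign to _conformation directly.

```python
class Structure:
    def __init__(self, name, conformation=0, pdbId=None):
        if not name:
            raise Exception('name must be set to non-empty string')
        self.name = name                   # changeable: ordinary attribute
        self._conformation = conformation  # frozen: set once, here
        self.pdbId = pdbId

    @property
    def conformation(self):
        # Read-only access; with no setter defined, assigning to
        # structure.conformation raises AttributeError.
        return self._conformation
```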
Another issue with attributes is the matter of how many items they are allowed to
represent, according to the data model, which is termed their cardinality. Specifically, the
cardinality is expressed as a pair of whole numbers: the low cardinality is the minimum
number of items that can be represented, while the high cardinality is the maximum
number. Because we have stated that the name is mandatory it always
represents exactly one thing, thus the low cardinality is 1 and the high cardinality is also 1.
We can write the overall cardinality of this attribute, minimum to maximum, as being
‘1..1’. Similarly, the cardinality of conformation is also ‘1..1’. Conversely, because the
pdbId is optional there might be none or one, so for this the cardinality is ‘0..1’.
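To make the notation concrete, here is a small sketch that records each Structure attribute's cardinality as a (low, high) pair and checks a value against it. The table and function names are hypothetical, and we use high=None to stand for an unbounded high cardinality. The check tests against None rather than truthiness, so a conformation of 0 still counts as one item.

```python
# Hypothetical table of Structure attribute cardinalities as (low, high);
# high=None would stand for an unbounded high cardinality.
STRUCTURE_CARDINALITY = {
    'name': (1, 1),          # '1..1'
    'conformation': (1, 1),  # '1..1'
    'pdbId': (0, 1),         # '0..1'
}

def countItems(value):
    # A singular attribute represents no item when unset (None), else one.
    return 0 if value is None else 1

def satisfiesCardinality(value, low, high):
    # True if the number of items lies within low..high.
    n = countItems(value)
    return n >= low and (high is None or n <= high)
```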
Perhaps at some point we decide that we are going to allow references to more than one
PDB identifier in a given Structure object. This would fundamentally change the data
model, and the constructor then might become, noting the plural name for the last
attribute:
class Structure:
    def __init__(self, name, conformation=0, pdbIds=None):
        # etc.: name and conformation are checked and stored as before
        self.pdbIds = pdbIds
Here we might intend that pdbIds is specified as a list or tuple, containing strings
representing PDB identifier codes, or otherwise left undefined as None. The low
cardinality is still 0, because there might be no PDB identifiers, but we now have no upper
limit, so the high cardinality is effectively unbounded, which we label as ‘*’. This case
gives an overall cardinality of ‘0..*’. Obviously the high cardinality for any attribute has to
be greater than 0, otherwise it can never exist. If it is 1 then the attribute is normally
spelled in the singular (pdbId) and if it is greater than 1 then the attribute is normally
spelled as a plural (pdbIds).
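A sketch of how the constructor might handle a '0..*' attribute follows. Two choices here are our own assumptions rather than part of the model described above: None is converted to an empty tuple, so that "no identifiers" is simply a collection with zero items, and the input is stored as a tuple, which keeps its order but cannot be modified in place.

```python
class Structure:
    def __init__(self, name, conformation=0, pdbIds=None):
        if not name:
            raise Exception('name must be set to non-empty string')
        self.name = name
        self.conformation = conformation
        # Cardinality 0..*: None means no identifiers; otherwise accept
        # a list or tuple of strings, stored as a tuple to keep the order.
        if pdbIds is None:
            pdbIds = ()
        elif not isinstance(pdbIds, (list, tuple)):
            raise Exception('pdbIds must be a list or tuple of strings')
        self.pdbIds = tuple(pdbIds)
```

Rejecting anything other than a list or tuple also catches a common slip: passing a single string such as '1A12', which would otherwise be treated as a collection of four one-character items.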
When the high cardinality is greater than 1 another issue comes into play. In this case
we have a collection and there is the question of whether the items in the collection are in
any particular order, or not. For pdbIds we have stated that we intended it to be defined by
a list or tuple, collections that do have ordered items. Consequently, it is natural to assume
that the attribute is also ordered. Alternatively, we might have allowed it to be defined by
a set, in which case it is natural to assume it to be unordered. Deciding whether something
is ordered or unordered can be critical in some contexts. In any case, from here on we
stick with the singular pdbId attribute, rather than pdbIds.
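The practical difference between ordered and unordered collections shows up, for example, when comparing them. Here two collections hold the same PDB identifiers in a different order:

```python
# Two collections holding the same identifiers in a different order.
idsA = ['1A12', '2B34']
idsB = ['2B34', '1A12']

# Lists (and tuples) are ordered, so order matters for equality...
print(idsA == idsB)            # False
# ...whereas sets are unordered, so only membership matters.
print(set(idsA) == set(idsB))  # True
```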
Changing the high cardinality of an attribute changes the data model fairly dramatically
(it has to be specifically coded in the classes) so it is a good idea to think carefully about
the situation being modelled. It might be tempting to always assume that the high
cardinality is unbounded (‘*’) because it is more general, but this is a bad idea if it really
ought to be 1. For one thing it means dealing with a collection containing a single object
instead of just the single object itself, which can make for confusing and error-prone code.
Finally, we create a Structure object in the usual way, by using the name of the class
and passing in values for the attributes:
structure = Structure('Chromosome Regulator', 0, "1A12")
or we could write it using named (keyword) arguments, letting conformation take its default value of 0:
structure = Structure(name='Chromosome Regulator', pdbId="1A12")
As another example we could avoid passing in a PDB identifier, given that this attribute
is not mandatory and will take the default value of None.
structure = Structure(name='Chromosome Regulator')
Chain
As we mentioned previously, a structure may comprise more than one molecule. Each
molecule, because it is a chain of linked amino acids or nucleotides, will be described
using the Chain class. Our data model is made with the assumption that each chain
belongs to a unique structure, so is effectively contained by that structure. This is an
important design decision and has all kinds of ramifications. What we are describing here,
when one kind of object is said to contain another, is what is known in data modelling as a