database for structures. Recall that we had four classes (object specifications): Structure,
Chain, Residue and Atom. We will create a table for each class, with the name of the table
For each of the tables (classes) there will be columns (attributes) that could act as
natural keys with real meaning, to identify each item of data. However, instead of using
these we will use serial number keys (integers) to identify each item, and let the database
automatically generate them. Hence they will be unique by construction. Both MySQL
and SQLite have the ability to automatically generate serials. Note that these automatically
generated serial numbers are unique across the whole table, not just unique relative to a
local context (e.g. a parent container item). Although it might seem better to use
meaningful natural keys, like the Atom.name that we set up in the structure data model, a
serial number key has many advantages, not least of which is that if it is unique across the
table (as it is here) then it only requires this one column to specify the key. If we used
natural keys, or a serial key that was only unique relative to the parent, then child tables
would have to include the full key of the parent table, and so that includes the grandparent
key, the great-grandparent key etc., and that gets rather long when you are deep down in
the parent-child containment hierarchy. Take, for example, the Atom: again the full key,
considering all the parent links, would need to include Structure.pdbId,
Structure.conformation, Chain.code, Residue.seqId and Atom.name. We do need another
column in the child table, for the serial key of the parent, in order that we know which
child goes with which parent, but that is a total of two columns only. That second column
is called a
foreign key because it refers to a key in a different table.
The Structure class has three attributes, name, conformation and pdbId. In the class
only name was mandatory, but here we will assume that all three are now mandatory. And
then the conformation and pdbId together provide a natural key for the table. When we
add a new Structure to the database we want to make sure that (conformation, pdbId) is
not already used, and the database will not automatically do that because we are using a
serial key, not the natural key. Note that the Structure constructor code cannot check that
these are unique because there is no parent class (containing other structures), so there is
no way for the constructor to check what other (conformation, pdbId) values have already
been used. We could have modelled a parent class for Structure, and called it Database, in
which case this check could have been put in the Structure constructor, but we did not do
it this way. So instead we have to check the uniqueness of (conformation, pdbId) in our
own code that acts as a bridge between the class and the table. In summary, the Structure
table will have four columns: id (for the serial key), name, conformation and pdbId, which
are all mandatory.
For all the other classes, we do not have this issue with the natural key, because the
uniqueness is explicitly checked in the corresponding class constructor. The Chain class
has two attributes, code and molType, and a link to the parent Structure. As discussed, the
key for the Chain table will be an auto-generated serial, and the link to the parent structure
will be modelled as a foreign key. So the Chain table will have four columns: id (for the
serial key), structureId (for the parent serial key), code and molType. The Residue class
has two attributes, seqId and code, and a link to the parent. So the Residue table will have
four columns: id (for the serial key), chainId (for the parent serial key) and seqId and
code.
The Atom class has three attributes, name, coords and element, and a link to the parent.
Here we face an issue, because coords has three values (x, y and z), not one, and not all
SQL implementations allow many-valued columns (as the ARRAY data type). There are
two possible solutions here. We could introduce a new table, just for the coordinates.
Having a new table is a possible approach for many-valued attributes. In this approach we
could call the new table AtomCoord, with columns id (for a serial key), atomId (for the
relevant atom serial key), dim (for the dimension being considered, 1, 2 or 3) and coord
(for the coordinate for that dimension). This works pretty well in the SQL context. But we
will take another approach here, relying on the fact that coords is always of length 3. So
we will stay with one table and split the attribute coords into three columns, x, y and z.
The code that bridges between the Atom class and the Atom table will have to deal with
translating from coords to (x, y, z) and back again. If we had introduced a new table then
the bridging code would have had to deal with that, which is more complicated. The Atom
table will thus have seven columns, id (for the serial key), residueId (for the parent’s serial
key), name (for the atom name), x, y, z (for the coordinates) and element (for the atom
element type, e.g. ‘C’ or ‘N’).
Do'stlaringiz bilan baham: