Python Programming for Biology: Bioinformatics and Beyond

Download 7,75 Mb.

Pdf ko'rish

bet	301/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 297 298 299 300 301 302 303 304 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

A brief introduction to relational databases

20

Databases

Contents

A brief introduction to relational databases

Tables

Schemas

Basic SQL

Creating a table

Running SQL

Manipulating records

Designing a molecular structure database

SQL creation of the database

Python interaction with the database

Adding a structure to the database

Getting a structure from the database

Querying the database

A brief introduction to relational databases

Any collection of data can be considered to be a database, however it is stored or utilised.

However, the common use of the word ‘database’ usually refers to a relational database.

Relational databases were introduced in the 1970s and model their data in terms of tables

with rows and columns. There is an associated language, SQL (Structured Query

Language), that can be used to send messages to the database to allow the database to be

queried and modified: inserting, changing and deleting data elements. SQL also provides

the ability to make connections between the data in different tables. In terms of

mathematics, relational databases can be thought of as ‘first-order predicate logic’, and

this mathematical underpinning of the principles of relational databases is one reason they

are conceptually attractive.

Tables

A table in a relational database has a name and also has some named columns and each

row in the table represents one record of data. The type of the data in each column can be

specified and the data in any column can be stated to be mandatory, or not. One or more

columns in each table define the key. Each record in the table must have a unique key; the

key identifies the individual record. Sometimes a table has a ‘natural’ key but sometimes

there is no obvious key and so instead a counter (a ‘serial’ or ‘ID’ number) is used, which

is set (in many database implementations automatically) to 1 for the first record, 2 for the

second, etc. At a simplistic level, spreadsheets (e.g. as used in Excel) can be thought of as

tables in a weak substitute for a relational database. A table can have one or more columns

that refer to one or more other tables, and this is a way that information between tables can

be linked. In the database jargon, a query that involves relating information across more

than one table is called a join.

Many databases (relational or otherwise) provide reliable transaction control, through a

mechanism named with the memorable acronym ACID (Atomicity, Consistency, Isolation,

Durability), and this, more than anything else, is what makes databases so ubiquitous in

the modern world. In effect, multiple agents (people or computers) can reliably access a

database at the same time. Relational databases are the most significant and common

examples of ACID-compliant databases, although not all relational databases are

necessarily ACID-compliant.

Schemas

As with other areas in computing, one of the most important jobs in relational databases is

their design, in terms of the definitions of the tables and the relations between the tables.

This can be thought of as an exercise in data modelling. In the database context, this is

called designing the database schema. Sometimes the nature of relational databases forces

non-ideal design decisions. In other areas of computing it is quite common to use object-

oriented methodology, but the mapping between objects and relational databases is not

always easy. In terms of the object-relational mapping, a class of object is mapped onto

one or more tables. Each column in a table would represent a property (an attribute or link

to another object) of the class. It would be ideal if there could just be one table per class,

but that is not always possible because an attribute that is allowed to have multiple

possibilities (so a high cardinality of more than 1, in data-modelling jargon) requires an

additional table in relational databases, and any link between classes that allows multiple

possibilities at both ends (so a many-to-many link in data-modelling jargon) also requires

an additional table.

Another possibility would be to map more than one class onto one table. This is called

denormalisation of the database, and it (usually) means that there is redundant information

in the table. We could even map all of the classes onto one ‘universal’ table. Whether this

is a good idea depends on how the database is going to be used. If the database is only or

is mainly for querying, then a universal table works well. But if the database is frequently

updated and the contents modified, then a universal table does not work well. In this

chapter we will largely avoid these subtler points but in practice they need to be

considered.

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 297 298 299 300 301 302 303 304 ... 514