Python Programming for Biology: Bioinformatics and Beyond

Download 7,75 Mb.

1 ... 74 75 76 77 78 79 80 81 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Reading FASTA files

The FASTA text format is used to store nucleotide or protein sequences, using one-letter

codes (as illustrated in Figure 6.4). Each file can store one or more sequence entries. An

individual sequence entry spans two or more lines: the first line is a textual comment or

identifier for the sequence (e.g. a gene or protein name with database codes), and the

remaining lines contain the sequence of residue codes. The first comment line for each

sequence begins with the ‘>’ character. (Originally the semicolon character ‘;’ was also

allowed at the beginning of a line to indicate a comment, but that is no longer commonly

used.) Each line subsequent to the comment line contains part of the sequence, until either

the end of the file is reached or the next comment line occurs, indicating the start of the

next sequence. The lines of residue codes are normally no more than 60 characters long,

each character representing one nucleotide or amino acid in the sequence.

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 74 75 76 77 78 79 80 81 ... 514