Reading FASTA files
The FASTA text format is used to store nucleotide or protein sequences, using one-letter
codes (as illustrated in Figure 6.4). Each file can store one or more sequence entries. An
individual sequence entry spans two or more lines: the first line is a textual comment or
identifier for the sequence (e.g. a gene or protein name with database codes), and the
remaining lines contain the sequence of residue codes. The first comment line for each
sequence begins with the ‘>’ character. (Originally the semicolon character ‘;’ was also
allowed at the beginning of a line to indicate a comment, but that is no longer commonly
used.) Each line subsequent to the comment line contains part of the sequence, until either
the end of the file is reached or the next comment line occurs, indicating the start of the
next sequence. The lines of residue codes are normally no more than 60 characters long,
each character representing one nucleotide or amino acid in the sequence.
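
The layout described above can be read with a short loop. What follows is a minimal sketch rather than a definitive parser: the function name read_fasta and the example data are hypothetical, and it assumes that blank lines and old-style ‘;’ comment lines can simply be skipped.

```python
import io

def read_fasta(handle):
    """Yield (comment, sequence) pairs from an open FASTA text stream.

    Hypothetical helper for illustration; not taken from any library.
    """
    comment = None   # Text of the current '>' line, without the '>'
    chunks = []      # Sequence lines collected for the current entry

    for line in handle:
        line = line.strip()

        # Assumption: blank lines and old-style ';' comments are ignored
        if not line or line.startswith(';'):
            continue

        if line.startswith('>'):
            # A new comment line closes the previous entry, if there was one
            if comment is not None:
                yield comment, ''.join(chunks)
            comment = line[1:].strip()
            chunks = []

        elif comment is not None:
            # Any other line is part of the current sequence
            chunks.append(line)

    # The end of the file closes the last entry
    if comment is not None:
        yield comment, ''.join(chunks)


# Made-up two-entry example, held in memory instead of a file on disk
example = io.StringIO(">seq1 demo protein\nMKV\nLAA\n>seq2 demo DNA\nACGT\n")
records = list(read_fasta(example))
```

Because the function is a generator, entries are produced one at a time, so a large file does not have to be held in memory all at once. For a real file, the same function could be given the object returned by open() in text mode.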