Table 11.2. The relative entropy formulation to measure sequence repetitiveness.
The repetitiveness of a biological sequence may be formulated mathematically by
calculating a relative entropy value D_KL (the Kullback-Leibler divergence). This is
simply the summation, considering all the residue (amino acid or nucleotide,
depending on the molecule) types, of the observed proportion of each type (P_i)
multiplied by the log-ratio of the observed proportion divided by the proportion
expected in random sequences (Q_i, which is always 0.25 in the above example). The
relative entropy is illustrated for various degrees of sequence repetition, showing that
the measure represents the variety of different residue types in a sample of fixed size.
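
As a minimal sketch of the calculation described in this caption, the relative entropy can be written as D_KL = sum over i of P_i * log(P_i / Q_i) and computed directly from residue counts. The function name `relative_entropy`, the four-letter nucleotide alphabet, the uniform expectation Q_i = 1 / (alphabet size), and the choice of base-2 logarithms (values in bits) are assumptions made for illustration; the caption does not state which log base is used. Residue types that do not occur (P_i = 0) are taken to contribute zero to the sum.

```python
import math
from collections import Counter


def relative_entropy(sequence, alphabet="ACGT"):
    """Relative entropy (Kullback-Leibler divergence), in bits, of the
    observed residue proportions against a uniform random expectation.

    Assumption: Q_i = 1 / len(alphabet), i.e. 0.25 for nucleotides.
    """
    counts = Counter(sequence.upper())
    # Only residues belonging to the alphabet are counted.
    total = sum(counts[residue] for residue in alphabet)
    if total == 0:
        raise ValueError("sequence contains no residues from the alphabet")

    q = 1.0 / len(alphabet)  # expected proportion Q_i
    d_kl = 0.0
    for residue in alphabet:
        p = counts[residue] / total  # observed proportion P_i
        if p > 0:  # a residue type that is absent contributes nothing
            d_kl += p * math.log2(p / q)
    return d_kl


if __name__ == "__main__":
    # Hypothetical example sequences of fixed length 8.
    for seq in ("ACGTACGT", "AACCAACC", "AAAAAAAA"):
        print(seq, relative_entropy(seq))
```

Under these assumptions, a sequence using all four nucleotides equally gives 0 bits, one using only two types equally gives 1 bit, and a fully repetitive single-residue sequence gives the maximum of 2 bits, so a higher value corresponds to fewer residue types and more repetition.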