Nucleotides
Percentage
Fixed Encoding
Huffman Encoding
A
40.5%
00
0
C
29.2%
01
10
G
14.5%
10
110
T
15.8%
11
111
Weighted bit average
2.00
1.90
By multiplying the number of bits of the two encodings by their percentage and
summing everything, you obtain the weighted average of bits used by each of
them. In this case, the result is 1.9 for the Huffman encoding versus 2.0 for the
fixed encoding. It means that you obtain a five percent bit saving in this example.
You could save even more space when having genes with an even more unbal-
anced distribution in favor of some nucleotide.
The following example generates a random DNA sequence and shows how the
code systematically generates the encoding. (If you change the seed value, the
random generation of the DNA sequences may lead to a different result, both in
the distribution of nucleotides and Huffman encoding.)
from heapq import heappush, heappop, heapify
from collections import defaultdict, Counter
from random import shuffle, seed
296
PART 5
Do'stlaringiz bilan baham: |