Algorithms For Dummies



Download 7,18 Mb.
Pdf ko'rish
bet457/651
Sana15.07.2021
Hajmi7,18 Mb.
#120357
1   ...   453   454   455   456   457   458   459   460   ...   651
Bog'liq
Algorithms

 

»

Encoding frequent long sequences of characters efficiently: This is similar 

to shrinking long sequences of identical bits, but it works with character 

sequences rather than single characters. This is the strategy used by LZW

which learns data patterns on the fly and creates a short encoding for long 

sequences of characters.



272

 

   


  PART 4 

 Struggling with Big Data

To understand how rethinking encoding can help in compression, start with the 

first  reason. Scientists  working on the Genome Project  around 2008  (

https://


www.genome.gov/10001772/all-about-the--human-genome-project-hgp/

managed to drastically reduce the size of their data using a simple encoding trick. 



Using this trick made the task of mapping the entire human DNA simpler, helping 

the scientists understand more about the life, disease, and death scripted into our 

body cells.

Scientists describe DNA using sequences of the letters A, C, T, and G (representing 

the four nucleotides present in all living beings). The human genome contains six 

billion nucleotides (you find them associated in couples, called bases) that add up 

to more than 50GB using ASCII encoding. In fact, you can represent A, C, T, and G 

in ASCII encoding as follows:

print (' '.join(['{0:08b}'.format(ord(l))

                 for l in "ACTG"]))

01000001 01000011 01010100 01000111

The sum of the preceding line is 32 bits, but because DNA maps just four charac-

ters, you can use 2 bits each, saving 75 percent of the previously used bits:

00 01 10 11

Such a gain demonstrates the reason to choose the right encoding. The encoding 

works fine in this case because the DNA alphabet is made of four letters, and using 

a full ASCII table based on 8 bits is overkill. If a problem requires that you use the 

complete ASCII alphabet, you can’t compress the data by redefining the encoding 

used. Instead, you have to approach the problem using Huffman compression.

If you can’t shrink the character encoding (or you have already done it), you can 

still shrink long sequences, reducing them to a simpler encoding. Observe how 

binary data can repeat long sequences of ones and zeros:

00000000 00000000 01111111 11111111 10000011 11111111

In this case, the sequence starts from zero. You can therefore count the number of 

zeros, and then count the number of ones that follow, and then repeat with the 

next count of zeros, and so on. Because the sequence has only zeros and ones, you 

can count them and obtain sequence of counts to compress the data. In this case, 

the data compresses into values of 17 15 5 10. Translating these counts into bytes 

shortens the initial data in an easily reversible way:

00010001 00001111 00000101 00001010




CHAPTER 14


Download 7,18 Mb.

Do'stlaringiz bilan baham:
1   ...   453   454   455   456   457   458   459   460   ...   651




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2025
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish