Algorithms For Dummies


Compressing Data


knows data only as bits because it has only circuitry to store bits. However, from a higher point of view, computer software can interpret bits as letters, ideograms, pictures, films, and sounds, which is where encoding comes into play.

Encoding uses a sequence of bits to represent something other than the number expressed by the sequence itself. For instance, you can represent a letter using a particular sequence of bits. Computer software commonly represents the letter A using the number 65, or binary 01000001, when working with the American Standard Code for Information Interchange (ASCII) encoding standard. You can see the sequences used by the ASCII system at http://www.asciitable.com/. ASCII uses just 7 bits for its encoding (8 bits, or a byte, in the extended version), which means that you can represent 128 different characters (the extended version has 256 characters). Python can show the string “Hello World” as a sequence of 8-bit codes, one byte per character:

print(''.join(['{0:08b}'.format(ord(c))
               for c in "Hello World"]))

0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100
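The idea that one bit pattern can stand for either a number or a letter is easy to check with Python's built-ins `int`, `chr`, `ord`, and `format`. This small sketch ties together the values quoted above (65, 01000001, 128, and 256):

```python
# The 8-bit pattern the text gives for the letter A
bits = "01000001"

number = int(bits, 2)           # read the bits as an ordinary binary number
letter = chr(number)            # read the same number as a character code
print(number, letter)           # 65 A

# Going back: from the letter to its code, then to an 8-bit pattern
print(format(ord("A"), "08b"))  # 01000001

# Distinct symbols that a 7-bit and an 8-bit code can represent
print(2 ** 7, 2 ** 8)           # 128 256
```

Nothing in the bits themselves says "letter"; only the choice to pass the number through `chr` makes it one.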

When using extended ASCII, a computer knows that a sequence of exactly 8 bits represents a character. It can split the bit stream into 8-bit bytes and, using a conversion table called a symbolic table, turn these bytes into characters.
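The decoding step can be sketched in a few lines. This is a minimal illustration, not a production decoder: the function name `decode_8bit` is hypothetical, and Python's `chr` stands in for the symbolic table, assuming every character was stored as exactly 8 bits.

```python
def decode_8bit(bit_string: str) -> str:
    """Split a bit string into 8-bit bytes and map each byte to a character.

    Assumes a fixed-length 8-bit encoding; chr() plays the role of the
    conversion (symbolic) table.
    """
    if len(bit_string) % 8 != 0:
        raise ValueError("length must be a multiple of 8 bits")
    chars = []
    for start in range(0, len(bit_string), 8):
        byte = bit_string[start:start + 8]  # one fixed-length 8-bit sequence
        chars.append(chr(int(byte, 2)))     # table lookup: code -> character
    return "".join(chars)


# Round trip: encode "Hello World" as bits, then decode it again
encoded = "".join(format(ord(c), "08b") for c in "Hello World")
print(decode_8bit(encoded))  # Hello World
```

Because every character has the same width, the decoder never has to look ahead: the position of each byte is known in advance.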

ASCII encoding can represent the standard Western alphabet, but it doesn’t support the variety of accented European characters or the richness of non-European alphabets, such as the ideograms used by the Chinese and Japanese languages.
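Python's codecs make this limitation visible. The sketch below assumes Latin-1 (ISO-8859-1) as the 8-bit "extended ASCII" variant; other 8-bit extensions exist and differ in which accented letters they include. The helper `try_encode` is a hypothetical name used only for illustration.

```python
def try_encode(text: str, codec: str) -> str:
    """Return the encoded bytes as hex, or a note if the codec can't represent text."""
    try:
        return text.encode(codec).hex()
    except UnicodeEncodeError:
        return "cannot encode"


for word in ["cafe", "café", "日本"]:
    print(word,
          "| ascii:", try_encode(word, "ascii"),
          "| latin-1:", try_encode(word, "latin-1"))
```

Plain ASCII handles "cafe" but rejects the accented "é"; Latin-1 fits "é" into its upper 128 codes, yet neither codec has room for the Japanese ideograms.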

Chances are that you’re using a robust encoding system such as UTF-8 or another form of Unicode encoding (see http://unicode.org/ for more information).

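Unicode is the default in Python 3: every string (str) is a sequence of Unicode characters, and Python reads source files as UTF-8 unless told otherwise.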
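A quick way to see this in practice: a Python 3 `str` stores code points, so `len` counts characters, and bytes appear only when you call `encode` with a codec. The sample text is arbitrary; UTF-8 is chosen here as the target codec.

```python
text = "Hello 日本"                    # a Python 3 str: a sequence of Unicode code points

print(len(text))                      # 8 characters
print([ord(c) for c in text[-2:]])    # code points far above 255: [26085, 26412]

data = text.encode("utf-8")           # explicit conversion from str to bytes
print(len(data))                      # 12 bytes: 6 for "Hello ", 3 for each ideogram
print(data.decode("utf-8") == text)   # True: decoding restores the original string
```

Keeping text as `str` inside the program and converting to bytes only at the edges (files, sockets) is the usual design choice this default encourages.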

Using a complex encoding system requires longer sequences than those required by ASCII. Depending on the encoding you choose, defining a character may require up to 4 bytes (32 bits). When representing textual information, a computer creates long bit sequences. If an encoding uses fixed-length sequences throughout a file, the computer decodes each letter easily, because every character occupies the same number of bits. Other encoding strategies, such as Unicode Transformation Format 8 (UTF-8), use a variable number of bytes per character (1 to 4 in this case); decoding still works because the leading bits of each character's first byte indicate how many bytes belong to it. You can read more about how UTF-8 works at http://www.fileformat.info/info/unicode/utf8.htm.
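To see where UTF-8's variable length comes from, the sketch below hand-encodes single characters following the UTF-8 bit layout (1 byte below code point 0x80, 2 below 0x800, 3 below 0x10000, otherwise 4) and checks the result against Python's built-in `str.encode`. The helper name `utf8_encode_char` is hypothetical, and for brevity it does not reject surrogate code points (0xD800 to 0xDFFF), which real encoders refuse. For contrast, it also shows that UTF-32 spends a fixed 4 bytes on every character.

```python
def utf8_encode_char(ch: str) -> bytes:
    """Hand-encode one character to UTF-8 (illustrative sketch; no surrogate check)."""
    cp = ord(ch)  # the Unicode code point
    if cp < 0x80:      # up to 7 bits -> 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # up to 11 bits -> 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # up to 16 bits -> 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits -> 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# A, e with acute accent, the ideogram for "sun/day", and a grinning-face emoji
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

for ch in samples:
    utf8 = utf8_encode_char(ch)
    utf32 = ch.encode("utf-32-be")  # fixed width; "-be" avoids a byte-order mark
    print(hex(ord(ch)), "UTF-8:", len(utf8), "bytes",
          "| UTF-32:", len(utf32), "bytes",
          "| matches built-in:", utf8 == ch.encode("utf-8"))
```

The prefix bits (0, 110, 1110, 11110) on the first byte are what let a decoder know how many bytes to read next, while every continuation byte starts with 10.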
