Algorithms For Dummies



Date: 15.07.2021

  Managing Big Data 



Sampling means drawing a limited set of examples from your stream and treating them as if they represented the entire stream. It is a well-known tool in statistics that lets you make inferences about a larger context (technically called the universe or the population) by using a small part of it.

Reserving the right data

Statistics was born in a time when obtaining a census was impossible. A census is a systematic investigation of a population that counts it and acquires useful data from it: the government asks all the people in a country about where they live, their family, their daily life, and their work. The census has its origins in ancient times. In the Bible, a census occurs in the book of Numbers; the Israelite population is counted after the exodus from Egypt. For tax purposes, the ancient Romans periodically held a census to count the population of their large empire. Historical documents provide accounts of similar census activities in ancient Egypt, Greece, India, and China.

Statistics, in particular the branch of statistics called inferential statistics, can achieve the same outcome as a census, with an acceptable margin of error, by interrogating a smaller number of individuals (called a sample). Thus, by querying a few people, pollsters can determine the general opinion of a larger population on a variety of issues, such as who will win an election. In the United States, for instance, the statistician Nate Silver made news by predicting the winner of the 2012 presidential election in all 50 states, using data from samples (https://www.cnet.com/news/obamas-win-a-big-vindication-for-nate-silver-king-of-the-quants/).
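As a rough illustration of "an acceptable margin of error," the following Python sketch estimates the share of a population that favors one option by polling a random sample. The population (100,000 simulated voters, 52 percent in favor), the sample size of 1,000, and the function name `estimate_proportion` are assumptions made up for this example. The margin uses the common normal approximation at about 95 percent confidence, which is one standard choice and not necessarily the method any particular pollster uses.

```python
import math
import random


def estimate_proportion(population, n, rng, z=1.96):
    """Estimate the share of 1s in `population` from a random sample of size n.

    Returns (p_hat, margin), where margin is the half-width of an approximate
    95% confidence interval (normal approximation, z = 1.96 by default).
    """
    sample = rng.sample(population, n)   # n distinct individuals, chosen at random
    p_hat = sum(sample) / n              # share of the sample answering "yes" (1)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Hypothetical population: 100,000 voters, 52% of whom favor candidate A (coded 1).
population = [1] * 52_000 + [0] * 48_000

rng = random.Random(42)                  # fixed seed so the run is repeatable
p_hat, margin = estimate_proportion(population, 1_000, rng)
print(f"estimated share: {p_hat:.3f} +/- {margin:.3f}")
```

Asking 1,000 people instead of 100,000 still pins the answer down to within a few percentage points, which is the trade-off between cost and precision that the text describes.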

Clearly, holding a census implies huge costs (the larger the population, the greater the costs) and requires a lot of organization (which is why censuses are infrequent), whereas a statistical sample is faster and cheaper. Reduced costs and lower organizational requirements also make statistics ideal for big data streaming: users of big data streaming don’t need every scrap of information, and they can use a sample to summarize the data’s complexity.

However, there’s a problem with using statistical samples. At the core of statistics is sampling, and sampling requires randomly picking a few examples from the pool of the entire population. The key element of the recipe is that every element from the population has exactly the same probability of being part of the sample. If a population consists of a million people and your sample size is one, each person’s probability of being part of the sample is one out of a million. In mathematical terms, if you represent the population size using the variable N and the sample size is n, the probability of being part of a sample is n/N, as shown in Figure 12-2. The represented sample is a simple random sample. (Other sample types have greater complexity; this is the simplest sample type and all the others build upon it.)
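To make the n/N rule concrete, here is a minimal Python sketch that draws many simple random samples and measures how often each element ends up in one. The sizes (N = 20, n = 5), the number of trials, and the helper names `simple_random_sample` and `inclusion_frequency` are assumptions chosen for illustration; the only library call involved is the standard library's `random.sample`, which picks distinct items without replacement.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng):
    """Draw n distinct items so that every item is equally likely to be chosen."""
    return rng.sample(population, n)


def inclusion_frequency(N, n, trials, seed=0):
    """Return, for each element 0..N-1, the fraction of trials in which it was sampled."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    population = list(range(N))
    counts = Counter()
    for _ in range(trials):
        counts.update(simple_random_sample(population, n, rng))
    return {item: counts[item] / trials for item in population}


N, n = 20, 5
freqs = inclusion_frequency(N, n, trials=20_000)

print(f"expected n/N = {n / N:.3f}")
print(f"lowest observed frequency  = {min(freqs.values()):.3f}")
print(f"highest observed frequency = {max(freqs.values()):.3f}")
```

Every element's observed frequency should hover near n/N (0.25 here), which is what "exactly the same probability of being part of the sample" means in practice.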


