Algorithms For Dummies


PART 4   Struggling with Big Data



Download 7,18 Mb.
Pdf ko'rish
bet406/651
Sana15.07.2021
Hajmi7,18 Mb.
#120357
1   ...   402   403   404   405   406   407   408   409   ...   651
Bog'liq
Algorithms

 

   


  PART 4 

 Struggling with Big Data

Using a simple random sample is just like playing the lottery, but you need to have 

all the numbers inside an urn in order to extract a few to represent the whole. You 

can’t easily put data streams into a repository from which you can extract a 

 sample; instead, you have to extract your sample on the fly. What you really need 

is another sample type named reservoir sampling. Just as a reservoir retains water 

for later use, yet its water isn’t still because some enters and some leaves, so this 

algorithm works by randomly choosing elements to keep as samples until other 

elements arrive to replace them.

The reservoir sampling algorithm is more sophisticated than another algorithmic 

strategy, called windowing, in which you create a queue and let new elements enter 

the queue (see Figure 12-3). Older elements leave the queue based on a trigger. 

This method applies when you want reports from the stream at exact time inter-

vals. For instance, you may want to know how many pages users request from an 

Internet server each minute. Using windowing, you start queuing page requests a 

minute of time, count the elements in the queue, report the number, discard the 

content of the queue, and start queuing again.

Another motivation for using windowing is to have a fixed amount of the most 

recent data. In that case, every time you insert an element into the queue, the old-

est element leaves. A queue is a First In, First Out (FIFO) structure discussed in 

Chapter 6.

Windowing  looks  at  samples  using  a  sliding  window  —  it  shows  the  elements 

under the window, which represent a certain time slice or a certain segment of the 

stream. Reservoir sampling represents the entire stream scope instead by offering 

a manageable amount of data, which is a statistical sample of the stream.

FIGURE 12-2: 

How sampling 

from an urn 

works.



CHAPTER 12


Download 7,18 Mb.

Do'stlaringiz bilan baham:
1   ...   402   403   404   405   406   407   408   409   ...   651




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2025
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish