Source code online books for professionals by professionals


SOrteD arraYS, treeS, aND DICtS: ChOICeS, ChOICeS



Download 4,67 Mb.
Pdf ko'rish
bet99/266
Sana31.12.2021
Hajmi4,67 Mb.
#213682
1   ...   95   96   97   98   99   100   101   102   ...   266
Bog'liq
2 5296731884800181221

SOrteD arraYS, treeS, aND DICtS: ChOICeS, ChOICeS
bisection (on sorted arrays), binary search trees, and dicts (that is, hash tables) all implement the same basic 
functionality: they let you search efficiently. there are some important differences, though. bisection is fast, with 
little overhead, but works only on sorted arrays (such as python lists). and sorted arrays are hard to maintain; 
adding elements takes linear time. Search trees have more overhead but are dynamic and let you insert and 
remove elements. in many cases, though, the clear winner will be the hash table, in the form of 
dict
. its average 
asymptotic running time is constant (as opposed to the logarithmic running time of bisection and search trees), 
and it is close to that in practice, with little overhead.
hashing requires you to be able to compute a hash value for your objects. in practice, you can almost always do 
this, but in theory, bisection and search trees are a bit more flexible here—they need only to compare objects and 
figure out which one is smaller.
4
 this focus on ordering also means that search trees will let you access your values 
in sorted order—either all of them or just a portion. trees can also be extended to work in multiple dimensions  
(to search for points inside a hyperrectangular region) or to even stranger forms of search criteria, where hashing 
may be hard to achieve. there are more common cases, too, when hashing isn’t immediately applicable. For 
example, if you want the entry that is closest to your lookup key, a search tree would be the way to go.
4
Actually, more flexible may not be entirely correct. There are many objects (such as complex numbers) that can be hashed but that 
cannot be compared for size.


Chapter 6 

 DiviDe, Combine, anD Conquer
123
Selection
I’ll round off this section on “searching by half” with an algorithm you may not end up using a lot in practice but that 
takes the idea of bisection in an interesting direction. Besides, it sets the stages for quicksort (next section), which is 
one of the classics.
The problem is to find the kth largest number in an unsorted sequence, in linear time. The most important case 
is, perhaps, to find the median—the element that would have been in the middle position (that is, (n+1)//2), had the 
sequence been sorted.
5
 Interestingly, as a side effect of how the algorithm works, it will also allow us to identify which 
objects are smaller than the object we seek. That means we’ll be able to find the k smallest (and, simultaneously, the 
n-k largest) elements with a running time of Q(n), meaning that the value of k doesn’t matter!
This may be stranger than it seems at first glance. The running time constraint rules out sorting (unless we can 
count occurrences and use counting sort, as discussed in Chapter 4). Any other obvious algorithm for finding the k 
smallest objects would use some data structure to keep track of them. For example, you could use an approach similar 
to insertion sort: Keep the k smallest objects found so far either at the beginning of the sequence or in a separate 
sequence.
If you kept track of which one of them was largest, checking each large object in the main sequence would be fast 
(just a constant-time check). If you needed to add an object, though, and you already had k, you’d have to remove one. 
You’d remove the largest, of course, but then you’d have to find out which one was now largest. You could keep them 
sorted (that is, stay close to insertion sort), but the running time would be Q(nk) anyway.
One step up from this (asymptotically) would be to use a heap, essentially transforming our “partial insertion 
sort” into a “partial heap sort,” making sure that there are never more than k elements in the heap. (See the “Black 
Box” sidebar about binary heaps, heapq, and heapsort for more information.) This would give you a running time of 
Q(n lg k), and for a reasonably small k, this is almost identical to Q(n), and it lets you iterate over the main sequence 
without jumping around in memory, so in practice it might be the solution of choice.
Tip
 

  if you’re looking for the k smallest (or largest) objects in an iterable in python, you would probably use the  
nsmallest
 (or 
nlargest
) function from the 
heapq
 module if your k is small, relative to the total number of objects. if the k 
is large, you should probably sort the sequence (either using the 
sort
 method or using the 
sorted
 function) and pick out 
the k first objects. time your results to see what works best—or just choose the version that makes your code as clear as 
possible.
So, how can we take the next step, asymptotically, and remove dependence on k altogether? It turns out that 
guaranteeing a linear worst case is a bit knotty, so let’s focus on the average case. Now, if I tell you to try applying the 
idea of divide and conquer, what would you do? A first clue might be that we’re aiming for a linear running time; what 
“divide by half” recurrence does that? It’s the one with a single recursive call (which is equivalent to the knockout 
tournament sum): T(n) = T(n/2) + n. In other words, we divide the problem in half (or, for now, in half on average) by 
performing linear work, just like the more canonical divide-and-conquer approach, but we manage to eliminate one 
half, taking us closer to binary search. What we need to figure out, in order to design this algorithm, is how to partition 
the data in linear time so that we end up with all our objects in one half.
As always, systematically going through the tools at our disposal, and framing the problem as clearly as we can, 
makes it much easier to figure out a solution. We’ve arrived at a point where what we need is to partition a sequence 
into two halves, one consisting of small values and one of large values. And we don’t have to guarantee that the halves 
are equal—only that they’ll be equal on average. A simple way of doing this is to choose one of the values as a so-called 
pivot and use it to divide the others: All those smaller than (or equal to) the pivot end up in the left half, while those 
5
In statistics, the median is also defined for sequences of even length. It is then the average of the two middle elements. That’s not 
an issue we worry about here.


Chapter 6 

 DiviDe, Combine, anD Conquer
124
larger end up on the right. Listing 6-3 gives you a possible implementation of partition and select. Note that this version 
of partition is primarily meant to be readable; Exercise 6-11 asks you to see whether you can remove some overhead. 
The way select is written here, it returns the kth smallest element; if you’d rather have all the k smallest elements, you 
can simply rewrite it to return lo instead of pi.

Download 4,67 Mb.

Do'stlaringiz bilan baham:
1   ...   95   96   97   98   99   100   101   102   ...   266




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish