BLACK BOX: BINARY HEAPS, HEAPQ, AND HEAPSORT
A priority queue is a generalization of the LIFO and FIFO queues discussed in Chapter 5. Instead of basing the order only on when an item is added, each item receives a priority, and you always retrieve the remaining item with the lowest priority. (You could also use maximum priority, but you normally can’t have both in the same structure.) This kind of functionality is important as a component of several algorithms, such as Prim’s, for finding minimum spanning trees (Chapter 7), or Dijkstra’s, for finding shortest paths (Chapter 9). There are many ways of implementing a priority queue, but probably the most common data structure used for this purpose is the binary heap. (There are other kinds of heaps, but the unqualified term heap normally refers to binary heaps.)
Binary heaps are complete binary trees. That means they are as balanced as they can get, with each level of the tree filled up, except (possibly) the lowest level, which is filled as far as possible from the left. Arguably the most important aspect of their structure, though, is the so-called heap property: the value of every parent is smaller than those of both children. (This holds for a minimum heap; for a maximum heap, each parent is greater.) As a consequence, the root has the smallest value in the heap. The property is similar to that of search trees but not quite the same, and it turns out that the heap property is much easier to maintain without sacrificing the balance of the tree. You never modify the structure of the tree by splitting or rotating nodes in a heap. You only ever need to swap parent and child nodes to restore the heap property. For example, to “repair” the root of a subtree (which is too big), you simply swap it with its smallest child and repair that subtree recursively (as needed).
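To make the property concrete, here is a tiny check written against the list encoding that heapq uses (described next); the helper name is_min_heap is mine, not part of any library:

def is_min_heap(a):
    # In the list encoding, the parent of index i sits at (i - 1) // 2
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

print(is_min_heap([1, 3, 2, 7, 5]))   # True: every parent is no greater than its children
print(is_min_heap([4, 3, 2, 7, 5]))   # False: the root 4 exceeds its child 3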
The heapq module contains an efficient heap implementation that represents its heaps in lists, using a common “encoding”: if a is a heap, the children of a[i] are found in a[2*i+1] and a[2*i+2]. This means that the root (the smallest element) is always found in a[0]. You can build a heap from scratch, using the heappush and heappop functions. You might also start out with a list that contains lots of values, and you’d like to make it into a heap. In that case, you can use the heapify function.17 It basically repairs every subtree root, starting at the bottom right, moving left and up. (In fact, by skipping the leaves, it need only work on the left half of the array.) The resulting running time is linear (see Exercise 6-9). If your list is sorted, it’s already a valid heap, so you can just leave it alone.
Here’s an example of building a heap piece by piece:
>>> from heapq import heappush, heappop
>>> from random import randrange
>>> Q = []
>>> for i in range(10):
...     heappush(Q, randrange(100))
...
>>> Q
[15, 20, 56, 21, 62, 87, 67, 74, 50, 74]
>>> [heappop(Q) for i in range(10)]
[15, 20, 21, 50, 56, 62, 67, 74, 74, 87]
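If you already have a list of values, heapify rearranges it into a valid heap in place; for instance (the numbers here are arbitrary sample data):

>>> from heapq import heapify, heappop
>>> data = [56, 21, 62, 4, 87, 15]
>>> heapify(data)                      # Repair every subtree root, in linear time
>>> data[0]                            # The smallest value is now at the root
4
>>> [heappop(data) for i in range(6)]
[4, 15, 21, 56, 62, 87]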
17. It is quite common to call this operation build-heap and to reserve the name heapify for the operation that repairs a single node. Thus, build-heap runs heapify on all nodes but the leaves.
Just like bisect, the heapq module is implemented in C, but it used to be a plain Python module. For example, here is the code (from Python 2.3) for the function that moves an object toward the root until its parent is no larger than it (again, with my comments):
def sift_up(heap, startpos, pos):
    newitem = heap[pos]                 # The item we're sifting up
    while pos > startpos:               # Don't go beyond the root
        parentpos = (pos - 1) >> 1      # The same as (pos - 1) // 2
        parent = heap[parentpos]        # Who's your daddy?
        if parent <= newitem: break     # Valid parent found
        heap[pos] = parent              # Otherwise: copy parent down
        pos = parentpos                 # Next candidate position
    heap[pos] = newitem                 # Place the item in its spot
Note that the original function was called _siftdown because it’s sifting the value down in the list. I prefer to think of it as sifting it up in the implicit tree structure of the heap, though. Note also that, just like bisect_right, the implementation uses a loop rather than recursion.
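To see where this helper fits in, a heappush equivalent need only append the new value and then sift it up; the following is a minimal sketch (the name push is mine, not part of heapq):

def push(heap, item):
    heap.append(item)                   # The new item starts as the last leaf
    sift_up(heap, 0, len(heap) - 1)     # Sift it toward the root as needed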
In addition to heappop, there is heapreplace, which pops the smallest item and inserts a new element at the same time; this is a bit more efficient than a heappop followed by a heappush. The heappop operation returns the root (the first element). To maintain the shape of the heap, the last item is moved to the root position, and from there it is swapped downward (in each step, with its smallest child) until it is smaller than both its children. The heappush operation is just the reverse: the new element is appended to the list and is repeatedly swapped with its parent until it is greater than its parent. Both of these operations are logarithmic (also in the worst case, because the heap is guaranteed to be balanced).
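The downward repair can be sketched as follows; this is not the actual heapq code, and the names sift_down and pop are mine, but the effect of pop is the same as that of heappop:

def sift_down(heap, pos):
    n = len(heap)
    while True:
        child = 2*pos + 1                                # Left child
        if child >= n: break                             # A leaf: we're done
        if child + 1 < n and heap[child+1] < heap[child]:
            child += 1                                   # Pick the smallest child
        if heap[pos] <= heap[child]: break               # Heap property holds
        heap[pos], heap[child] = heap[child], heap[pos]  # Swap downward
        pos = child                                      # Continue from the child

def pop(heap):
    last = heap.pop()                    # Remove the last leaf
    if not heap: return last             # It was the only item
    smallest, heap[0] = heap[0], last    # Move the last leaf to the root
    sift_down(heap, 0)                   # Repair the root downward
    return smallest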
Finally, the module has (since version 2.6) the utility functions merge, nlargest, and nsmallest for merging sorted inputs and finding the n largest and smallest items in an iterable, respectively. The latter two functions, unlike the others in the module, take the same kind of key argument as list.sort. You can simulate this in the other functions with the DSU pattern, as mentioned in the sidebar on bisect.
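For example, you could pick out a few extreme items by some criterion or lace together sorted sequences (the data here is just made-up sample values):

>>> from heapq import nlargest, nsmallest, merge
>>> words = ["heap", "tree", "queue", "priority"]
>>> nlargest(2, words, key=len)        # The two longest words
['priority', 'queue']
>>> nsmallest(1, words, key=len)       # The shortest word
['heap']
>>> list(merge([1, 3, 5], [2, 4, 6]))  # Merge already sorted inputs
[1, 2, 3, 4, 5, 6]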
Although you would probably never use them that way in Python, the heap operations can also form a simple, efficient, and asymptotically optimal sorting algorithm called heapsort. It is normally implemented using a max-heap and works by first performing heapify on a sequence, then repeatedly popping off the root (as in heappop), and finally placing it in the now empty last slot. Gradually, as the heap shrinks, the original array is filled from the right with the largest element, the second largest, and so forth. In other words, heapsort is basically selection sort where a heap is used to implement the selection. Because the initialization is linear and each of the n selections is logarithmic, the running time is loglinear, that is, optimal.
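If you just want the effect in Python, you can lean on heapq directly, although this builds a new list with a min-heap rather than sorting in place with a max-heap as described; a minimal sketch (the name heapsort is mine):

from heapq import heapify, heappop

def heapsort(iterable):
    heap = list(iterable)          # Copy the input
    heapify(heap)                  # Build a min-heap in linear time
    return [heappop(heap) for _ in range(len(heap))]   # n logarithmic pops

print(heapsort([56, 21, 62, 4, 87, 15]))   # [4, 15, 21, 56, 62, 87]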
Summary
The algorithm design strategy of divide and conquer involves a decomposition of a problem into roughly equal-sized
subproblems, solving the subproblems (often by recursion), and combining the results. The main reason this is useful
is that the workload is balanced, typically taking you from a quadratic to a loglinear running time. Important examples
of this behavior include merge sort and quicksort, as well as algorithms for finding the closest pair or the convex hull
of a point set. In some cases (such as when searching a sorted sequence or selecting the median element), all but one
of the subproblems can be pruned, resulting in a traversal from root to leaf in the subproblem graph, yielding even
more efficient algorithms.
The subproblem structure can also be represented explicitly, as it is in binary search trees. Each node in a search
tree is greater than the descendants in its left subtree but less than those in its right subtree. This means that a binary
search can be implemented as a traversal from the root. Simply inserting random values haphazardly will, on average,
yield a tree that is balanced enough (resulting in logarithmic search times), but it is also possible to balance the tree,
using node splitting or rotations, to guarantee logarithmic running times in the worst case.
If You’re Curious ...
If you like bisection, you should look up interpolation search, which for uniformly distributed data has an average-case
running time of O(lg lg n). For implementing sets (that is, efficient membership checking) other than sorted
sequences, search trees and hash tables, you could have a look at Bloom filters. If you like search trees and related
structures, there are lots of them out there. You could find tons of different balancing mechanisms (red-black trees,