Just like bisect, the heapq module is implemented in C, but it used to be a plain Python module. For example, here is the code (from Python 2.3) for the function that moves an object down in the list (and up in the tree) until it's smaller than both of its children (again, with my comments):
def sift_up(heap, startpos, pos):
    newitem = heap[pos]              # The item we're sifting up
    while pos > startpos:            # Don't go beyond the root
        parentpos = (pos - 1) >> 1   # The same as (pos - 1) // 2
        parent = heap[parentpos]     # Who's your daddy?
        if parent <= newitem: break  # Valid parent found
        heap[pos] = parent           # Otherwise: copy parent down
        pos = parentpos              # Next candidate position
    heap[pos] = newitem              # Place the item in its spot
Note that the original function was called _siftdown because it's sifting the value down in the list. I prefer to think of it as sifting it up in the implicit tree structure of the heap, though. Note also that, just like bisect_right, the implementation uses a loop rather than recursion.
In addition to heappop, there is heapreplace, which will pop the smallest item and insert a new element at the same time; this is a bit more efficient than a heappop followed by a heappush. The heappop operation returns the root (the first element). To maintain the shape of the heap, the last item is moved to the root position, and from there it is swapped downward (in each step, with its smallest child) until it is smaller than both its children. The heappush operation is just the reverse: the new element is appended to the list and is repeatedly swapped with its parent until it is greater than its parent. Both of these operations are logarithmic, also in the worst case, because the heap is guaranteed to be balanced.
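To make this concrete, here is a small usage sketch (the values are arbitrary, chosen just for illustration):

>>> from heapq import heapify, heappush, heappop, heapreplace
>>> heap = [5, 8, 0, 3, 6, 7, 9, 1, 4, 2]
>>> heapify(heap)          # Linear-time construction of a valid heap
>>> heappush(heap, -5)     # Append the item, then sift it up toward the root
>>> heap[0]                # The root is always the smallest item
-5
>>> heappop(heap)          # Pop the root; the last item is sifted down
-5
>>> heapreplace(heap, 10)  # Pop the root and insert 10 in a single sift
0
>>> heappop(heap)          # The heap property has been maintained throughout
1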
Finally, the module has (since version 2.6) the utility functions merge, nlargest, and nsmallest for merging sorted inputs and finding the n largest and smallest items in an iterable, respectively. The latter two functions, unlike the others in the module, take the same kind of key argument as list.sort. You can simulate this in the other functions with the DSU pattern, as mentioned in the sidebar on bisect.
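For example, you might find the longest words in a list with nlargest, or decorate each item with its key before pushing (the word list here is just an arbitrary illustration):

>>> from heapq import nlargest, heappush, heappop
>>> words = ['heap', 'tree', 'divide', 'conquer', 'combine']
>>> nlargest(2, words, key=len)
['conquer', 'combine']
>>> heap = []
>>> for word in words:
...     heappush(heap, (len(word), word))  # DSU: decorate with the key
...
>>> heappop(heap)[1]                       # Undecorate when popping
'heap'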
Although you would probably never use them that way in Python, the heap operations can also form a simple, efficient, and asymptotically optimal sorting algorithm called heapsort. It is normally implemented using a max-heap and works by first performing heapify on a sequence, then repeatedly popping off the root (as in heappop), and finally placing it in the now empty last slot. Gradually, as the heap shrinks, the original array is filled from the right with the largest element, the second largest, and so forth. In other words, heapsort is basically selection sort where a heap is used to implement the selection. Because the initialization is linear and each of the n selections is logarithmic, the running time is loglinear, that is, optimal.
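Here is a minimal sketch of that max-heap variant (sift_down and heapsort are my own names here; neither is part of heapq):

def sift_down(a, root, end):
    # Move a[root] down in the max-heap a[:end] until it beats its children
    while True:
        child = 2*root + 1                     # The left child
        if child >= end: break                 # No children; we're done
        if child+1 < end and a[child] < a[child+1]:
            child += 1                         # Use the greater child
        if a[root] >= a[child]: break          # The heap property holds
        a[root], a[child] = a[child], a[root]  # Swap with the child ...
        root = child                           # ... and keep going

def heapsort(a):
    n = len(a)
    for i in reversed(range(n//2)):            # Heapify: sift down every
        sift_down(a, i, n)                     # internal node, bottom-up
    for end in reversed(range(1, n)):          # Swap the root (the max)
        a[0], a[end] = a[end], a[0]            # into the empty last slot
        sift_down(a, 0, end)                   # and restore the heap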
Summary
The algorithm design strategy of divide and conquer involves a decomposition of a problem into roughly equal-sized
subproblems, solving the subproblems (often by recursion), and combining the results. The main reason this is useful
is that the workload is balanced, typically taking you from a quadratic to a loglinear running time. Important examples
of this behavior include merge sort and quicksort, as well as algorithms for finding the closest pair or the convex hull
of a point set. In some cases (such as when searching a sorted sequence or selecting the median element), all but one
of the subproblems can be pruned, resulting in a traversal from root to leaf in the subproblem graph, yielding even
more efficient algorithms.
The subproblem structure can also be represented explicitly, as it is in binary search trees.
Each node in a search
tree is greater than the descendants in its left subtree but less than those in its right subtree. This means that a binary
search can be implemented as a traversal from the root. Simply inserting random values haphazardly will, on average,
yield a tree that is balanced enough (resulting in logarithmic search times), but it is also possible to
balance the tree,
using node splitting or rotations, to guarantee logarithmic running times in the worst case.
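As a minimal sketch, insertion and search in such a tree might look like this (the Node class and both functions are illustrative, not from any library):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def insert(node, key):
    # Walk down from the root, as in binary search, and attach a new leaf
    if node is None: return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def search(node, key):
    # Each step discards one subtree, so a balanced tree gives O(lg n) time
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node is not None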
If You’re Curious ...
If you like bisection, you should look up interpolation search, which for uniformly distributed data has an average-case running time of O(lg lg n). For implementing sets (that is, efficient membership checking) with something other than sorted sequences, search trees, or hash tables, you could have a look at Bloom filters. If you like search trees and related structures, there are lots of them out there. You could find tons of different balancing mechanisms (red-black trees,