BLACK BOX: BISECT
Binary search can be applied in many settings, but the straight “search for a value in a sorted sequence” version is available in the standard library, in the bisect module. It contains the bisect function, which works as expected:
>>> from bisect import bisect
>>> a = [0, 2, 3, 5, 6, 8, 8, 9]
>>> bisect(a, 5)
4
Well, it’s sort of what you’d expect ... it doesn’t return the position of the 5 that’s already there. Rather, it reports the position at which to insert the new 5, making sure it’s placed after all existing items with the same value. In fact, bisect is another name for bisect_right, and there’s also a bisect_left:
>>> from bisect import bisect_left
>>> bisect_left(a, 5)
3
The bisect module is implemented in C, for speed, but in earlier versions (prior to Python 2.4) it was actually a plain Python module, and the code for bisect_right was as follows (with my comments):
def bisect_right(a, x, lo=0, hi=None):
    if hi is None:               # Searching to the end
        hi = len(a)
    while lo < hi:               # More than one possibility
        mid = (lo+hi)//2         # Bisect (find midpoint)
        if x < a[mid]: hi = mid  # Value < middle? Go left
        else: lo = mid+1         # Otherwise: go right
    return lo
As you can see, the implementation is iterative, but it’s entirely equivalent to the recursive version.
There is also another pair of useful functions in this module: insort (an alias for insort_right) and insort_left. These functions find the right position, like their bisect counterparts, and then actually insert the element. While the insertion is still a linear operation, at least the search is logarithmic (and the actual insertion code is pretty efficiently implemented).
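For example, insort can keep the list from earlier sorted as new elements arrive:

```python
from bisect import insort, insort_left

a = [0, 2, 3, 5, 6, 8, 8, 9]
insort(a, 7)        # Search with bisect_right, then insert
print(a)            # [0, 2, 3, 5, 6, 7, 8, 8, 9]
insort_left(a, 8)   # The new 8 is placed before the existing ones
print(a)            # [0, 2, 3, 5, 6, 7, 8, 8, 8, 9]
```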
Chapter 6 ■ Divide, Combine, and Conquer
Sadly, the various functions of the bisect module don’t support the key argument used in list.sort, for example. You can achieve similar functionality with the so-called decorate, sort, undecorate (or, in this case, decorate, search, undecorate) pattern, or DSU for short:
>>> seq = "I aim to misbehave".split()
>>> dec = sorted((len(x), x) for x in seq)
>>> keys = [k for (k, v) in dec]
>>> vals = [v for (k, v) in dec]
>>> vals[bisect_left(keys, 3)]
'aim'
Or, you could do it more compactly:
>>> seq = "I aim to misbehave".split()
>>> dec = sorted((len(x), x) for x in seq)
>>> dec[bisect_left(dec, (3, ""))][1]
'aim'
As you can see, this involves creating a new, decorated list, which is a linear operation. Clearly, if we did this before every search, there’d be no point in using bisect. If, however, we can keep the decorated list between searches, the pattern can be useful. If the sequence isn’t sorted to begin with, we can perform the DSU as part of the sorting, as in the previous example.
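If searches are frequent, this idea can be wrapped in a small helper that keeps the decorated list between searches and uses insort for insertions. The following is only a sketch of that pattern; the class name SortedByKey is mine, not part of the bisect module:

```python
from bisect import bisect_left, insort

class SortedByKey:
    """Keep (key, value) pairs sorted by key, for repeated bisection."""
    def __init__(self, seq, key):
        self.key = key
        self.dec = sorted((key(v), v) for v in seq)   # Decorate once
    def find(self, k):
        """Return the first value whose key equals k, or None."""
        # The empty string sorts before any word with the same key
        # (the same trick as the (3, "") example above)
        i = bisect_left(self.dec, (k, ""))
        if i < len(self.dec) and self.dec[i][0] == k:
            return self.dec[i][1]
        return None
    def add(self, v):
        insort(self.dec, (self.key(v), v))            # Logarithmic search

words = SortedByKey("I aim to misbehave".split(), key=len)
print(words.find(3))   # 'aim'
```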
Traversing Search Trees ... with Pruning
Binary search is the bee’s knees. It’s one of the simplest algorithms out there, but it really packs a punch. There is one
catch, though: To use it, your values must be sorted. Now, if we could keep them in a linked list, that wouldn’t be a
problem. For any object we wanted to insert, we’d just find the position with bisection (logarithmic) and then insert
it (constant). The problem is—that won’t work. Binary search needs to be able to check the middle value in constant
time, which we can’t do with a linked list. And, of course, using an array (such as Python’s lists) won’t help. It’ll help
with the bisection, but it ruins the insertion.
If we want a modifiable structure that’s efficient for search, we need some kind of middle ground. We need a
structure that is similar to a linked list (so we can insert elements in constant time) but that still lets us perform a
binary search. You may already have figured the whole thing out, based on the section title, but bear with me. The first
thing we need when searching is to access the middle item in constant time. So, let’s say we keep a direct link to that.
From there, we can go left or right, and we’ll need to access the middle element of either the left half or the right half.
So ... we can just keep direct links from the first item to these two, one “left” reference and one “right” reference.
In other words, we can just represent the structure of the binary search as an explicit tree structure! Such a tree
would be easily modifiable, and we could traverse it from root to leaf in logarithmic time. So, searching is really
our old friend traversal—but with pruning. We wouldn’t want to traverse the entire tree (resulting in a so-called
linear scan). Unless we’re building the tree from a sorted sequence of values, the “middle element of the left half”
terminology may not be all that helpful. Instead, we can think of what we need to implement our pruning. When we
look at the root, we need to be able to prune one of the subtrees. (If we found the value we wanted in an internal node
and the tree didn’t contain duplicates, we wouldn’t continue in either subtree, of course.)
The one thing we need is the so-called search tree property: For a subtree rooted at r, all the values in the left
subtree are smaller than (or equal to) the value of r, while those in the right subtree are greater. In other words, the
value at a subtree root bisects the subtree. An example tree with this property is shown in Figure 6-6, where the node
labels indicate the values we’re searching. A tree structure like this could be useful in implementing a set; that is, we
could check whether a given value was present. To implement a mapping, however, each node would contain both a
key, which we searched for, and a value, which was what we wanted.
Usually, you don’t build a tree in bulk (although that can be useful at times); the main motivation for using trees
is that they’re dynamic, and you can add nodes one by one. To add a node, you search for where it should have been
and then add it as a new leaf there. The tree in Figure 6-6, for example, might have been built by initially adding 8 and then 12, 14, 4, and 6. A different ordering might have given a different tree.
Listing 6-2 gives you a simple implementation of a binary search tree, along with a wrapper that makes it look a
bit like a dict. You could use it like this, for example:
>>> tree = Tree()
>>> tree["a"] = 42
>>> tree["a"]
42
>>> "b" in tree
False
As you can see, I’ve implemented insertion and search as free-standing functions, rather than methods. That’s so they’ll also work on None nodes. (You don’t have to do it like that, of course.)
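Since Listing 6-2 isn’t reproduced here, the following is a minimal sketch along the lines described, with insert and search as free-standing functions that accept a possibly-None node; the names and details may differ from the actual listing:

```python
class Node:
    lft = None
    rgt = None
    def __init__(self, key, val):
        self.key, self.val = key, val

def insert(node, key, val):
    if node is None:                  # Empty leaf: put the new node here
        return Node(key, val)
    if node.key == key:               # Key already present: replace value
        node.val = val
    elif key < node.key:              # Smaller keys go left
        node.lft = insert(node.lft, key, val)
    else:                             # Larger keys go right
        node.rgt = insert(node.rgt, key, val)
    return node

def search(node, key):
    if node is None:                  # Ran off the tree: not found
        raise KeyError(key)
    if node.key == key:
        return node.val
    if key < node.key:                # Prune the right subtree
        return search(node.lft, key)
    return search(node.rgt, key)      # Prune the left subtree

class Tree:                           # A thin, dict-like wrapper
    root = None
    def __setitem__(self, key, val):
        self.root = insert(self.root, key, val)
    def __getitem__(self, key):
        return search(self.root, key)
    def __contains__(self, key):
        try:
            search(self.root, key)
        except KeyError:
            return False
        return True
```

Used exactly as in the interaction above: tree = Tree(), then tree["a"] = 42, and so on.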