Chapter 4
■
InduCtIon and reCursIon ... and reduCtIon
80
By default, I’m just sorting objects based on their values. By supplying a key function, you can sort by anything
you’d like. note that the keys must be integers in a limited range. If this range is 0...
k–1, the running time is then
Q(
n +
k). (although the common implementation simply
counts the elements and then figures out where to put
them in
B
, python makes it easy to just build value lists for each key and then concatenate them.) If several values
have the same key, they’ll end up in the original order with respect to each other. sorting algorithms with this
property
are called stable.
Counting-sort does need more space than an in-place algorithm like Quicksort, for example, so if your data set and
value range is large, you might get a slowdown from a lack of memory. this can partly be handled by handling
the value range more efficiently. We can do this by sorting numbers on
individual digits (or strings on individual
characters or bit vectors on fixed-size chunks). If you first sort on the
least significant digit, because of stability,
sorting on the
second least significant digit won’t destroy the internal ordering from the first run.
(this is a bit like sorting column by column in a spreadsheet.) this means that for
d digits, you can sort
n
numbers in
Q(
dn) time. this algorithm is called
radix sort, and exercise 4-11 asks you to implement it.
another somewhat similar linear-time sorting algorithm is
bucket sort. It assumes
that your values are evenly
(uniformly) distributed in an interval, for example, real numbers in the interval [0,1), and uses
n buckets, or
subintervals, that you can put your values into directly. In a way, you’re hashing each value into its proper slot,
and the average (expected) size of each bucket is
Q(1). Because the buckets are in order, you can go through
them and have your sorting in
Q(
n) time, in the average case, for random data. (exercise 4-12 asks you to
implement bucket sort.)
The Celebrity Problem
In the celebrity problem, you’re looking for a celebrity in a crowd. It’s
a bit far-fetched, though it could perhaps be
used in analyses of social networks such as Facebook and Twitter. The idea is as follows: The celebrity knows no one,
but everyone knows the celebrity.
10
A more down-to-earth version of the same problem would be examining a set of
dependencies and trying to find a place to start. For example, you might have threads in a multithreaded application
waiting for each other, with even some cyclical dependencies (so-called deadlocks), and you’re looking for one thread
that isn’t waiting for any of the others but that all of the others are dependent on. (A much more realistic way of
handling dependencies—topological sorting—is dealt with in the next section.)
No matter how we dress the problem up, its core can be represented in terms of graphs. We’re looking for one
node
with incoming edges from all other nodes, but with
no outgoing edges. Having gotten a handle on the structures
we’re dealing with, we can implement a brute-force solution, just to see whether it helps us understand anything
(see Listing 4-7).
The naive_celeb function tackles the problem head on. Go through all the people, checking whether each
person is a celebrity. This check goes through all the
others, making sure they all know the
candidate person and that
the candidate person does not know any of them. This version is clearly quadratic, but it’s possible to get the running
time down to linear.
The key, as before, lies in finding a reduction—reducing the problem from
n persons to
n–1 as cheaply as
possible. The naive_celeb implementation does, in fact, reduce the problem step by step. In iteration
k of the outer
loop, we know that none of 0...
k–1 can be the celebrity, so we need to solve the problem only for the remainder, which
is exactly what the remaining iterations do. This
reduction is clearly correct, as is the algorithm. What’s new in this
situation is that we have to try to improve the
efficiency of the reduction. To get a linear algorithm, we need to perform
the reduction in
constant time. If we can do that, the problem is as good as solved. As you can see, this inductive way
of thinking can really help pinpoint where we need to employ our creative problem-solving skills.
10
There are proverbs where this celebrity is replaced with a clown, a fool, or a monkey. Somewhat fitting, perhaps.
Chapter 4
■
InduCtIon and reCursIon ... and reduCtIon
81
Once we’ve zeroed in on what we need to do, the problem isn’t all that hard.
To reduce the problem from n to
n–1, we must find a
noncelebrity, someone who either knows someone or is unknown by someone else. And if we
check G[u][v] for
any nodes u and v, we can eliminate either u or v! If G[u][v] is true, we eliminate u; otherwise, we
eliminate v. If we’re guaranteed that there
is a celebrity, this is all we need. Otherwise, we can still eliminate all but
one candidate, but we need to finish by checking whether they are, in fact, a celebrity, like we did in naive_celeb.
You can find an implementation of the algorithm based on this reduction in Listing 4-8. (You could implement the
algorithm idea even more directly using sets; do you see how?)
Do'stlaringiz bilan baham: