Chapter 9 ■ From A to B with Edsger and Friends
    u, p = t, []
    while u is not None:    # Walk backward from t
        p.append(u)         # Append every predecessor
        u = P[u]            # Take another step
    p.reverse()             # The path is backward
    return p
The main idea of the WordSpace class is that it works as a weighted graph so that it can be used with our a_star
implementation. If G is a WordSpace, G['lead'] would be a dict with other words (such as 'load' and 'mead') as keys
and 1 as weight for every edge. The default heuristic I’ve used simply counts the number of positions at which the
words differ.
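The class itself is built earlier in the chapter; as a rough sketch of the idea (the names and details here are illustrative, not necessarily the book's exact code), such a mapping might look like this:

```python
from string import ascii_lowercase

class WordSpace:
    """Sketch of a WordSpace-like weighted graph. Neighbors are valid
    words differing in exactly one position, each with edge weight 1."""
    def __init__(self, words):
        self.words = set(words)
    def __getitem__(self, word):
        nbrs = {}
        for i in range(len(word)):           # Try changing each position
            for c in ascii_lowercase:
                cand = word[:i] + c + word[i+1:]
                if cand != word and cand in self.words:
                    nbrs[cand] = 1           # All edges have weight 1
        return nbrs
```

With this sketch, `WordSpace({'lead', 'load', 'mead', 'gold'})['lead']` yields `{'load': 1, 'mead': 1}`, exactly the dict-like behavior described above.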
Using the WordSpace class is easy enough, as long as you have a word list of some sort. Many UNIX systems have a file called /usr/share/dict/words or /usr/dict/words, with a single word per line. If you don't have such a file, you could get one from http://ftp.gnu.org/gnu/aspell/dict/en, or find something similar online. You could then construct a WordSpace like this, for example (removing whitespace and normalizing everything to lowercase):
>>> words = set(line.strip().lower() for line in open("/usr/share/dict/words"))
>>> G = WordSpace(words)
If you're getting word ladders that you don't like, feel free to remove some words from the set, of course.[10] Once you have your WordSpace, it's time to roll:
>>> G.ladder('lead', 'gold')
['lead', 'load', 'goad', 'gold']
Pretty neat, but not that impressive, perhaps. Now try the following:
>>> G.ladder('lead', 'gold', h=lambda v: 0)
I've simply replaced the heuristic with a completely uninformative one, basically turning our A* into BFS (or, rather, Dijkstra's algorithm running on an unweighted graph). On my computer (and with my word list), the difference in running time is pretty noticeable. In fact, the speedup factor when using the first (default) heuristic is close to 100![11]
Summary
A bit more narrowly focused than the previous ones, this chapter dealt with finding optimal routes in network-like structures and spaces—in other words, shortest paths in graphs. Several of the basic ideas and mechanisms used in the algorithms in this chapter have been covered earlier in the book, and so we could build our solutions gradually. One fundamental tactic common to all the shortest path algorithms is that of looking for shortcuts, either through a new possible next-to-last node along a path, using the relax function or something equivalent (most of the algorithms do this), or by considering a shortcut consisting of two subpaths, to and from some intermediate node (the strategy of Floyd-Warshall). The relaxation-based algorithms approach things differently, based on their assumptions about the graph. The Bellman-Ford algorithm simply tries to construct shortcuts with every edge in turn and repeats this procedure for at most n-1 iterations (reporting a negative cycle if there is still potential for improvement).
[10] For example, when working with my alchemical example, I removed words such as algedo and dola.
[11] That number is 100, not the factorial of 100. (And most certainly not the 11th power of the factorial of 100.)
You saw in Chapter 8 that it's possible to be more efficient than this; for DAGs, it's possible to relax each edge only once, as long as we visit the nodes in topologically sorted order. A topsort isn't possible for a general graph, but if we disallow negative edges, we can find a topological sorting that respects the edges that matter—namely, sorting the nodes by their distance from the starting node. Of course, we don't know this sorting to begin with, but we can build it gradually, by always picking the remaining node with the lowest distance estimate, as in Dijkstra's algorithm. We know this is the thing to do, because we've already relaxed the out-edges of all its possible predecessors, so the next one in sorted order must now have a correct estimate—and the only one this could be is the one with the lowest upper bound.
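A minimal sketch of this (using Python's heapq as the priority queue, with "lazy deletion" of stale entries; again an illustration, not the book's exact code) might look like this:

```python
from heapq import heappush, heappop

def dijkstra(G, s):
    """Sketch of Dijkstra's algorithm; assumes nonnegative weights.
    G maps each node to a {neighbor: weight} dict."""
    D, Q, S = {s: 0}, [(0, s)], set()
    while Q:
        d, u = heappop(Q)            # Lowest distance estimate first
        if u in S:
            continue                 # Stale heap entry: u is finalized
        S.add(u)                     # u's estimate is now correct
        for v, w in G[u].items():
            if d + w < D.get(v, float('inf')):
                D[v] = d + w         # Relax the out-edge (u, v)
                heappush(Q, (D[v], v))
    return D
```

Rather than updating entries in the heap (which heapq doesn't support directly), the sketch simply pushes a new entry on each improvement and skips outdated ones as they are popped.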
When finding distances between all pairs of nodes, we have a couple of options. For example, we could run Dijkstra's algorithm from every possible start node. This is quite good for rather sparse graphs, and, in fact, we can use this approach even if the edges aren't all positive! We do this by first running Bellman-Ford and then adjusting all the edges so that we (1) maintain the length-ranks of the paths (the shortest is still the shortest) and (2) make the edge weights positive. Another option is to use dynamic programming, as in the Floyd-Warshall algorithm, where each subproblem is defined by its start node, its end node, and the number of the other nodes (in some predetermined order) we're allowed to pass through.
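The iterative form of that dynamic program is short enough to sketch here (a dict-of-dicts version for illustration; the book's own implementation may index things differently):

```python
def floyd_warshall(G):
    """Sketch of Floyd-Warshall. G maps each node to a {neighbor: weight}
    dict; returns all-pairs shortest distances as a dict of dicts."""
    nodes = list(G)
    inf = float('inf')
    # Base case: direct edges only (and 0 on the diagonal)
    D = {u: {v: (0 if u == v else G[u].get(v, inf))
             for v in nodes} for u in nodes}
    for k in nodes:                  # Now also allow passing through k
        for u in nodes:
            for v in nodes:
                shortcut = D[u][k] + D[k][v]   # Two subpaths via k
                if shortcut < D[u][v]:
                    D[u][v] = shortcut
    return D
```

Each pass of the outer loop extends the set of permitted intermediate nodes by one, which is exactly the "shortcut consisting of two subpaths" strategy described in the summary.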
There's no known method of finding the shortest path from one node to another that is better, asymptotically, than finding the shortest paths from the starting node to all the others. Still, there are some heuristic approaches that can give improvements in practice. One of these is to search bidirectionally, performing a traversal from both the start node and the end node "simultaneously," and then terminate when the two meet, thereby reducing the number of nodes that need be visited (or so we hope). Another approach is using a heuristic "best-first" approach, with a heuristic function to guide us toward more promising nodes before less promising ones, as in the A* algorithm.
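The core of A* is a one-line change to Dijkstra's algorithm: order the heap by d(s, u) + h(u) instead of d(s, u) alone. A minimal sketch, returning just the distance and assuming a consistent heuristic (one that never overestimates and satisfies the triangle inequality along edges):

```python
from heapq import heappush, heappop

def a_star_distance(G, s, t, h):
    """Sketch of A*: best-first search ordered by d(s,u) + h(u).
    Assumes a consistent heuristic h; G maps nodes to {nbr: weight}."""
    Q, done = [(h(s), 0, s)], set()
    while Q:
        _, d, u = heappop(Q)         # Most promising node first
        if u == t:
            return d                 # With consistent h, d is optimal
        if u in done:
            continue                 # Stale entry: u already expanded
        done.add(u)
        for v, w in G[u].items():
            if v not in done:
                heappush(Q, (d + w + h(v), d + w, v))
    return float('inf')              # t unreachable from s
```

With h(v) = 0 for all v, this degenerates to plain Dijkstra, which is exactly what the word-ladder experiment above demonstrates.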
If You’re Curious ...
Most algorithm books will give you explanations and descriptions of the basic algorithms for finding shortest paths. Some of the more advanced heuristic ones, though, such as A*, are more often discussed in books on artificial intelligence. There you can also find thorough explanations of how to use such algorithms (and other, related ones) to search through complex solution spaces that look nothing like the explicit graph structures we've been working with. For a solid foundation in these aspects of artificial intelligence, I heartily recommend the wonderful book by Russell and Norvig. For ideas on heuristics for the A* algorithm, you could try a web search for "shortest path" along with "landmarks" or "ALT."
If you want to push Dijkstra's algorithm on the asymptotic front, you could look into Fibonacci heaps. If you swap out the binary heap for a Fibonacci heap, Dijkstra's algorithm gets an improved asymptotic running time, but chances are that your performance will still take a hit unless you're working with really large instances, as Python's heap implementation is really fast, and a Fibonacci heap (a rather complicated affair) implemented in Python probably won't be. But still—worth a look.
Finally, you might want to combine the bidirectional version of Dijkstra’s algorithm with the heuristic mechanism
of A*. Before you do, though, you should research the issue a bit—there are pitfalls here that could invalidate your
algorithm. One (slightly advanced) source of information on this and the use of landmark-based heuristics (as well as
the challenges of a graph that changes over time) is the paper by Nannicini et al. (see “References”).
Exercises
9-1. In some cases, discrepancies in exchange rates between currencies make it possible to exchange from one currency to another, continuing until one gets back to the original, having made a profit. How would you use the Bellman-Ford algorithm to detect the presence of such a situation?
9-2. What happens in Dijkstra’s algorithm if more than one node has the same distance from the start
node? Is it still correct?
9-3. Why is it a really bad idea to represent edge length using dummy nodes, like in Figure 9-3?
9-4. What would the running time of Dijkstra’s algorithm be if you implemented it with an unsorted list
instead of a binary heap?
9-5. Why can we be certain that the adjusted weights in Johnson’s algorithm are nonnegative? Are
there cases where things can go wrong?
9-6. In Johnson's algorithm, the h function is based on the Bellman-Ford algorithm. Why can't we just use an arbitrary function here? After all, it would disappear in the telescoping sum anyway.
9-7. Implement the memoized version of Floyd-Warshall so it saves memory in the same way as the
iterative one.
9-8. Extend the memoized version of Floyd-Warshall to compute a P table, just like the iterative one.
9-9. How would you modify the Floyd-Warshall algorithm so it detects the presence of paths, rather than finding the shortest paths (Warshall's algorithm)?
9-10. Why does correctness for the tighter stopping criterion for the bidirectional version of Dijkstra’s
algorithm imply correctness for the original?
9-11. In the correctness proof for the bidirectional version of Dijkstra's algorithm, I posited a hypothetical path that would be shorter than the best one we'd found so far and stated that it had to contain an edge (u, v) such that d(s, u) < l and d(v, t) < r. Why is this the case?
9-12. Rewrite bidir_dijkstra so it doesn’t require the input graph to be symmetric, with zero-weight
self-edges.
9-13. Implement a bidirectional version of BFS.
9-14. Why is h(v) a lower bound on d(v, t) when w' is feasible?
References
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269-271.
Nannicini, G., Delling, D., Liberti, L., and Schultes, D. (2008). Bidirectional A* search for time-dependent fast paths. In Proceedings of the 7th International Conference on Experimental Algorithms, Lecture Notes in Computer Science, pages 334-346.
Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach, third edition. Prentice Hall.