Chapter 5
■
traversal: the skeleton key of algorithmiCs
110
Imagine performing a DFS on this graph (possibly traversing from several starting
points to ensure you cover
the entire graph). Now consider the finish times of the nodes in, say, the strong components A and B. As you can see,
there is an edge from A to B, but there is no way to get from B to A. This has consequences for the finish times. You can
be certain that A will be finished later than B. That is, the last finish time in A will be later than the last finish time in
B. Take a look at Figure
5-7
, and it should be obvious why this is so.
If you start in B, you can never get into A, so B will
finish completely before you even
start (let alone
finish) your traversal of A. If, however, you start in A, you know that
you’ll never get stuck in there (every node can reach every other), so before finishing the traversal, you
will eventually
migrate to B, and you’ll have to finish that (and,
in this case, C) completely before you backtrack to A.
In fact, in general, if there is an edge from any strong component X to another strong component Y, the last finish
time in X will be later than the latest in Y. The reasoning is the same as for our example (see Exercise 5-16). I based my
conclusion on the fact that you couldn’t get from B to A, though—and this is, in fact, how it works for SCCs in general,
because SCCs form a DAG! Therefore, if there’s
an edge from X to Y, there will never be any path from Y to X.
Consider the highlighted components in Figure
5-7
. If you contract them to single “supernodes” (keeping edges
where there were edges originally), you end up with a graph—let’s call it the SCC graph—which looks like this:
This is clearly a DAG, but why will such an SCC graph
always be acyclic? Just assume that there is a cycle in the
SCC graph. That would mean that you could get from one SCC to another and back again. Do you see a problem with
that? Yeah, exactly: every node in the first SCC could
reach every node in the second, and vice versa; in fact, all SCCs
on such a cycle would combine to form a
single SCC, which is a contradiction of our initial assumption that they were
separate.
Now, let’s say you flipped all the edges in the graph. This won’t affect which nodes belong together in SCCs
(see Exercise 5-15), but it
will affect the SCC graph. In our example, you could no longer get out of A. And if you had
traversed
A and started a new round in B, you couldn’t escape from that, leaving only C. And ... wait a minute ... I just
found the strong components there, didn’t I? To apply this idea in general, we always need to start in the SCC without
any in-edges in the original graph (that is, with no out-edges after they’re flipped). Basically, we’re looking for the first
SCC in a topological sorting of the SCC graph. (And then we’ll
move on to the second, and so on.) Looking back at our
initial DFS reasoning, that’s where we’d be if we started our traversal with the node that has the
latest finish time. In
fact, if we choose our starting points for the final traversal by decreasing finish times, we’re guaranteed to fully explore
one SCC at the time because we’ll be blocked from moving to the next one by the reverse edges.
This line of reasoning
can be a bit tough to follow, but the main idea isn’t all that hard. If there’s an edge from
A to B, A will have a later (final) finish time than B. If we choose starting points for our (second) traversal based on
decreasing finish times, this means that we’ll visit A before B. Now, if we
reverse all the edges, we can still explore all of
A, but we can’t move on to B, and this lets us explore a single SCC at a time.
What follows is an outline of the algorithm. Note that instead of “manually” using DFS
and sorting the nodes in
reverse by finish time, I simply use the dfs_topsort function, which does that job for me.
18
1. Run dfs_topsort on the graph, resulting in a sequence seq.
2. Reverse all edges.
3. Run a full traversal, selecting starting points (in order) from seq.
For an implementation of this, see Listing 5-11.
A
B
C
18
This might seem like cheating because I’m using topological sorting on a non-DAG. The idea is just to get the nodes sorted by
decreasing finish time, though, and that’s exactly what dfs_topsort does—in linear time.