Exploring the World of Graphs
It’s called a spider trap because spammers devised it as a way to catch search
engine software spiders in a loop and let them believe that the only websites were
the ones inside the closed network.
Struggling with a naive implementation
Given a graph made by using Python and NetworkX, you can extract its structure
and render it as a transition matrix, a matrix that represents nodes in columns
and rows:
»
Columns: Contain the node a web surfer is on
»
Rows: Contain the probability that the surfer will visit other nodes because of
outbound links
In the real web, the transition matrix that feeds the PageRank algorithm is built
by spiders’ continuous exploration of links.
def initialize_PageRank(graph):
nodes = len(graph)
M = nx.to_numpy_matrix(graph)
outbound = np.squeeze(np.asarray(np.sum(M, axis=1)))
prob_outbound = np.array(
[1.0/count
if count>0 else 0.0 for count in outbound])
G = np.asarray(np.multiply(M.T, prob_outbound))
p = np.ones(nodes) / float(nodes)
if np.min(np.sum(G,axis=0)) < 1.0:
print ('Warning: G is substochastic')
return G, p
FIGURE 11-3:
A spider trap.
CHAPTER 11
Do'stlaringiz bilan baham: |