Discovering Graph Secrets
» Closed: All three people know each other. Think of a family setting, in
which everyone knows everyone else.
» Open: One person knows the two other people, but those two people don’t
know each other. Think of a person who knows one individual at work and
another individual at home, where the individual at work knows nothing
about the individual at home.
» Connected pair: Two people in the triad know each other, but neither of
them knows the third person. This situation involves two people who know
something about each other meeting someone new, someone who
potentially wants to be part of the group.
» Unconnected: The triad forms a group, but none of its members know each
other. This last one might seem a bit odd, but think about a convention or
seminar. The people at these events form a group, but they may not know
anything about each other. However, because they have similar interests, you
can use clustering to understand the behavior of the group.
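The four triad types above differ only in how many of the three possible
relationships exist, so you can tell them apart by counting edges. The following
is a minimal sketch of that idea in plain Python. The names and friendships
(Ann, Bob, and so on) and the classify_triad() helper are hypothetical and
exist only for illustration; they are not part of the networkx package.

```python
from itertools import combinations

# Hypothetical friendship data: each pair is an undirected "knows" relationship.
edges = {("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")}
people = ["Ann", "Bob", "Cat", "Dan", "Eve"]

# The number of relationships inside a group of three determines its type.
TRIAD_NAMES = {3: "closed", 2: "open", 1: "connected pair", 0: "unconnected"}


def knows(a, b):
    """Return True when a and b know each other (order doesn't matter)."""
    return (a, b) in edges or (b, a) in edges


def classify_triad(a, b, c):
    """Name the triad formed by three people by counting its edges."""
    count = sum(knows(x, y) for x, y in combinations((a, b, c), 2))
    return TRIAD_NAMES[count]


# Classify every possible group of three people.
for trio in combinations(people, 3):
    print(trio, classify_triad(*trio))
```

Counting edges works here because the relationships are undirected. A directed
network (for example, one person following another) has more triad types and
would need a different approach.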
Triads occur naturally in relationships, and many Internet social networks have
leveraged this idea to accelerate the connections between participants. The density
of connections is important for any kind of social network because a densely
connected network can spread information and share content more easily. For
instance, when LinkedIn, the professional social network (https://www.linkedin.com/),
decided to increase the connection density of its network, it started by looking
for open triads and trying to close them by inviting people to connect. Closing
triads is at the foundation of LinkedIn’s Connection Suggestion algorithm. You can
discover more about how it works by reading the Quora answer at
https://www.quora.com/How-does-LinkedIns-People-You-May-Know-work.
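To make the idea of closing open triads concrete, here is a minimal sketch that
looks for pairs of people who share a contact but aren't connected themselves,
and ranks them by the number of contacts they share. It is only an illustration
of triadic closure, not LinkedIn’s actual algorithm; the network data and the
suggest_connections() function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical network: each person maps to the set of people he or she knows.
network = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat", "Dan"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Bob", "Cat", "Eve"},
    "Eve": {"Dan"},
}


def suggest_connections(network):
    """Count the open triads that each missing connection would close."""
    suggestions = Counter()
    for center, friends in network.items():
        # Every pair of the center's friends forms a triad with the center.
        for a, b in combinations(sorted(friends), 2):
            # The triad is open when the two friends don't know each other.
            if b not in network[a]:
                suggestions[(a, b)] += 1
    return suggestions


# Pairs with the most shared contacts come first.
for (a, b), shared in suggest_connections(network).most_common():
    print(f"Suggest {a} <-> {b} ({shared} shared contact(s))")
```

Ranking by shared contacts is one simple design choice: a pair that sits in
several open triads at once is arguably more likely to accept an invitation than
a pair with a single contact in common.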
The example in this section relies on the Zachary’s Karate Club sample graph
described at https://networkdata.ics.uci.edu/data.php?id=105. It’s a small
graph that lets you see how networks work without spending a lot of time loading
a large dataset. Fortunately, this dataset appears as part of the networkx package
introduced in Chapter 8. The Zachary’s Karate Club network represents the
friendship relationships between 34 members of a karate club from 1970 to 1972.
Sociologist Wayne W. Zachary studied the club and wrote a paper on it
entitled “An Information Flow Model for Conflict and Fission in Small Groups.”
The interesting fact about this graph and its paper is that during those years, a
conflict arose in the club between one of the karate instructors (node number 0)
and the president of the club (node number 33). By clustering the graph, you can
almost perfectly predict the split of the club into two groups that occurred
shortly after the conflict.
Because this example also draws a graph showing the groups (so that you can
visualize them more easily), you also need to use the matplotlib package. The
following code shows how to graph the nodes and edges of the dataset. (You can find this