Graphs.
Plan
In computer science, a graph
Operations
3.Distributed memory
In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.
A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges (also called links or lines), and for a directed graph are also known as edges but also sometimes arrows or arcs. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.
A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
In the distributed memory model, the usual approach is to partition the vertex set {\displaystyle V} of the graph into {\displaystyle p} sets {\displaystyle V_{0},\dots ,V_{p-1}}. Here, {\displaystyle p} is the amount of available processing elements (PE). The vertex set partitions are then distributed to the PEs with matching index, additionally to the corresponding edges. Every PE has its own subgraph representation, where edges with an endpoint in another partition require special attention. For standard communication interfaces like MPI, the ID of the PE owning the other endpoint has to be identifiable. During computation in a distributed graph algorithms, passing information along these edges implies communication.
Partitioning the graph needs to be done carefully - there is a trade-off between low communication and even size partitioning But partitioning a graph is a NP-hard problem, so it is not feasible to calculate them. Instead, the following heuristics are used.
1D partitioning: Every processor gets {\displaystyle n/p} vertices and the corresponding outgoing edges. This can be understood as a row-wise or column-wise decomposition of the adjacency matrix. For algorithms operating on this representation, this requires an All-to-All communication step as well as {\displaystyle {\mathcal {O}}(m)} message buffer sizes, as each PE potentially has outgoing edges to every other PE.
2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle {\displaystyle p=p_{r}\times p_{c}}, where {\displaystyle p_{r}} and {\displaystyle p_{c}} are the amount of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension {\displaystyle (n/p_{r})\times (n/p_{c})}. This can be visualized as a checkerboard pattern in a matrix. Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the amount of communication partners for each PE to {\displaystyle p_{r}+p_{c}-1} out of {\displaystyle p=p_{r}\times p_{c}} possible ones.
The basic operations provided by a graph data structure G usually include:[1]
adjacent(G, x, y): tests whether there is an edge from the vertex x to the vertex y;
neighbors(G, x): lists all vertices y such that there is an edge from the vertex x to the vertex y;
add_vertex(G, x): adds the vertex x, if it is not there;
remove_vertex(G, x): removes the vertex x, if it is there;
add_edge(G, x, y): adds the edge from the vertex x to the vertex y, if it is not there;
remove_edge(G, x, y): removes the edge from the vertex x to the vertex y, if it is there;
get_vertex_value(G, x): returns the value associated with the vertex x;
set_vertex_value(G, x, v): sets the value associated with the vertex x to v.
Structures that associate values to the edges usually also provide:
get_edge_value(G, x, y): returns the value associated with the edge (x, y);
set_edge_value(G, x, y, v): sets the value associated with the edge (x, y) to v.
Do'stlaringiz bilan baham: |