2 Related Work and Background
As described in the introduction, we see a trend towards many-core processors that employ NoCs, where communication takes place via send and receive operations. Some current multi-/many-core processors, like the Intel Xeon Phi [4], do not
employ send/receive operations or a NoC. Instead, they rely on shared memory and complex coherence protocols. In our opinion, these approaches do not scale well, because shared resources become the bottleneck when more cores are added. Therefore, we see the future of many-cores in employing NoCs, as in, e.g., the Intel Single-chip Cloud Computer (SCC) [13]. There, small buffers are beneficial, because otherwise most of the chip area would be occupied by buffers. In the remainder, we only consider architectures working with NoCs and explicit message passing.
Classical synchronization approaches were developed for distributed systems [14,15], where several constraints have to be respected: for example, communication times might be very long, packets or parts of them might get lost, or a node may drop out unexpectedly. In a NoC, all nodes are reliable and communication times are short [1,2]. However, NoCs contribute considerably to the power consumption of many-core chips. The NoC of the Intel 80-core Teraflops research chip consumes 28% of the power per tile [17], and this percentage increases when more cores are put on the chip [3]. A large share of this contribution stems from buffers, which occupy a lot of chip area, e.g. 60% of the tile area of the Tilera TILE64 many-core [18]. Nevertheless, compared to buffers in distributed systems, buffers in NoCs are very small.¹ Therefore, flow control has to take place.
Our approach is a variation of stop-and-wait protocols (s-a-w) [8,16]: in the original s-a-w, after sending a flit, the sender has to wait for an acknowledgement from the receiver before sending the next flit. This means that each flit has to be acknowledged separately, which leads to high overhead. In contrast, in our approach the sender waits for a synchronization flit before it starts to send. Instead of acknowledging each flit, several flits can be sent; then, the next synchronization takes place (see details in Sect. 3).
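The contrast between the two schemes can be sketched as follows in C. The flit type and the functions send_flit, wait_for_ack and wait_for_sync_flit are hypothetical placeholders for the NoC interface; the sketch only illustrates the difference and is not the actual implementation described in Sect. 3.

#include <stdint.h>

typedef struct { uint32_t payload; } flit_t;    /* placeholder flit type */

/* Hypothetical wrappers around the NoC send/receive interface. */
void send_flit(int dest, const flit_t *f);
void wait_for_ack(int src);
void wait_for_sync_flit(int src);

/* Classic stop-and-wait: every flit is acknowledged individually. */
void saw_send(const flit_t *flits, int n, int dest)
{
    for (int i = 0; i < n; i++) {
        send_flit(dest, &flits[i]);
        wait_for_ack(dest);                      /* one round trip per flit */
    }
}

/* Variation sketched here: wait once for a synchronization flit,
 * then send several flits without per-flit acknowledgements. */
void sync_send(const flit_t *flits, int n, int dest)
{
    wait_for_sync_flit(dest);                    /* receiver signals readiness */
    for (int i = 0; i < n; i++)
        send_flit(dest, &flits[i]);
    /* the next synchronization precedes the next burst of flits */
}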
Another concept is credit-based flow control [7], which works as follows: when a node wants to send data to another node, it first asks the receiver for credit. The receiver then tells the sender how many receive buffer slots it may use, so the sender knows how many flits it can send. While sending, the receiver may update the credit, allowing the sender to send more flits. Credit-based flow control is implemented, e.g., in the Æthereal NoC [6], where a forward channel is used to send data and a reverse channel gives feedback about buffer utilization. Our approach does not dynamically exchange detailed information about buffer utilization. Instead, it is only intended to find the starting point of the communication.
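For illustration, the sender side of credit-based flow control can be sketched as follows in C. The functions request_credit, wait_for_credit_update and send_flit are hypothetical placeholders for the NoC interface (not the Æthereal API); the sketch only shows the general scheme.

#include <stdint.h>

typedef struct { uint32_t payload; } flit_t;     /* placeholder flit type */

/* Hypothetical NoC primitives. */
void     send_flit(int dest, const flit_t *f);
uint32_t request_credit(int dest);               /* initial free buffer slots */
uint32_t wait_for_credit_update(int dest);       /* blocks until more credit */

/* Sender side of credit-based flow control: flits are only sent while
 * credit, i.e. free receive buffer slots, is available. */
void credit_send(const flit_t *flits, int n, int dest)
{
    uint32_t credit = request_credit(dest);
    for (int i = 0; i < n; i++) {
        while (credit == 0)
            credit += wait_for_credit_update(dest); /* receiver frees slots */
        send_flit(dest, &flits[i]);
        credit--;                                   /* one buffer slot used */
    }
}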
In the Message Passing Interface (MPI), the standard for message-based communication [9], the function MPI_Ssend is defined for synchronous sending: it completes only after the receiving node has started the matching receive operation. Therefore, it implements something similar to our synchronization for synchronous communication. However, this takes place at a higher abstraction level, while our synchronization is realized at a low software level or even at hardware level.
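As a small illustration, the following C program lets rank 0 transfer one integer to rank 1 with MPI_Ssend; the synchronous-mode send on rank 0 only completes once rank 1 has started the matching MPI_Recv.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, data = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Synchronous send: blocks until rank 1 has posted its receive. */
        MPI_Ssend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}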
¹ In distributed systems, there is plenty of buffer space, because main memory and swap space (hard disk) may be employed.