(Lecture Notes in Computer Science 10793) Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck - Architecture of Computing Systems – ARCS
Fig. 1. Node C fills the receive buffer of node B, which is currently waiting for flits from node A. Thus, node B is busy processing the flits received from C before it can answer the request from node A that it was originally waiting for. Boxes represent local computation times and arrows the delivery of flits.
The problem is illustrated in Fig. 1: three nodes A, B, and C run a parallel application. Each of them performs some local computation (boxes), followed by communication (flits, represented as arrows). The computation of node A takes slightly longer than that of nodes B and C. Meanwhile, node C finishes its local computation and sends several flits to node B. Node A then sends a request to node B, but node B is busy processing the flits sent by node C. If the receive buffer of node B is full, the request from node A cannot even be stored there. Thus, A has to wait until C is finished before it can send its request to B again.
To avoid buffer overflows, we propose adding a synchronization mechanism: each node planning to send data has to wait for a flit from its intended receiver indicating that the receiver is ready to handle incoming flits. A receiver node sends this flit when it reaches its receive operation, ensuring that it is fully capable of processing incoming flits. When implementing this synchronization in software, a node may still receive many synchronization flits (at most one from every other node). Therefore, we suggest realizing it with hardware support. Our hardware implementation stores the synchronization information and makes it available to the processing element on request. In doing so, we focus on minimal hardware and synchronization overhead.
Altogether, the contribution of this paper is a cheap and simple hardware synchronization mechanism that can easily be controlled in software, increasing performance while decreasing receive buffer size and hardware costs. Our approach is independent of router design and network topology.
The remainder of this paper is structured as follows: In the next section, we present related work and background. Afterwards, our synchronization concept is explained in Sect. 3, followed by the description of the hardware implementation in Sect. 4, which is subsequently evaluated in Sect. 5. Finally, the paper is concluded in Sect. 6.