(Lecture Notes in Computer Science 10793) Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck - Architecture of Computing Systems – ARCS
3.2 CaCAO Approach (γ)
With only (α) and (β) at hand, no one-size-fits-all solution is available. We
overcome the deficiencies of both the software lock-based (α) and the software
lock-free (β) implementations by combining their conceptual advantages: zero
retries and lock-freeness.
We propose a dedicated hardware (γ) implementation that outsources and
atomically executes the whole critical section in a dedicated hardware module
(near the memory where the shared data is stored), thereby guaranteeing zero
retries by design. Since an upper bound for the execution time can be given, this
approach is not only lock-free, but also wait-free.
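The difference between retrying in software and outsourcing the whole critical section can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: `NearMemoryModule`, `submit`, `run`, and `cas_lockstep_retries` are hypothetical names, the critical section is reduced to a counter increment, and the compare-and-swap contenders are assumed to run in worst-case lockstep.

```python
from collections import deque


class NearMemoryModule:
    """Hypothetical stand-in for the dedicated hardware module (gamma)."""

    def __init__(self):
        self.memory = {}
        self.requests = deque()

    def submit(self, core_id, addr, operation):
        # A core sends the whole critical section as a single request.
        self.requests.append((core_id, addr, operation))

    def run(self):
        # Requests are served strictly one after another, so each
        # read-modify-write is atomic and no request is ever retried.
        results = []
        while self.requests:
            core_id, addr, operation = self.requests.popleft()
            old = self.memory.get(addr, 0)
            self.memory[addr] = operation(old)
            results.append((core_id, old))
        return results


def cas_lockstep_retries(n_cores):
    """Software CAS loop (beta) under an assumed worst-case lockstep schedule."""
    memory = 0
    pending = list(range(n_cores))
    retries = 0
    while pending:
        # All pending cores read the same value before anyone writes.
        snapshot = {core: memory for core in pending}
        still_pending = []
        for core in pending:
            if memory == snapshot[core]:
                memory = snapshot[core] + 1   # CAS succeeds for one core
            else:
                retries += 1                  # CAS fails, core must retry
                still_pending.append(core)
        pending = still_pending
    return memory, retries


module = NearMemoryModule()
for core in range(4):
    module.submit(core, 0x10, lambda value: value + 1)
served = module.run()
final_value, retries = cas_lockstep_retries(4)
```

In this toy schedule both variants reach the same final counter value, but the simulated CAS loop accumulates retries that grow with the number of contenders, while the serialized module serves every request exactly once.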
In the best case, the whole execution (NoC travel time plus atomic
read-modify-write cycles) of the lock-free primitives (β and γ) takes
approximately as little time as the minimal lock acquisition of the
lock-based variant (α).
More importantly, the average and worst-case times for the proposed dedicated
hardware solution (γ) do not rise much, since no interfering concurrency is
possible and therefore no retries are necessary. Especially for remote accesses,
the atomic read-modify-write cycles, which have a constant duration after bus
grant, are much shorter than the travel time over the NoC. Even if there are
several concurrent contenders, they cannot interfere with one another due to
the atomic read-modify-write cycles in hardware, thereby guaranteeing wait-free
operation.
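The bounded worst case can be made concrete with a toy latency model. The formulas below are assumptions made for illustration and do not come from the paper: `t_noc` is a hypothetical NoC round-trip time, `t_rmw` a hypothetical constant read-modify-write duration after bus grant, and the software CAS variant is assumed to pay one full round trip per attempt and to lose every race until it is the last contender.

```python
def worst_case_dedicated_hw(n_contenders, t_noc, t_rmw):
    """Assumed bound for gamma: one NoC round trip plus waiting for all
    requests queued ahead, each taking one constant-time RMW."""
    return t_noc + n_contenders * t_rmw


def worst_case_cas_lockstep(n_contenders, t_noc, t_rmw):
    """Assumed bound for beta: the unluckiest core needs one attempt per
    contender, and every attempt pays the NoC round trip again."""
    return n_contenders * (t_noc + t_rmw)


# Illustrative numbers only (cycles); not measurements from the paper.
T_NOC, T_RMW = 100, 4
single = (worst_case_dedicated_hw(1, T_NOC, T_RMW),
          worst_case_cas_lockstep(1, T_NOC, T_RMW))
contended = (worst_case_dedicated_hw(8, T_NOC, T_RMW),
             worst_case_cas_lockstep(8, T_NOC, T_RMW))
```

Under these assumptions the two variants coincide without contention, while with eight contenders the hardware bound grows only by the short read-modify-write term and the retry-based bound grows by the NoC travel time per attempt.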
This approach has general validity and outperforms the software lock-based
as well as lock-free variants by design. This local or remote site execution in a
dedicated hardware module (CaCAO approach) helps to tackle the data-to-task
locality problem of distributed shared memory architectures.
In contrast to software-based lock-free implementations, which might get quite
complex [4], CaCAO does not need atomic operations inside the critical section,
since the atomicity is intrinsically provided by the dedicated hardware module.
However, the disadvantage of this approach is its very application-specific
nature: each required functionality has to be implemented in dedicated
hardware.
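The application-specific nature can be pictured as a module with a fixed operation table. The following is only a sketch: `FixedFunctionModule` and the opcodes `fetch_add` and `swap` are hypothetical and chosen for illustration; the operations actually supported are not stated in this section.

```python
class FixedFunctionModule:
    """Hypothetical module whose operation set is fixed at design time."""

    def __init__(self):
        self.memory = {}
        # Each entry stands for one critical section built into hardware.
        self._ops = {
            "fetch_add": self._fetch_add,
            "swap": self._swap,
        }

    def _fetch_add(self, addr, operand):
        old = self.memory.get(addr, 0)
        self.memory[addr] = old + operand
        return old

    def _swap(self, addr, operand):
        old = self.memory.get(addr, 0)
        self.memory[addr] = operand
        return old

    def execute(self, opcode, addr, operand):
        # Anything outside the built-in table cannot be executed at all.
        if opcode not in self._ops:
            raise NotImplementedError(
                f"operation {opcode!r} is not built into the module")
        return self._ops[opcode](addr, operand)
```

A request for an operation outside the table, for example a hypothetical `fetch_max`, is rejected; supporting it would mean extending the hardware rather than changing software.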
In future work, we plan to further extend the functionality and complexity of
CaCAO. Because of the compositional nature of this approach, various already
S. Rheindt et al.
[Figure panels: (a) lock-based, (b) lock-free, (c) dedicated hardware]