(Lecture Notes in Computer Science 10793) Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck - Architecture of Computing Systems – ARCS
4 Balancing Accuracy and Overhead
The selection of sampling periods involves a trade-off between the accuracy of
the measurements and the overhead that running the measurement system imposes
on the CPUs. Shorter sampling periods better approximate continuous signals and
thus increase accuracy, but also result in higher CPU load. To quantify this
trade-off, we conducted a series of measurements on our heterogeneous compute
node with the rapl, nvml, and maxeler components enabled in PAPI. Each
measurement lasted 60 s on an idle system, i.e., no tasks were executed apart
from the operating system. The sampling period was varied between 10 ms and
100 ms in increments of 10 ms. Each measurement was repeated 41 times and the
results were averaged.
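The measurement plan above can be written down as a small sketch. It only encodes the parameters stated in the text (60 s runs, periods from 10 ms to 100 ms in 10 ms steps, 41 repetitions); the function and constant names are illustrative assumptions and are not part of Ampehre or PAPI.

```python
from statistics import mean
from typing import Sequence

RUN_DURATION_S = 60                              # each measurement lasts 60 s
REPETITIONS = 41                                 # each measurement is repeated 41 times
SAMPLING_PERIODS_MS = list(range(10, 101, 10))   # 10 ms .. 100 ms, step 10 ms


def samples_per_run(period_ms: int, duration_s: int = RUN_DURATION_S) -> int:
    """Number of sampling events that fit into one run at the given period."""
    return (duration_s * 1000) // period_ms


def average_runs(run_results: Sequence[float]) -> float:
    """Average one metric (e.g. mean power in W) over all repetitions."""
    return mean(run_results)


if __name__ == "__main__":
    for period in SAMPLING_PERIODS_MS:
        print(f"{period:3d} ms: {samples_per_run(period)} samples per run, "
              f"{1000 / period:.1f} samples/s")
```

The sample count per run scales inversely with the period, which is why the CPU load of the measurement system is expected to grow as the period shrinks.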
Figure 5 shows the results: the average power dissipation and the
utilization, combined for both CPUs, as functions of the sampling period. As a
baseline, we developed a minimal program that only reads the CPU energy
consumption and utilization without using the Ampehre framework, i.e., without
the Ampehre and PAPI libraries. This simple program was executed under (i) a
fully booted CentOS Linux (black curve) and (ii) a Linux kernel executing
BusyBox [13] (gray curve). The blue, red, and green curves in Fig. 5 illustrate
the impact of Ampehre when sampling the CPU, GPU, and FPGA, respectively. In
addition, the purple curve shows Ampehre’s impact if all three components are
enabled with identical sampling periods.
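To illustrate what such a minimal baseline reader might look like, the following sketch derives average power from two readings of a wrapping energy counter and CPU utilization from two snapshots of the aggregate `cpu` line in `/proc/stat`. The text does not describe how the authors' program obtains its values; the use of the Linux powercap sysfs files and `/proc/stat`, as well as all function names, are assumptions made for illustration.

```python
import time

# Assumed Linux powercap paths for CPU package 0; may need root to read.
ENERGY_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"
RANGE_PATH = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"


def average_power_w(e_start_uj, e_end_uj, dt_s, max_range_uj):
    """Average power in W from two energy readings in microjoules."""
    delta = e_end_uj - e_start_uj
    if delta < 0:
        # Assume the counter wrapped around exactly once.
        delta += max_range_uj
    return delta / 1e6 / dt_s


def parse_cpu_line(stat_text):
    """Return (total, idle) jiffies from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal
            fields = [int(x) for x in line.split()[1:9]]
            return sum(fields), fields[3] + fields[4]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization_pct(stat_before, stat_after):
    """Utilization in percent between two /proc/stat snapshots."""
    total0, idle0 = parse_cpu_line(stat_before)
    total1, idle1 = parse_cpu_line(stat_after)
    d_total, d_idle = total1 - total0, idle1 - idle0
    return 100.0 * (d_total - d_idle) / d_total


def read_text(path):
    with open(path) as f:
        return f.read()


def measure(interval_s=1.0):
    """Take one (power in W, utilization in %) measurement on a live system."""
    max_range = int(read_text(RANGE_PATH))
    e0, s0, t0 = int(read_text(ENERGY_PATH)), read_text("/proc/stat"), time.monotonic()
    time.sleep(interval_s)
    e1, s1, t1 = int(read_text(ENERGY_PATH)), read_text("/proc/stat"), time.monotonic()
    return average_power_w(e0, e1, t1 - t0, max_range), cpu_utilization_pct(s0, s1)
```

Calling `measure()` requires a Linux machine that exposes the assumed powercap files; the pure helper functions can be exercised with recorded values on any system.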
As expected, the results show that longer sampling periods lead to lower
overhead on the CPUs in terms of both utilization and average power
dissipation. For example, when the sampling period is increased from 10 ms to
100 ms with all three components enabled (purple curve), the average power
dissipation drops from 33.56 W to 16.66 W, and the CPU utilization decreases
from 13.29% to just 1.24%. Moreover, at longer sampling periods the results
come close to the baseline implementation executed under CentOS. Comparing
Ampehre sampling all three components with the minimal measurement program
executed under CentOS, the difference in average power dissipation drops from
17.57 W at a sampling period of 10 ms to 1.83 W at 100 ms. Contrasting the two
baselines, it becomes apparent that a fully booted operating system leads to a
distinct increase in power dissipation, which ranges from 4.24 W to 6.97 W.
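The reported figures can be put in relation to each other with a few lines of arithmetic. The sketch below uses only the numbers quoted above; it assumes that the 17.57 W and 1.83 W values are the additional power relative to the CentOS baseline, and the helper names are illustrative.

```python
# Values quoted in the text, keyed by sampling period in ms.
POWER_ALL_W = {10: 33.56, 100: 16.66}        # all three components enabled
UTIL_ALL_PCT = {10: 13.29, 100: 1.24}        # combined CPU utilization
DELTA_VS_CENTOS_W = {10: 17.57, 100: 1.83}   # assumed: extra power over CentOS baseline


def relative_drop(before: float, after: float) -> float:
    """Fraction by which a quantity decreases when going from before to after."""
    return (before - after) / before


def overhead_share(delta_w: float, total_w: float) -> float:
    """Fraction of the total power attributed to the measurement overhead."""
    return delta_w / total_w


if __name__ == "__main__":
    print(f"power drop 10 -> 100 ms:       {relative_drop(POWER_ALL_W[10], POWER_ALL_W[100]):.1%}")
    print(f"utilization drop 10 -> 100 ms: {relative_drop(UTIL_ALL_PCT[10], UTIL_ALL_PCT[100]):.1%}")
    for period in (10, 100):
        share = overhead_share(DELTA_VS_CENTOS_W[period], POWER_ALL_W[period])
        print(f"overhead share at {period} ms:      {share:.1%}")
```

Under the stated assumption, the overhead accounts for roughly half of the measured power at 10 ms and for a much smaller fraction at 100 ms, which matches the qualitative observation that longer periods approach the baseline.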
In real-world use of Ampehre, the heterogeneous components are likely to be
sampled at different periods, which makes estimating the CPU overhead
more involved. We conducted a second series of measurements where we
82 A. Lösch et al.
[Figure 5 (plot): Power [W] and Utilization [%] versus Sampling Period [ms], 10 ms to 100 ms. Legend: CPU (rapl), GPU (nvml), FPGA (maxeler), Kernel + CentOS, Kernel + BusyBox, CPU + GPU + FPGA]