(Lecture Notes in Computer Science 10793) Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck - Architecture of Computing Systems – ARCS

2 Related Work
There are different approaches in current research to increase the simulation
speed of virtual prototyping tools without sacrificing accuracy. Since the
number of processor cores per SoC rises steadily, a sequential simulation cannot
keep up with new hardware. Therefore, simulators were created that parallelize
the emulation of multi-core platforms. Manifold [14] is an example of a
simulation framework designed for exactly this purpose of simulating a
multi-core system. It reports speedups of up to 12 times over the sequential
emulation of a 64-core platform. The next step is to distribute the threads of
the simulation tool across multiple host computers, which results in
distributed simulation and introduces new challenges. An overview of the different
issues and methods of parallel and distributed discrete event simulation is given
in [7]. While these approaches are the future solutions for handling systems
with many cores, they do not optimize the simulation of a single complex core
very well. This issue is addressed in this paper.
The Sniper [3] simulator is a tool that increases the accuracy of
instruction-accurate simulation (called one-IPC (instructions per cycle)
simulation) without introducing the overhead of a cycle-accurate simulator. It
separates the instruction stream into intervals, which are analyzed regarding
their architectural behavior and stored in an execution window during the
emulation. This makes it possible to model time penalties of real hardware that
occur because of data dependencies between instructions or cache misses. In
their paper, the authors also suggest parallelizing the execution of the
simulation. They achieve an average absolute error of less than 23.8% for the
SPLASH-2 benchmark but incur a slowdown of 2–3 times in comparison to the
one-IPC simulation. The work presented in this paper is intended as an
alternative way to achieve benefits similar to those of Sniper.
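
The difference between a one-IPC estimate and an estimate that accounts for miss events can be illustrated with a minimal toy sketch in Python. It is not Sniper's interval model: the `Instr` record, the function names, and the penalty values (100 cycles per cache miss, 1 cycle per dependency stall) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record; the fields are illustrative only."""
    cache_miss: bool = False        # instruction misses in the cache
    depends_on_prev: bool = False   # instruction waits on the previous result


def one_ipc_cycles(stream):
    """One-IPC estimate: every instruction costs exactly one cycle."""
    return len(stream)


def penalty_aware_cycles(stream, miss_penalty=100, dep_penalty=1):
    """One cycle per instruction plus assumed penalties for miss events."""
    cycles = 0
    for ins in stream:
        cycles += 1
        if ins.cache_miss:
            cycles += miss_penalty
        if ins.depends_on_prev:
            cycles += dep_penalty
    return cycles


stream = [Instr(), Instr(depends_on_prev=True), Instr(cache_miss=True), Instr()]
print(one_ipc_cycles(stream))        # 4
print(penalty_aware_cycles(stream))  # 105
```

Even in this toy setting, a single cache miss dominates the estimate, which is why a pure one-IPC count can be far off for memory-bound code.
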
Switching the processor models of gem5, as presented in this work, was done
before by Hsieh et al. [9]. They use this approach to fast-forward to their
region of interest. As soon as the inaccurate model (which they call “in-order”)
reaches the point that has to be investigated, the accurate out-of-order model
is switched in. How this region is found and how they keep track of the
instruction flow is not explained. Additionally, since this was not the main
topic of their work, no further comparison of the accuracy achieved against the
simulation time required for the full program was made. The mechanism of gem5
for exchanging processor models was also used to emulate dynamic voltage and
frequency scaling, a feature that Haririan et al. implemented for gem5 [8].
However, the main focus of their evaluation lay on the accuracy of the method.
Thus, they did not try to accelerate their work or to compare it regarding its
simulation speed.
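
The fast-forwarding idea can be illustrated with a small, self-contained Python sketch. It does not use gem5's API: `fast_model`, `detailed_model`, the cycle and host-cost values, and the region-of-interest bounds are hypothetical stand-ins. The cheap model runs outside the region of interest, and the expensive model takes over inside it.

```python
def fast_model(instr):
    """Hypothetical cheap model: one cycle per instruction, low host cost."""
    return 1, 1          # (estimated cycles, host work units)


def detailed_model(instr):
    """Hypothetical detailed model: charges loads more, high host cost."""
    cycles = 4 if instr == "load" else 1
    return cycles, 10    # assumed to be 10x more expensive to simulate


def simulate(trace, roi_start, roi_end):
    """Use the detailed model inside [roi_start, roi_end), the fast one elsewhere."""
    total_cycles = 0
    host_work = 0
    for index, instr in enumerate(trace):
        model = detailed_model if roi_start <= index < roi_end else fast_model
        cycles, work = model(instr)
        total_cycles += cycles
        host_work += work
    return total_cycles, host_work


trace = ["add", "load", "add", "load", "add", "add"]
print(simulate(trace, roi_start=2, roi_end=4))   # (9, 24)
print(simulate(trace, roi_start=0, roi_end=6))   # (12, 60)
```

Restricting the detailed model to the region of interest lowers the host work from 60 to 24 units in this example, at the price of a cycle estimate that misses the load outside the region.
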
This section shows that there are already many approaches to improving the
efficiency of CPU simulation. To the knowledge of the authors, however, no
evaluation integrating both dimensions, simulation time and accuracy, has been
made. Hence, the newly proposed methodology is evaluated in a way that includes
both metrics. It is expected that, in the future, some of the related work
presented here may also benefit from the proposed methodology.
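
Reporting both dimensions together can be sketched as follows. This is only an illustrative Python example: the helper names and the measurement values are assumptions, not results from the paper.

```python
def relative_error(estimate, reference):
    """Absolute relative error of an estimate against a reference value."""
    return abs(estimate - reference) / reference


def speedup(reference_seconds, candidate_seconds):
    """How many times faster the candidate simulation ran than the reference."""
    return reference_seconds / candidate_seconds


# Hypothetical measurements: (estimated cycles, wall-clock seconds).
reference = (1000, 120.0)   # assumed detailed reference run
candidate = (900, 30.0)     # assumed faster, less accurate run

error = relative_error(candidate[0], reference[0])
gain = speedup(reference[1], candidate[1])
print(f"error={error:.1%}, speedup={gain:.1f}x")   # error=10.0%, speedup=4.0x
```

Stating the error and the speedup side by side makes the trade-off between accuracy and simulation time explicit for each configuration.
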
