B. Qureshi / Future Generation Computer Systems 94 (2019) 453–467
compute nodes on the performance and energy consumption of
Hadoop clusters. They conducted a series of experiments to explore
the implications of DVFS settings on power consumption in Hadoop
clusters. They developed a policy-based framework to reduce the
overall power consumption of a Hadoop cluster.
Authors in [30,31] address the performance of Apache Spark [32]
based clusters in data centers. Spark leverages the available
memory to improve overall performance by introducing resilient
distributed datasets (RDDs), where intermediate processing
results are cached across machines in the cluster. Researchers
in [30] conduct an experimental study to compare the performance
of a Hadoop cluster against a Spark cluster for various data sets
using the PageRank algorithm. The results show that the performance
of the cluster depends on the availability of memory and the
frequency of disk caching during I/O. Duan et al. in [31] propose an
RDD selection algorithm in Spark which speeds up iterative
computation by improving disk caching with the least recently used
(LRU) mechanism. However, as the workload increases, caching
activity also increases, which affects storage I/O and therefore
raises the overall cost of running a workflow.
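The least recently used policy mentioned above can be illustrated with a minimal sketch. This is a generic LRU cache, not the RDD selection algorithm of [31]; the class name `LRUCache`, the `evicted` list, and the partition names are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache for named partitions (hypothetical)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first, most recent last
        self.evicted = []            # keys pushed out, in eviction order

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or refresh an entry, evicting the LRU entry if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            old_key, _ = self._items.popitem(last=False)
            self.evicted.append(old_key)

    def keys(self):
        """Return cached keys, least recently used first."""
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("rdd_a", [1, 2])
cache.put("rdd_b", [3, 4])
cache.get("rdd_a")          # rdd_a becomes most recently used
cache.put("rdd_c", [5, 6])  # cache is over capacity, so rdd_b is evicted
print(cache.keys(), cache.evicted)
```

In this toy setting every eviction stands in for data that would have to be spilled to disk or recomputed, which is one way to picture why heavier workloads translate into more storage I/O.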
Yang et al. in [33] address the storage I/O inefficiency in data
centers using the VMware ESXi virtualization platform on traditional
storage media such as SATA HDDs organized in RAID configurations.
They argue that high-speed SSDs, although they improve overall
storage I/O performance, can be further improved by addressing the
queuing mechanism in Non-Volatile Memory Express (NVMe).
Intuitively, the proposed improvements using faster and more
energy-efficient SSDs could yield lower energy consumption.
Fukushima et al. in [34] address the
QoS degradation due to server migration in virtualized data centers.
They build an integer programming model to determine when and to
which location servers should migrate to minimize the total
monetary penalty incurred by the service provider. The proposed
model considers network latency and bottlenecks in determining
the monetary cost of VM migration. Bhimani et al. in [35] compare
container-based virtualization such as Docker [36] against VM
placement strategies for data centers. They conclude that the newer
container-based virtualization is more performance- and
energy-efficient; however, the results of this study cannot be
generalized to all data centers.
Li [22] proposed Oasis, a data center expansion strategy for
scaling data center infrastructure while considering power/carbon
emission constraints. Oasis allows switching between green energy
power supplies for optimizing power consumption. While the
benefits of switching power resources are evident, it is not clear
how the switching affects the performance of workflows within the
data center. Kong et al. [26] presented Green Planning, a framework
to find a balance among multiple energy sources, grid power, and
energy storage devices for a data center. The framework minimizes
the lifetime total cost, including both capital and operational
costs, for a data center. They conducted extensive simulations to
evaluate Green Planning with a real-life computational workload and
meteorological data traces.