B. Qureshi / Future Generation Computer Systems 94 (2019) 453–467
Table 1
Configurations of RIoTU testbed.
No. of VMs per server     CPU    RAM (GB)   HDD (GB)
1 server (stand-alone)    20%    16         500
1 VM                      70%    4          50
2 VM                      35%    2          50
4 VM                      18%    1          50
8 VM                      9%     1          50
Fig. 1. RIoTU testbed with 4 servers.
3.2. Twitter workload
To create data workloads for this work, we used the Twitter
API [50] to collect tweets. The Twitter4J Java library, available
as twitter4j-core-4.0.4.jar [51], was used to collect the data.
A stand-alone, simple GUI-based application named TweetCollector
was written in Java Swing to connect to the Twitter API. This
application allows the user to provide search terms, collects
tweets containing these search terms, and stores them in text
files. For this experiment, the search terms “Donald Trump”,
“Hillary Clinton”, and “2016 US Elections” were used to collect
various sets of tweets over a period of two weeks in December
2016. The resulting tweets were stored as text files (datasets)
in HDFS on the RIoTU testbed. To analyze the performance and
energy consumption on the cluster, we created five datasets of
sizes 0.2 GB, 1 GB, 3 GB, 12 GB and 60 GB.
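The collect-and-store step described above can be illustrated with a minimal sketch. It is not the TweetCollector source: the class name `TweetCollectorSketch`, the methods `filterByTerms` and `store`, and the in-memory placeholder tweets are all assumptions made for illustration, and the Twitter API call itself is replaced by a hard-coded list so that the example runs without credentials or the Twitter4J library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TweetCollectorSketch {

    // Keep only the tweets that contain at least one search term (case-insensitive).
    static List<String> filterByTerms(List<String> tweets, List<String> searchTerms) {
        List<String> matches = new ArrayList<>();
        for (String tweet : tweets) {
            String lower = tweet.toLowerCase(Locale.ROOT);
            for (String term : searchTerms) {
                if (lower.contains(term.toLowerCase(Locale.ROOT))) {
                    // Flatten line breaks so each tweet occupies exactly one line.
                    matches.add(tweet.replaceAll("\\s+", " ").trim());
                    break;
                }
            }
        }
        return matches;
    }

    // Write one tweet per line to a UTF-8 text file.
    static void store(List<String> tweets, Path outputFile) throws IOException {
        Files.write(outputFile, tweets, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> terms = List.of("Donald Trump", "Hillary Clinton", "2016 US Elections");
        // Placeholder tweets standing in for results returned by the Twitter API.
        List<String> incoming = List.of(
                "Watching the debate: Hillary Clinton on the economy",
                "Nice weather today",
                "donald trump rally\nstarts at noon");
        List<String> kept = filterByTerms(incoming, terms);
        Path out = Files.createTempFile("tweets", ".txt");
        store(kept, out);
        System.out.println(kept.size() + " tweets written to " + out);
    }
}
```

Storing one tweet per line keeps the resulting text files easy to split into blocks when they are later read from HDFS.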
Further, we used SentiStrength [18] to investigate the power
consumption behavior of applications executing on the cluster.
SentiStrength is an algorithm that extracts sentiment strength
from informal English text and has been widely applied in research
on sentiment analysis of Twitter data. SentiStrength is a
single-process application; to enable parallel processing, the
authors modified it into a MapReduce-based application that executes
in the Hadoop environment. To execute the program, several
parameters are provided, including (i) the paths of the Twitter data
text files, (ii) the number of map and reduce jobs, (iii) the output
file paths, and (iv) the SentiStrength lookup folder. When the
application is launched in the cluster, a number of map and reduce
jobs are initiated. At the initial stage of the process, the data
files are read from the specified input folder. The map jobs split
the input files into data blocks that are read in parallel by the
application. As a map job completes, the reducers start executing the
SentiStrength algorithm and write to the specified output file. The
output file contains the ranking of sentiments in the range of −5
(negative sentiment) to 5 (positive sentiment), giving the overall
sentiment of the tweets present in the file.
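The split-then-score flow described above can be sketched in plain Java, without Hadoop. This is an illustration under stated assumptions, not the authors' modified SentiStrength: the six-word lexicon, the class `SentimentMapReduceSketch`, and its methods are hypothetical, the scorer is a simple word-weight sum clamped to the range −5 to 5, and the overall sentiment is assumed to be the mean of the per-tweet scores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SentimentMapReduceSketch {

    // Hypothetical toy lexicon; it stands in for the SentiStrength lookup folder.
    static final Map<String, Integer> LEXICON = Map.of(
            "good", 2, "great", 3, "love", 4,
            "bad", -2, "terrible", -4, "hate", -4);

    // Score one tweet: sum the weights of known words, clamped to [-5, 5].
    static int scoreTweet(String tweet) {
        int sum = 0;
        for (String word : tweet.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            sum += LEXICON.getOrDefault(word, 0);
        }
        return Math.max(-5, Math.min(5, sum));
    }

    // "Map" stage: split the input lines into blocks of at most blockSize lines.
    static List<List<String>> splitIntoBlocks(List<String> lines, int blockSize) {
        List<List<String>> blocks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += blockSize) {
            blocks.add(lines.subList(i, Math.min(i + blockSize, lines.size())));
        }
        return blocks;
    }

    // "Reduce" stage: score the blocks in parallel and average the tweet scores.
    static double overallSentiment(List<List<String>> blocks) {
        return blocks.parallelStream()
                .flatMap(List::stream)
                .mapToInt(SentimentMapReduceSketch::scoreTweet)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
                "I love this great day",
                "terrible traffic, I hate it",
                "good coffee",
                "nothing to report");
        List<List<String>> blocks = splitIntoBlocks(tweets, 2);
        System.out.println("Blocks: " + blocks.size());
        System.out.println("Overall sentiment: " + overallSentiment(blocks));
    }
}
```

In a real Hadoop job the block splitting is handled by the framework's input format and the scoring would run inside the reducer class; the parallel stream here only mimics that concurrency on a single machine.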
Fig. 2. Power usage profile for SentiStrength application.