DIMENSIONAL ANALYSIS
Remember in Chemistry class, how you solved virtually every prob-
lem by simply setting up the units such that they canceled out, and some-
how the answers popped out as a result? That chemical magic is known
by the highfalutin name of dimensional analysis and it turns out it is
useful in computer systems analysis too.
Let’s do an example to see how dimensional analysis works and why
it is useful. In this case, assume you have to figure out how long, in mil-
liseconds, a single rotation of a disk takes. Unfortunately, you are given
only the RPM of the disk, or rotations per minute. Let’s assume we’re
talking about a 10K RPM disk (i.e., it rotates 10,000 times per minute).
How do we set up the dimensional analysis so that we get time per rota-
tion in milliseconds?
To do so, we start by putting the desired units on the left; in this case,
we wish to obtain the time (in milliseconds) per rotation, so that is
exactly what we write down: $\frac{\text{Time (ms)}}{1\ \text{Rotation}}$.
We then write down everything we know, making sure to cancel units where
possible. First, we obtain $\frac{1\ \text{minute}}{10{,}000\ \text{Rotations}}$
(keeping rotation on the bottom, as that’s where it is on the left), then
transform minutes into seconds with $\frac{60\ \text{seconds}}{1\ \text{minute}}$,
and then finally transform seconds into milliseconds with
$\frac{1000\ \text{ms}}{1\ \text{second}}$. The final result is the
following equation, with units nicely canceled:

$$\frac{\text{Time (ms)}}{1\ \text{Rot.}} = \frac{1\ \text{minute}}{10{,}000\ \text{Rot.}} \cdot \frac{60\ \text{seconds}}{1\ \text{minute}} \cdot \frac{1000\ \text{ms}}{1\ \text{second}} = \frac{60{,}000\ \text{ms}}{10{,}000\ \text{Rot.}} = \frac{6\ \text{ms}}{\text{Rotation}}$$

As you can see from this example, dimensional analysis makes what
seems obvious into a simple and repeatable process. Beyond the RPM
calculation above, it comes in handy with I/O analysis regularly. For
example, you will often be given the transfer rate of a disk, e.g.,
100 MB/second, and then asked: how long does it take to transfer a
512 KB block (in milliseconds)? With dimensional analysis, it’s easy:
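
$$\frac{\text{Time (ms)}}{1\ \text{Request}} = \frac{512\ \text{KB}}{1\ \text{Request}} \cdot \frac{1\ \text{MB}}{1024\ \text{KB}} \cdot \frac{1\ \text{second}}{100\ \text{MB}} \cdot \frac{1000\ \text{ms}}{1\ \text{second}} = \frac{5\ \text{ms}}{\text{Request}}$$
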
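The two unit-canceling chains in this aside can also be checked
mechanically. What follows is a minimal Python sketch, not part of the
original text: the function names ms_per_rotation and ms_per_transfer are
hypothetical, and the code assumes 1 MB = 1024 KB, as the transfer
equation does. Each multiplication mirrors one conversion factor.

```python
def ms_per_rotation(rpm: float) -> float:
    """Time in milliseconds for one rotation of a disk spinning at `rpm`."""
    # (1 minute / rpm rotations) * (60 seconds / 1 minute) * (1000 ms / 1 second)
    return (1.0 / rpm) * 60.0 * 1000.0


def ms_per_transfer(size_kb: float, rate_mb_per_s: float) -> float:
    """Time in milliseconds to move `size_kb` KB at `rate_mb_per_s` MB/second."""
    # (size KB) * (1 MB / 1024 KB) * (1 second / rate MB) * (1000 ms / 1 second)
    # Assumption: 1 MB = 1024 KB, matching the equation in the text.
    return size_kb * (1.0 / 1024.0) * (1.0 / rate_mb_per_s) * 1000.0


if __name__ == "__main__":
    # The 10K RPM disk and the 512 KB request at 100 MB/second from the text.
    print(f"{ms_per_rotation(10_000):.2f} ms per rotation")
    print(f"{ms_per_transfer(512, 100):.2f} ms per request")
```

Writing each factor out separately, rather than collapsing it to
60000 / rpm, keeps the code readable next to the equations.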
37.4 I/O Time: Doing The Math
Now that we have an abstract model of the disk, we can use a little
analysis to better understand disk performance. In particular, we can
now represent I/O time as the sum of three major components:

$$T_{I/O} = T_{seek} + T_{rotation} + T_{transfer} \quad (37.1)$$

OPERATING SYSTEMS [VERSION 0.80] WWW.OSTEP.ORG
HARD DISK DRIVES

                Cheetah 15K.5    Barracuda
Capacity        300 GB           1 TB
RPM             15,000           7,200
Average Seek    4 ms             9 ms
Max Transfer    125 MB/s         105 MB/s
Platters        4                4
Cache           16 MB            16/32 MB
Connects via    SCSI             SATA

Table 37.1: Disk Drive Specs: SCSI Versus SATA

Note that the rate of I/O ($R_{I/O}$), which is often more easily used for
comparison between drives (as we will do below), is easily computed
from the time. Simply divide the size of the transfer by the time it took:

$$R_{I/O} = \frac{Size_{Transfer}}{T_{I/O}} \quad (37.2)$$

To get a better feel for I/O time, let us perform the following calcu-
lation. Assume there are two workloads we are interested in. The first,
known as the random workload, issues small (e.g., 4KB) reads to random
locations on the disk. Random workloads are common in many impor-
tant applications, including database management systems. The second,
known as the sequential workload, simply reads a large number of sec-
tors consecutively from the disk, without jumping around. Sequential
access patterns are quite common and thus important as well.
To understand the difference in performance between random and se-
quential workloads, we need to make a few assumptions about the disk
drive first. Let’s look at a couple of modern disks from Seagate. The first,
known as the Cheetah 15K.5 [S09b], is a high-performance SCSI drive.
The second, the Barracuda [S09a], is a drive built for capacity. Details on
both are found in Table 37.1.
As you can see, the drives have quite different characteristics, and
in many ways nicely summarize two important components of the disk
drive market. The first is the “high performance” drive market, where
drives are engineered to spin as fast as possible, deliver low seek times,
and transfer data quickly. The second is the “capacity” market, where
cost per byte is the most important aspect; thus, the drives are slower but
pack as many bits as possible into the space available.
From these numbers, we can start to calculate how well the drives
would do under our two workloads outlined above. Let’s start by looking
at the random workload. Assuming each 4 KB read occurs at a random
location on disk, we can calculate how long each such read would take.
On the Cheetah:

$$T_{seek} = 4\ \text{ms},\quad T_{rotation} = 2\ \text{ms},\quad T_{transfer} = 30\ \text{microsecs} \quad (37.3)$$

© 2014, ARPACI-DUSSEAU THREE EASY PIECES
TIP: USE DISKS SEQUENTIALLY
When at all possible, transfer data to and from disks in a sequential man-
ner. If sequential is not possible, at least think about transferring data
in large chunks: the bigger, the better. If I/O is done in little random
pieces, I/O performance will suffer dramatically. Also, users will suffer.
Also, you will suffer, knowing what suffering you have wrought with
your careless random I/Os.
The average seek time (4 milliseconds) is just taken as the average time
reported by the manufacturer; note that a full seek (from one end of the
surface to the other) would likely take two or three times longer. The
average rotational delay is calculated from the RPM directly. 15000 RPM
is equal to 250 RPS (rotations per second); thus, each rotation takes 4 ms.
On average, the disk will encounter a half rotation and thus 2 ms is the
average time. Finally, the transfer time is just the size of the transfer over
the peak transfer rate; here it is vanishingly small (30 microseconds; note
that we need 1000 microseconds just to get 1 millisecond!).
Thus, from our equation above, $T_{I/O}$ for the Cheetah roughly equals
6 ms. To compute the rate of I/O, we just divide the size of the transfer
by the average time, and thus arrive at $R_{I/O}$ for the Cheetah under
the random workload of about 0.66 MB/s. The same calculation for the
Barracuda yields a $T_{I/O}$ of about 13.2 ms, more than twice as slow,
and thus a rate of about 0.31 MB/s.
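
The random-workload arithmetic can be reproduced with a short script. This
is a minimal Python sketch rather than anything from the original text:
the Disk class and the helper names are hypothetical, the drive parameters
are copied from Table 37.1, and the constant KB_PER_MB = 1024 is an
assumption. The rates it prints land within a few percent of the rounded
figures quoted above; the exact digits depend on the KB/MB convention
chosen.

```python
from dataclasses import dataclass

KB_PER_MB = 1024.0  # assumption: binary units; 1000.0 shifts the rates slightly


@dataclass
class Disk:
    name: str
    rpm: float                # rotations per minute
    avg_seek_ms: float        # manufacturer's average seek time, in ms
    max_transfer_mb_s: float  # peak transfer rate, in MB/s


def avg_rotation_ms(disk: Disk) -> float:
    """Average rotational delay: half of one full rotation, in ms."""
    ms_per_rotation = 60_000.0 / disk.rpm
    return ms_per_rotation / 2.0


def transfer_ms(disk: Disk, size_kb: float) -> float:
    """Transfer time at the peak rate, in ms."""
    return (size_kb / KB_PER_MB) / disk.max_transfer_mb_s * 1000.0


def io_time_ms(disk: Disk, size_kb: float) -> float:
    """T_I/O = T_seek + T_rotation + T_transfer (Equation 37.1), in ms."""
    return disk.avg_seek_ms + avg_rotation_ms(disk) + transfer_ms(disk, size_kb)


def io_rate_mb_s(disk: Disk, size_kb: float) -> float:
    """R_I/O = Size_Transfer / T_I/O (Equation 37.2), in MB/s."""
    return (size_kb / KB_PER_MB) / (io_time_ms(disk, size_kb) / 1000.0)


# Parameters taken from Table 37.1.
CHEETAH = Disk("Cheetah 15K.5", rpm=15_000, avg_seek_ms=4.0, max_transfer_mb_s=125.0)
BARRACUDA = Disk("Barracuda", rpm=7_200, avg_seek_ms=9.0, max_transfer_mb_s=105.0)

if __name__ == "__main__":
    for disk in (CHEETAH, BARRACUDA):
        t = io_time_ms(disk, size_kb=4)
        r = io_rate_mb_s(disk, size_kb=4)
        print(f"{disk.name}: T_I/O = {t:.2f} ms, R_I/O = {r:.2f} MB/s")
```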
Now let’s look at the sequential workload. Here we can assume there
is a single seek and rotation before a very long transfer. For simplicity,
assume the size of the transfer is 100 MB. Thus, $T_{I/O}$ for the Cheetah
and Barracuda is about 800 ms and 950 ms, respectively. The rates of I/O
are thus very nearly the peak transfer rates of 125 MB/s and 105 MB/s,
respectively. Table 37.2 summarizes these numbers.
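
The sequential case uses the same formula with one seek, one average
rotational delay, and a single large transfer. Below is a minimal Python
sketch under those assumptions; the function name sequential_io is
hypothetical, and the drive parameters come from Table 37.1. It shows how
the fixed seek and rotation costs become negligible next to a 100 MB
transfer, so the rate approaches the peak transfer rate.

```python
def sequential_io(avg_seek_ms: float, rpm: float, peak_mb_s: float,
                  size_mb: float = 100.0) -> tuple[float, float]:
    """Return (T_I/O in ms, R_I/O in MB/s) for one long sequential transfer."""
    rotation_ms = (60_000.0 / rpm) / 2.0           # half a rotation on average
    transfer_ms = size_mb / peak_mb_s * 1000.0     # time at the peak rate
    t_io_ms = avg_seek_ms + rotation_ms + transfer_ms
    r_io_mb_s = size_mb / (t_io_ms / 1000.0)
    return t_io_ms, r_io_mb_s


if __name__ == "__main__":
    # (name, average seek in ms, RPM, peak MB/s), as listed in Table 37.1.
    drives = [
        ("Cheetah 15K.5", 4.0, 15_000, 125.0),
        ("Barracuda", 9.0, 7_200, 105.0),
    ]
    for name, seek_ms, rpm, peak in drives:
        t, r = sequential_io(seek_ms, rpm, peak)
        print(f"{name}: T_I/O = {t:.0f} ms, R_I/O = {r:.1f} MB/s")
```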
The table shows us a number of important things. First, and most
importantly, there is a huge gap in drive performance between random
and sequential workloads: almost a factor of 200 for the Cheetah and
more than a factor of 300 for the Barracuda. And thus we arrive at the
most obvious design tip in the history of computing.
A second, more subtle point: there is a large difference in performance
between high-end “performance” drives and low-end “capacity” drives.
For this reason (and others), people are often willing to pay top dollar for
the former while trying to get the latter as cheaply as possible.

                      Cheetah      Barracuda
R_I/O Random          0.66 MB/s    0.31 MB/s
R_I/O Sequential      125 MB/s     105 MB/s

Table 37.2: Disk Drive Performance: SCSI Versus SATA
ASIDE: C