Access memory less
When I first wrote microcode to squeeze the last bit of
efficiency out of a processor, a good rule of thumb was that
the system could execute nine instructions while waiting
for a memory read to complete. Today, that factor is 200 to
500, depending on the architecture. Memory has become
relatively slower. In response, hardware architects have
added systems of registers, pipelines, and caches to keep
the instructions flowing. This has major implications for
software: How do I organize my code and data to minimize
memory usage, cache misses, and so on? My first-order
answer is
• don’t store data unnecessarily,
• keep data compact, and
• access memory in a predictable manner.
This again has implications for software design. Consider a simple example: generate N random integers and insert them into a sequence so that each is in its proper position in numerical order. For example, if the random numbers are 5, 1, 4, and 2, the sequence should grow like this:
5
1 5
1 4 5
1 2 4 5
Once the N elements are in order, we remove them one at a time by selecting a random position in the sequence and removing the element there. For example, if we choose positions 1, 2, 0, and 0 (using 0 as the origin), the sequence should shrink like this:
1 2 4 5
1 4 5
1 4
4
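A minimal C++ sketch of such an experiment might look as follows. This is an illustration written for this discussion, not the exact code behind Figure 1; the function name run_experiment, the fixed seed, and the value range are choices made for the sketch. The same template is instantiated with std::vector<int> and std::list<int>, and only linear traversal is used in both phases.

#include <list>
#include <random>
#include <vector>

// Grow a sorted sequence by inserting n random integers, each at its
// proper position, then shrink it by erasing elements at random
// positions. Only linear traversal is used, for vector and list alike.
template<typename Seq>
void run_experiment(int n)
{
    std::mt19937 gen{42};                        // fixed seed for repeatability
    std::uniform_int_distribution<int> value{0, 1'000'000};

    Seq seq;
    for (int i = 0; i < n; ++i) {
        int v = value(gen);
        auto p = seq.begin();
        while (p != seq.end() && *p < v) ++p;    // find the insertion point linearly
        seq.insert(p, v);
    }

    for (int size = n; size > 0; --size) {
        std::uniform_int_distribution<int> pos{0, size - 1};
        int target = pos(gen);
        auto p = seq.begin();
        for (int k = 0; k < target; ++k) ++p;    // no random-access shortcut, even for vector
        seq.erase(p);
    }
}

int main()
{
    run_experiment<std::vector<int>>(50'000);    // time these two calls separately to compare
    run_experiment<std::list<int>>(50'000);
}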
Now, for which N is it better to use a linked list rather than a vector (or an array) to represent the sequence? If we naively apply complexity theory, the answer will be something like, “Is this a trick question? A list, of course!” We can
insert an element into and delete from a linked list without
moving other elements. In a vector, every element after the
position of an inserted or deleted element must be moved.
Worse still, if you don’t know the maximum number of
elements in advance, it’s occasionally necessary to copy
the entire vector to make room for another element.
Depending on the machine architecture and the programming language, the answer will be that the vector is best for small to medium values of N. When I ran the experiment on my 8-Gbyte laptop, I found N to be much larger than 500,000. The red line in Figure 1 shows the time taken for the list, and the blue line the time taken by the vector.
This isn’t a subtle difference. The x-axis is in units of 100,000 elements, and the y-axis is in seconds. So for “small lists,” a vector is a better representation of a sequence than a linked structure. This is also true for numbers of elements too small to show on this graph.
Why? First, it takes 4 bytes to store a 4-byte integer in a vector, but it takes 12 bytes to store it in a doubly linked list (assuming 4-byte pointers as links). Saying “list” tripled the memory requirement. Actually, it’s worse, because many general-purpose list types store each element as a separate object in dynamic (heap, free store) memory, adding another word or two of memory overhead per element. The green line in Figure 1 shows a list that I optimized by preallocating space for the elements, so that each element wasn’t a separate heap object. This demonstrates that even though allocation overheads are significant, they don’t explain the vector’s fundamental advantage.
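To make the arithmetic concrete, here is a small illustration of the per-element sizes. It is my own example, assuming a 4-byte int as in the text; with the 8-byte pointers of a typical 64-bit machine, the gap is even wider.

#include <cstdint>
#include <iostream>

// A doubly linked list node as the text assumes it: one 4-byte value
// plus two links. With 4-byte pointers that is 12 bytes per element;
// with 8-byte pointers it is 20 bytes, usually padded to 24.
struct Node {
    std::int32_t value;
    Node* prev;
    Node* next;
};

int main()
{
    std::cout << "element in a vector:    " << sizeof(std::int32_t) << " bytes\n";
    std::cout << "element as a list node: " << sizeof(Node) << " bytes\n";
    // Each node is also a separate heap allocation, which typically adds
    // another word or two of allocator bookkeeping per element.
}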
Not only are the list nodes large, they’re scattered in
memory, implying that when we traverse the list to find
a position for insertion or deletion, we randomly access
memory locations in the area that stored the list, causing
cache misses. On the other hand, the hardware really likes
the vector’s sequential access of words in memory. In an
attempt at fairness, I didn’t use a binary search to speed
up insertion into the vector. Nor did I use random access
to find the deletion point in the vector version. This keeps
the number of elements traversed the same for all versions. In fact, I used identical generic code for the vector
and the lists.
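The cache effect itself is easy to demonstrate with a micro-benchmark of my own devising (not part of the original experiment): sum the same integers once from a contiguous array and once through pointers to individually allocated heap objects visited in a random order. On most machines the pointer-chasing version is noticeably slower, even though it does exactly the same arithmetic.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    constexpr int n = 10'000'000;

    std::vector<int> contiguous(n, 1);            // n ints stored back to back

    std::vector<int*> scattered;                  // n ints, one heap object each
    scattered.reserve(n);
    for (int i = 0; i < n; ++i) scattered.push_back(new int(1));
    std::mt19937 gen{42};
    std::shuffle(scattered.begin(), scattered.end(), gen);   // visit them in random order

    auto time = [](const char* label, auto&& sum) {
        auto t0 = std::chrono::steady_clock::now();
        long long s = sum();
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label << ": " << s << " in "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    };

    time("contiguous", [&] { long long s = 0; for (int x : contiguous) s += x; return s; });
    time("scattered ", [&] { long long s = 0; for (int* p : scattered) s += *p; return s; });

    for (int* p : scattered) delete p;            // clean up the heap objects
}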
Is this an academic curiosity? No. Infrastructure
application developers tell me that compactness and
predictable access patterns are essential for efficiency.
Power consumption is roughly proportional to the
number of memory accesses, so the red (list) and blue
(vector) lines are first-order approximations to the drain
on a smartphone battery or the number of server farms
needed.