Understanding the Need to Sort and Search
Putting everything into buckets
Until now, the search and sort routines in the book work by performing a series of
comparisons until the algorithm finds the correct value. The act of performing
comparisons slows the algorithms because each comparison takes some amount
of time to complete.
A smarter way to perform the task involves predicting the location of a particular
data item in the data structure (whatever that structure might be) before actually
looking for it. That’s what a hash table does —provides the means to create an
index of keys that points to individual items in a data structure so that an algo-
rithm can easily predict the location of the data. Placing keys into the index
involves using a hash function that turns the key into a numeric value. The numeric
value acts as an index into the hash table, and the hash table provides a pointer to
the full record in the dataset. Because the hash function produces repeatable
results, you can predict the location of the required data. In many cases, a hash
table provides a search time of O(1). In other words, you need only one comparison
to find the data.
A hash table contains a specific number of slots that you can view as buckets for
holding data. Each slot can hold one data item. The number of filled slots when
compared to the number of available slots is the load factor. When the load factor
is high, the potential for collisions (where two data entries have the same hash
value) becomes greater as well. The next section of the chapter discusses how to
avoid collisions, but all you really need to know for now is that they can occur.
One of the more typical methods for calculating the hash value for an input is to
obtain the modulus of the value divided by the number of slots. For example, if
you want to store the number 54 into a hash table containing 15 slots, the hash
value is 9. Consequently, the value 54 goes into slot 9 of the hash table when the
slots are numbers from 0 through 14 (given that the table has 15 slots). A real hash
table will contain a considerably greater number of slots, but 15 works fine for the
purposes of this section. After placing the item into the hash slot, you can use the
hash function a second time to find its location.
Theoretically, if you have a perfect hash function and an infinite number of slots,
every value you present to the hash function will produce a unique value. In some
cases, the hash calculation can become quite complex to ensure unique values
most of the time. However, the more complex the hash calculation, the less ben-
efit you receive from hashing, so keeping things simple is the best way to go.
Hashing can work with all sorts of data structures. However, for the purposes of
demonstration, the following example uses a simple list to hold the original data
and a second list to hold the resulting hash. (You can find this code in the
A4D;
07; Hashing.ipynb
file on the Dummies site as part of the downloadable code;
see the Introduction for details.)
CHAPTER 7
Do'stlaringiz bilan baham: |