Module 22. Hash Map
Learning Objectives
- Understand the primarily implemented through dictionaries, focuses on understanding their structure, functionality, and applications.
- Define the hash maps store unique key-value pairs, enabling efficient data access through a hash function that determines the index of each key in an array.
- Hash maps offer unique keys, mutability for dynamic data management, and efficient average-case time complexity of O (1) for operations like insertion, deletion, and retrieval.
- Explain hash maps that are indexed data structure and hash function that is the core of implementing a hash map.
- Implementation of hash map for optimizing search, e.g., dictionaries are example of hash maps.
1. Introduction
Hash maps are indexed data structures. A hash map makes use of a hash function to compute an index with a key into an array of buckets or slots. Its value is mapped to the bucket with the corresponding index. The key is unique and immutable. Think of a hash map as a cabinet having drawers with labels for the things stored in them. For example, storing user information- consider email as the key, and we can map values corresponding to that user such as the first name, last name etc to a bucket.
Hash function is the core of implementing a Hash Map. It takes in the key and translates it to the index of a bucket in the bucket list. Ideal hashing should produce a different index for each key. However, collisions can occur. When hashing gives an existing index, we can simply use a bucket for multiple values by appending a list or by rehashing.
In Python, dictionaries are examples of hash maps. We’ll see the implementation of hash map from scratch in order to learn how to build and customize such data structures for optimizing search.
The Hash Map design will include the following functions:
A Hash Map is a form of Hash Table data structure that usually holds a large number of entries. Using a Hash Map we can search, add, modify, and remove entries really fast.
- set_val(key, value): Inserts a key-value pair into the hash map. If the value already exists in the hash map, update the value.
- get_val(key): Returns the value to which the specified key is mapped, or “No record found” if this map contains no mapping for the key.
- delete_val(key): Removes the mapping for the specific key if the hash map contains the mapping for the key.
Below is the implementation.
Output:
Time Complexity:
Memory index access takes constant time and hashing takes constant time. Hence, the search complexity of a hash map is also constant time, that is, O(1).
2. Hash Maps
Hash Maps are used to find detailed information about something. In the implementation below, people are stored in a Hash Map. A person can be looked up using a person’s unique social security number (the Hash Map key), and then we can see that person’s name (the Hash Map value).
Output:
It is easier to understand how Hash Maps work if you first have a look at the two previous pages about Hash Tables and Hash Sets. It is also important to understand the meaning of the words below.
- Entry: Consists of a key and a value, forming a key-value pair.
- Key: Unique for each entry in the Hash Map. Used to generate a hash code determining the entry's bucket in the Hash Map. This ensures that every entry can be efficiently located.
- Hash Code: A number generated from an entry's key, to determine what bucket that Hash Map entry belongs to.
- Bucket: A Hash Map consists of many such buckets, or containers, to store entries.
- Value: Can be nearly any kind of information, like name, birth date, and address of a person. The value can be many different kinds of information combined.
3. Finding the Hash Code
A hash code is generated by a hash function.
The hash function in the simulation above takes the numbers in the social security number (not the dash), add them together, and does a modulo 10 operation (% 10) on the sum of characters to get the hash code as a number from 0 to 9.
This means that a person is stored in one of ten possible buckets in the Hash Map, according to the hash code of that person's social security number. The same hash code is generated and used when we want to search for or remove a person from the Hash Map.
The Hash Code gives us instant access as long as there is just one person in the corresponding bucket.
In the simulation above, Charlotte has social security number 123-4567. Adding the numbers together gives us a sum 28, and modulo 10 of that is 8. That is why she belongs to bucket 8.
Direct Access in Hash Maps
Searching for Charlotte in the Hash Map, we must use the social security number 123-4567 (the Hash Map key), which generates the hash code 8, as explained above.
This means we can go straight to bucket 8 to get her name (the Hash Map value), without searching through other entries in the Hash Map.
In cases like this we say that the Hash Map has constant time O(1) for searching, adding, and removing entries, which is really fast compared to using an array or a linked list.
But, in a worst case scenario, all the people are stored in the same bucket, and if the person we are trying to find is last person in this bucket, we need to compare with all the other social security numbers in that bucket before we find the person we are looking for.
In such a worst case scenario the Hash Map has time complexity O(n), which is the same time complexity as arrays and linked lists.
To keep Hash Maps fast, it is therefore important to have a hash function that will distribute the entries evenly between the buckets, and to have around as many buckets as Hash Map entries.
Having a lot more buckets than Hash Map entries is a waste of memory, and having a lot less buckets than Hash Map entries is a waste of time.
Hash Map Implementation
Hash Maps in Python are typically done by using Python's own dictionary data type, but to get a better understanding of how Hash Maps work we will not use that here.
To implement a Hash Map in Python we create a class SimpleHashMap.
Inside the SimpleHashMap class we have a method __init__ to initialize the Hash Map, a method hash_function for the hash function, and methods for the basic Hash Map operations: put, get, and remove.
We also create a method print_map to better see how the Hash Map looks like.
Output: