Understanding and Implementing Hash Tables in Python

Hash tables are a fundamental data structure in computer science, and are used to implement associative arrays, maps, and sets. They are a key component of many algorithms and data structures, and are widely used in software development.

In this blog, we will explore the concept of hash tables and how they work, and then look at how to implement them in Python.

What are Hash Tables?

A hash table is a data structure that stores key-value pairs in an array, and uses a hash function to map the keys to indices in the array. The hash function maps the keys to integers, which are then used to index into the array to store or retrieve the associated values.

The key advantage of hash tables is that they provide constant-time O(1) average-case performance for operations such as insert, delete, and search. This makes them much faster than other data structures such as linked lists or trees, which have O(n) average-case performance for these operations.

However, it is important to note that hash tables have a worst-case performance of O(n), which can occur when the hash function maps all keys to the same index, resulting in a situation known as "collision". To avoid this, hash tables typically use a collision resolution strategy such as chaining or open addressing to handle collisions and maintain good performance.

How Hash Tables Work

Here is a high-level overview of how hash tables work:

First, a hash function is applied to the key to map it to an integer, which is then used to index into the array.
If the index is empty, the key-value pair is stored at that index.
If the index is already occupied, a collision has occurred. The collision resolution strategy is then applied to determine where to store the key-value pair.

It is important to choose a good hash function that distributes the keys evenly across the array, in order to minimize collisions and ensure good performance.

Implementing Hash Tables in Python

Now that we have an understanding of how hash tables work, let's look at how to implement them in Python.

We will start by creating a HashTable class, which will contain an array of size n to store the key-value pairs, and a hash function to map the keys to indices in the array. We will use chaining as our collision resolution strategy, which involves creating a linked list at each index to store the key-value pairs that hash to that index.

Here is the initial implementation of our HashTable class:

class HashTable:
    def __init__(self, n):
        self.n = n
        self.array = [None] * n

Next, let's define our hash function. We will use the built-in hash() function in Python, which returns a 32-bit integer hash value for a given object.

def hash_function(self, key):
    return hash(key) % self.n

Now that we have a hash function, let's define methods to add, retrieve, and delete key-value pairs from the hash table.

To add a key-value pair, we will first apply the hash function to the key to get the index, and then check if there are any key-value pairs already stored at that index. If there are, we will add the new key-value pair to the

linked list at that index. If there are not, we will create a new linked list and add the key-value pair as the first element.

Here is the implementation of the add() method:

def add(self, key, value):
    index = self.hash_function(key)
    if self.array[index] is None:
        self.array[index] = LinkedList()
    self.array[index].add((key, value))

To retrieve a value for a given key, we will again apply the hash function to the key to get the index, and then search the linked list at that index for the key-value pair. If the pair is found, we will return the value. If it is not found, we will return None.

Here is the implementation of the get() method:

def get(self, key):
    index = self.hash_function(key)
    if self.array[index] is None:
        return None
    return self.array[index].search(key)

To delete a key-value pair, we will again apply the hash function to the key to get the index, and then search the linked list at that index for the key-value pair. If the pair is found, we will delete it from the linked list.

Here is the implementation of the delete() method:

def delete(self, key):
    index = self.hash_function(key)
    if self.array[index] is None:
        return None
    self.array[index].delete(key)

Now that we have implemented all of the basic methods for our hash table, let's test it out by creating a hash table and adding some key-value pairs to it.

ht = HashTable(10)
ht.add("key1", "value1")
ht.add("key2", "value2")
ht.add("key3", "value3")

print(ht.get("key1"))  # Output: "value1"
print(ht.get("key2"))  # Output: "value2"
print(ht.get("key3"))  # Output: "value3"

ht.delete("key2")
print(ht.get("key2"))  # Output: None

Here is the complete code for the hash table implementation in Python:

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def add(self, key, value):
        new_node = (key, value)
        if self.head is None:
            self.head = new_node
            self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node

    def search(self, key):
        curr = self.head
        while curr is not None:
            if curr[0] == key:
                return curr[1]
            curr = curr.next
        return None

    def delete(self, key):
        curr = self.head
        prev = None
        while curr is not None:
            if curr[0] == key:
                if prev is None:
                    self.head = curr.next
                else:
                    prev.next = curr.next
                if curr.next is None:
                    self.tail = prev
                return
            prev = curr
            curr = curr.next


class HashTable:
    def __init__(self, n):
        self.n = n
        self.array = [None] * n

    def hash_function(self, key):
        return hash(key) % self.n

    def add(self, key, value):
        index = self.hash_function(key)
        if self.array[index] is None:
            self.array[index] = LinkedList()
        self.array[index].add((key, value))

    def get(self, key):
        index = self.hash_function(key)
        if self.array[index] is None:
            return None
        return self.array[index].search(key)

    def delete(self, key):
        index = self.hash_function(key)
        if self.array[index] is None:
            return None
        self.array[index].delete(key)


ht = HashTable(10)
ht.add("key1", "value1")
ht.add("key2", "value2")
ht.add("key3", "value3")

print(ht.get("key1"))  # Output: "value1"
print(ht.get("key2"))  # Output: "value2"
print(ht.get("key3"))  # Output: "value3"

ht.delete("key2")
print(ht.get("key2"))  # Output: None

That's it! We have now successfully implemented a hash table in Python.

Hash tables are a powerful and efficient data structure that are widely used in software development. By understanding how they work and how to implement them in Python, you can use them to solve a wide range of problems in your own projects.

CodeSleuth

Understanding and Implementing Hash Tables in Python

What are Hash Tables?

How Hash Tables Work

Implementing Hash Tables in Python

Posted by Umang Kumar

Post a Comment

1 Comments

Popular Posts

Binary Search

Trie Data Structure | Implementation in C++

The Secret Weapon of Successful Programmers: Competitive Programming

Fast Input/Output in C++ for Competitive Programming

Rope Data Structure: A High-Performance C++ Data Structure for Strings

Mastering the Mental Side of Programming: The Key to Success