Skip to main content

Command Palette

Search for a command to run...

Do hash tables really help in reducing time complexity ?

Published
0 min read
Do hash tables really help in reducing time complexity ?

Is hashing a real practical solution for finding, inserting and deleting elements with time complexity of O(1) ? If this question is in your mind then you are at the right spot. Please refer this page to get some basic idea of hash tables.

Let's do a small experiment taking up a problem to find the answer for our question.

Problem statement :

Given an array of integers, and a number ‘sum’, print all pairs in the array whose sum is equal to ‘sum’.

There are two different possible approaches to code which will result in two different solutions. Visit this page to get an idea on the problem and its solutions. You can also watch this video from Google to understand the problem better.

Solution 1 : Using hash table.

We create an empty hash table. Then we will traverse through the given array of numbers and for every number check if its complement is present in the hash table. If it's found then we print the pair. Irrespective of whether its found or not we insert the number to the hash table.

Solution 2 : Solve by sorting.

We sort the array in the increasing order. Check if the sum of the two extreme numbers is same as the given sum. If so, then we will print the pair, if it is less than the given sum then we will remove the smallest number, if it is greater than the given sum then we will remove the greatest number.

Let's write the program in C++ and golang.

In C++, hash table is realized by unordered_map. In our program let's use unordered_set, which is unordered_map with key and value being the same. Let's store the given array of numbers in vector.

In golang, hash table is realized by map. Let's use the same in our program and let's store the given array of numbers in slice.

Solution 1 in C++ :

ostringstream* findUsingHash(intvec &arr, int sum)
{
    int count = 0;
    inthash hash;
    ostringstream* poss = new ostringstream();
    *poss<<"";
    for (int i = 0; i < arr.size(); i++)
    {
        if (i == 0)
        {
            hash.insert(arr[i]);
            continue;
        }

        if (hash.find(sum - arr[i]) != hash.end())
        {
            //cout << arr[i] << "\t" << sum - arr[i] << endl;
            *poss << arr[i] << "\t" << sum - arr[i] << endl;
            count++;
        }

        hash.insert(arr[i]);
    }
    cout << "findUsingHash :: found " << count << " no. of pairs" << endl;
    return poss;
}

Solution 1 in Go :

func usingHash() {
    startTime := time.Now()
    defer printElapsedTime("usingHash", startTime)
    hashTable := make(map[int64]int64, 0)
    first := true
    found := false
    var complement int64
    for _, num := range arr {
        if first {
            hashTable[num] = num
            first = false
            continue
        }

        complement = sum - num
        _, found = hashTable[complement]
        if found {
            pairs = append(pairs, pair{num, complement})
        }

        hashTable[num] = num
    }
}

Solution 2 in C++ :

ostringstream* findUsingVector(intvec &arr, int sum)
{
    int count = 0;
    int start = 0;
    int end = arr.size() - 1;
    sort(arr.begin(), arr.end());
    ostringstream* poss = new ostringstream();
    *poss<<"";
    while (true)
    {
        if (start >= end)
        {
            break;
        }
        if ((arr[start] + arr[end]) < sum)
        {
            start++;
        }
        else if ((arr[start] + arr[end]) > sum)
        {
            end--;
        }
        else
        {
            //cout << arr[start] << "\t" << arr[end] << endl;
            *poss << arr[start] << "\t" << arr[end] << endl;
            count++;
            start++;
            end--;
        }
    }
    cout << "findUsingVector :: found " << count << " no. of pairs" << endl;
    return poss;
}

Solution 2 in Go :

func usingArr() {
    startTime := time.Now()
    defer printElapsedTime("usingArr", startTime)
    arr = quickSort(arr)
    //log.Println("arr after sort = ", arr)
    start := 0
    end := len(arr) - 1

    for true {
        if start >= end {
            break
        }
        if arr[start]+arr[end] < sum {
            start++
        } else if arr[start]+arr[end] > sum {
            end--
        } else {
            pairs = append(pairs, pair{arr[start], arr[end]})
            count++
            start++
            end--
        }
    }
}

Visit the following links for complete source code :

Visit the following links for results :

Result for N = 98765432 is below :

Cpp results :

E:\MyLearnings\Prapository\C_Cpp\sumPairs>app 98765432
----------------------------------------------------------------
num = 98765432
----------------------------------------------------------------
sum = 52944696
----------------------------------------------------------------
4:9:2:529:107:200       before findUsingHash
findUsingHash :: found 994947 no. of pairs
4:9:44:469:680:600      after findUsingHash
----------------------------------------------------------------
4:9:44:479:706:700      before findUsingVector
findUsingVector :: found 800697 no. of pairs
4:10:14:300:84:900      after findUsingVector
----------------------------------------------------------------

Go results :

D:\Prapository\Misc\sumPairs\go>main 98765432
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.599784 num =  98765432
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.600829 sum =  43832647
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.600829 before usingHash
2019/05/12 08:44:03.433248 usingHash  took  39.7519141s
2019/05/12 08:44:03.433248 after usingHash
2019/05/12 08:44:03.434202 usingHash returned  698168  number of pairs
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
2019/05/12 08:44:03.434265 before usingArr
2019/05/12 08:44:15.749724 usingArr  took  12.315459s
2019/05/12 08:44:15.749954 after usingArr
2019/05/12 08:44:15.749954 usingHash returned  596612  number of pairs
-------------------------------------------------------------------------------------------------------

Interestingly, sorting method has taken lesser execution time than hash table method.

Why did this happen ?

The time complexity of Solution 2 in best case is O(NlogN) + O(N) which gets reduced to O(NlogN). The time complexity of hash table method in best case is O(N) x O(1) which is O(N). With this knowledge, intuitively, we think that hash table method is going to take less execution time compared to sorting method. But practically, it is not. For higher values of N, if elements in the set are picked highly randomly, then quick sort algorithm would tend to work with its best case time complexity. On the other hand, hashing will face the scarcity of buckets. This will affect the time complexity of the algorithm in terms of N.

Let N represent the number of elements, B represent the number of available buckets and T(N) represent the time complexity of finding an element in a hash table of N elements.

Time complexity, T(N) is given by O(N/B). For smaller values of N, B tends to N and the time complexity will almost be constant. For higher values of N, if B is very less compared to N, then N/B will be significant. For the sake of simplicity, let m be 1/B. So, the time complexity of finding the element in a hash table for higher values of N will be O(mN). And hence, the effective time complexity of Solution 1 will be O(N x mN) .

E

As far as i remember from university, it also depends on what you mean by 'finding' in your scenario.
If you need to retrieve a value by key lookup hashmaps are perfect, if you want to find within a range then there are other data structures like B+ trees.
Interesting comparison anyway! Seeing big-O notation made me feel at school again ;-)

3
M
Mark7y ago

The collisions affecting complexity is an interesting point.

But how big is the effect in reality?

If your hashes are 32 bits and have a good distribution, wouldn't you need somewhere in the order of 2^32 elements to get a lot of collisions and noticeably affect time complexity? Does it really already happen at 2% of 2^32 as used?

Meanwhile many array implementations are indexed with 32 bit integers, so they have a hard limit of 2^32 items. Easy to solve, but suggests the limit is rarely reached.

So while theoretically interesting, wouldn't it be safe in practise to consider O(1) complexity for 99.99% of cases?

2
P

Are you saying that there will be 2^32 buckets will be available? If yes, we can print out the bucket id ( i know only in c++). I would take the pleasure of doing it and try to get back with results. Also, I didn't get this point of yours - "So while theoretically interesting, wouldn't it be safe in practise to consider O(1) complexity for 99.99% of cases?".

M
Mark7y ago

PRASHANTH H N Thanks for the answer.

I didn't quite mean that there are 2^32 buckets, but that the number of buckets increases with the number of hashes that have been inserted.

So I'd personally think the performance only starts to degrade away from O(1) when there are so many elements that a substantial part of them are hash collisions

Which I would say happens when the number of elements gets close to 2^32 (for good hash functions). Because there are only 2^32 possible values for an int.

So I figured most hashmaps contain fewer than e.g. 2^30 items, so the time complexity will be pretty much O(1), as there are few collisions.

But of course there is a theoretical limit as you described. Just isn't reached much, I thought :-)

2