Do hash tables really help in reducing time complexity ?

Do hash tables really help in reducing time complexity ?

Is hashing a real practical solution for finding, inserting and deleting elements with time complexity of O(1) ? If this question is in your mind then you are at the right spot. Please refer this page to get some basic idea of hash tables.

Let's do a small experiment taking up a problem to find the answer for our question.

Problem statement :

Given an array of integers, and a number ‘sum’, print all pairs in the array whose sum is equal to ‘sum’.

There are two different possible approaches to code which will result in two different solutions. Visit this page to get an idea on the problem and its solutions. You can also watch this video from Google to understand the problem better.

Solution 1 : Using hash table.

We create an empty hash table. Then we will traverse through the given array of numbers and for every number check if its complement is present in the hash table. If it's found then we print the pair. Irrespective of whether its found or not we insert the number to the hash table.

Solution 2 : Solve by sorting.

We sort the array in the increasing order. Check if the sum of the two extreme numbers is same as the given sum. If so, then we will print the pair, if it is less than the given sum then we will remove the smallest number, if it is greater than the given sum then we will remove the greatest number.

Let's write the program in C++ and golang.

In C++, hash table is realized by unordered_map. In our program let's use unordered_set, which is unordered_map with key and value being the same. Let's store the given array of numbers in vector.

In golang, hash table is realized by map. Let's use the same in our program and let's store the given array of numbers in slice.

Solution 1 in C++ :

ostringstream* findUsingHash(intvec &arr, int sum)
{
    int count = 0;
    inthash hash;
    ostringstream* poss = new ostringstream();
    *poss<<"";
    for (int i = 0; i < arr.size(); i++)
    {
        if (i == 0)
        {
            hash.insert(arr[i]);
            continue;
        }

        if (hash.find(sum - arr[i]) != hash.end())
        {
            //cout << arr[i] << "\t" << sum - arr[i] << endl;
            *poss << arr[i] << "\t" << sum - arr[i] << endl;
            count++;
        }

        hash.insert(arr[i]);
    }
    cout << "findUsingHash :: found " << count << " no. of pairs" << endl;
    return poss;
}

Solution 1 in Go :

func usingHash() {
    startTime := time.Now()
    defer printElapsedTime("usingHash", startTime)
    hashTable := make(map[int64]int64, 0)
    first := true
    found := false
    var complement int64
    for _, num := range arr {
        if first {
            hashTable[num] = num
            first = false
            continue
        }

        complement = sum - num
        _, found = hashTable[complement]
        if found {
            pairs = append(pairs, pair{num, complement})
        }

        hashTable[num] = num
    }
}

Solution 2 in C++ :

ostringstream* findUsingVector(intvec &arr, int sum)
{
    int count = 0;
    int start = 0;
    int end = arr.size() - 1;
    sort(arr.begin(), arr.end());
    ostringstream* poss = new ostringstream();
    *poss<<"";
    while (true)
    {
        if (start >= end)
        {
            break;
        }
        if ((arr[start] + arr[end]) < sum)
        {
            start++;
        }
        else if ((arr[start] + arr[end]) > sum)
        {
            end--;
        }
        else
        {
            //cout << arr[start] << "\t" << arr[end] << endl;
            *poss << arr[start] << "\t" << arr[end] << endl;
            count++;
            start++;
            end--;
        }
    }
    cout << "findUsingVector :: found " << count << " no. of pairs" << endl;
    return poss;
}

Solution 2 in Go :

func usingArr() {
    startTime := time.Now()
    defer printElapsedTime("usingArr", startTime)
    arr = quickSort(arr)
    //log.Println("arr after sort = ", arr)
    start := 0
    end := len(arr) - 1

    for true {
        if start >= end {
            break
        }
        if arr[start]+arr[end] < sum {
            start++
        } else if arr[start]+arr[end] > sum {
            end--
        } else {
            pairs = append(pairs, pair{arr[start], arr[end]})
            count++
            start++
            end--
        }
    }
}

Visit the following links for complete source code :

Visit the following links for results :

Result for N = 98765432 is below :

Cpp results :

E:\MyLearnings\Prapository\C_Cpp\sumPairs>app 98765432
----------------------------------------------------------------
num = 98765432
----------------------------------------------------------------
sum = 52944696
----------------------------------------------------------------
4:9:2:529:107:200       before findUsingHash
findUsingHash :: found 994947 no. of pairs
4:9:44:469:680:600      after findUsingHash
----------------------------------------------------------------
4:9:44:479:706:700      before findUsingVector
findUsingVector :: found 800697 no. of pairs
4:10:14:300:84:900      after findUsingVector
----------------------------------------------------------------

Go results :

D:\Prapository\Misc\sumPairs\go>main 98765432
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.599784 num =  98765432
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.600829 sum =  43832647
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
2019/05/12 08:43:23.600829 before usingHash
2019/05/12 08:44:03.433248 usingHash  took  39.7519141s
2019/05/12 08:44:03.433248 after usingHash
2019/05/12 08:44:03.434202 usingHash returned  698168  number of pairs
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
2019/05/12 08:44:03.434265 before usingArr
2019/05/12 08:44:15.749724 usingArr  took  12.315459s
2019/05/12 08:44:15.749954 after usingArr
2019/05/12 08:44:15.749954 usingHash returned  596612  number of pairs
-------------------------------------------------------------------------------------------------------

Interestingly, sorting method has taken lesser execution time than hash table method.

Why did this happen ?

The time complexity of Solution 2 in best case is O(NlogN) + O(N) which gets reduced to O(NlogN). The time complexity of hash table method in best case is O(N) x O(1) which is O(N). With this knowledge, intuitively, we think that hash table method is going to take less execution time compared to sorting method. But practically, it is not. For higher values of N, if elements in the set are picked highly randomly, then quick sort algorithm would tend to work with its best case time complexity. On the other hand, hashing will face the scarcity of buckets. This will affect the time complexity of the algorithm in terms of N.

Let N represent the number of elements, B represent the number of available buckets and T(N) represent the time complexity of finding an element in a hash table of N elements.

Time complexity, T(N) is given by O(N/B). For smaller values of N, B tends to N and the time complexity will almost be constant. For higher values of N, if B is very less compared to N, then N/B will be significant. For the sake of simplicity, let m be 1/B. So, the time complexity of finding the element in a hash table for higher values of N will be O(mN). And hence, the effective time complexity of Solution 1 will be O(N x mN) .