Sherlock and anagrams (hacker rank) from the Dictionaries and Hashmaps section + extensive time and space complexity analysis.

Sherlock and anagrams (hacker rank) from the Dictionaries and Hashmaps section + extensive time and space complexity analysis.

Hi, guys, today we will discuss the problem "Sherlock and anagrams"

The link to the problem is the following:

https://www.hackerrank.com/challenges/sherlock-and-anagrams/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=dictionaries-hashmaps


Short description of the problem:

Anagram definition:

Two strings are anagrams of each other if the letters of one string can be rearranged to form the other string.

The task is to:

Find the number of pairs of substrings of a given string that are anagrams of each other.


Example 1:

string = mom

The list of all anagrammatic pairs is: [m,m], [mo, om] at positions [[0],[2]], [[0.1], [1,2]] respectivly. The result will be 2, as we have 2 anagrammatic pairs.


We have a function sherlockAndAnagrams that has the following parameter(s):

  • string s: the given string

The function should return the number of unordered anagrammatic pairs of substrings in s


Example 2:

string: abba

anagrammatic pairs: ( a, a | b, b | ab, ba| abb, bba ) which are 4 in total.



So, the important thing for me is the thought process when I solve those tasks.

What I was initially thinking when I read the problem description was the following:

  1. Those are dictionary problems, so we need to use dictionary data structures.
  2. We need to convert the string to a char array to manipulate it easier.
  3. We need to store the substrings with equal length in separate hashmaps, but in order to

make the processing easy we may use embedded hashmap. The outer hashmap will store the length of the substrings in the inner hashmap, and the inner hashmap will store the strings themselves as a key and the occurrence count as a value.

  1. The hardest thing for me to figure out was that, to count the pairs, they need to be equal as a value. Otherwise, I couldn't do it, so, before adding the strings to the HashMap with correspondent length, we need to use sorting on the substring, to make the anagram pairs look alike. If we represent the substrings as a char array, the sorting could be easily done in Java using the method java.util.Arrays.sort(string) .
  2. And the last thing is that if we keep the count of equal substrings with the same length we need a way to calculate the pairs. This can be achieved using this formula :

pairsCount = (numberOfEqualSubstrings * (numberOfEqualSubstrings - 1) )/2

//pc = (n*(n-1))/2

let's see how this works:

if we have the substrings - "a" "a" "a" "a" in the inner HashMap<String, Integer>

with values HashMap<"a", 6> for example then we shoud have 6 anagram pairs

(4*3)/2 = 6

the 6 anagrammatic pairs are

//letter(index)

a(0) a(1)

a(1) a(2)

a(2) a(3)

a(0) a(2)

a(1) a(3)

a(0) a(3)

Which are 6 pairs


So an example solution to the problem is the following:

public static int sherlockAndAnagrams(String s) {

    // HashMap<AnagramLength, HashMap<SortedSubstring, Counter>>
    HashMap<Integer, HashMap<String, Integer>> hmAnagrams = new HashMap<>();


    //mininum length of substring is 1 and max is length-1     
    for (int substrLength = 1; substrLength < s.length(); substrLength++) {

        //inner HashMap<equalSubstrings, equalSubstringsCounter>
        HashMap<String, Integer> stringCounterHashMap = new HashMap<>();
        
        // from index 0 to index (stringLength - substringLength) 
        //so we don't get IndexOutOfBounds exception.
        for (int start = 0; start <= s.length() - substrLength; start++) {

            String substrValue = s.substring(start, start + substrLength);
            
            String sortedSubstrValue = sortString(substrValue);
            
            stringCounterHashMap.put(sortedSubstrValue, 
                                    stringCounterHashMap.
                                        getOrDefault(sortedSubstrValue, 0) + 1);
        }

        hmAnagrams.put(substrLength, stringCounterHashMap);
    }

    // Calculate the number of anagrammatic pairs
    int count = 0;
    for (HashMap<String, Integer> map : hmAnagrams.values()) {
        for (int value : map.values()) {
            // if value is 1  we don't have a pair.
            if (value > 1) { 
                // If there are n occurrences of a substring,
                // we can form n * (n - 1) / 2 pairs
                count += (value * (value - 1)) / 2;
            }
        }
    }

    return count;
}

// Helper function to sort characters of a string
private static String sortString(String inputString) {
    char[] tempArray = inputString.toCharArray();
    java.util.Arrays.sort(tempArray);
    return new String(tempArray);
}        


Time and Space complexity analysis


For this article, I'll dive a little deeper into time and space complexity analysis, so if you don't enjoy this part as much you're free to skip it.


Time complexity:

  • Outer loop over Substring lengths:

for (int substrLength = 1; substrLength < s.length(); substrLength++) {
...
}        

This loop runs in n - 1 times where n is the length of the input string.


  • Inner loop over given string indexes:

 for (int start = 0; start <= s.length() - substrLength; start++) {
...
}        

For each substring length "substrLength", the inner loop runs n - substrLength +1 times.

  • Substring and sorting:

String substrValue = s.substring(start, start + substrLength);
String sortedSubstrValue = sortString(substrValue);        

if( substringLength = n) then:

  • Extracting a substring of length substringLength takes O(n)
  • Sorting the substring takes O(nlogn)

Simple logarithm examples:

source:

To refresh your log knowledge refer to the link in the image description above.


* for reference here is the algorithms time complexity chart:


  • HashMap put and getOrDefault operations take O(1) constant runtime complexity on average.


Combining all these steps, the time complexity for processing each substring can be approximated by:

* if we represent "substrLength" as n , then

O(n + nlogn) = O(nlogn)

So why is this expression true?

The expression O(n+nlogn) simplifies to O(n log n) due to the properties of the Big-O notation. 

Big O notation describes the upper bound of an algorithm's complexity, focusing on the dominant term(s) as input size n grows. Here is why O(n + nlogn) = (Onlogn): 

1. Dominant Term: In the expression O(n+ nlogn)  

* n grows learly 
*  nlogn grows faster than n for sufficiently large n because log n 
(though it grows slower than any polynomial ) 
is still an icreasing function. 

(definition of polynomial function: A polynomial function is a function such as a quadratic, a cubic, a quartic, and so on, involving only non-negative integer powers of x)

2. Asymptotic Behaviour: 

As n becomes very large, the nlogn term will dominate the n term. 
This is because the logarithmic factor makes nlogn grow faster than n alone. 

For example: 

When n = 10, log n ≈ 3.3, so nlog ≈ 33
When n = 100, log n  ≈ 6.6 so nlogn ≈ 66
When n = 1000, log n ≈ 1000 so nlogn ≈ 9900 

In each case nlogn grows significantly faster than n. 

3. Big O simplification 

In Big-O notation, we only consider the term with the highest growth rate because it dominates the overall complexity as n increases. Lower-order terms and constant factors are omitted. Therefor,  O(n+nlogn) simplifies to O(nlogn) because : 

n+nlogn <= c * nlogn for same constant c and sufficiently large n.

In conclusion, O(n+nlogn) = O(n logn) because the nlogn term dominates the liner n term as n grows larger.        


If we only discuss the inner for loop that processes substrings of equal length then we can look at this logic:


for (int start = 0; start <= s.length() - substrLength; start++) {
       String substrValue = s.substring(start, start + substrLength);
            
            String sortedSubstrValue = sortString(substrValue);
            
            stringCounterHashMap.put(sortedSubstrValue, 
                                    stringCounterHashMap.
                                        getOrDefault(sortedSubstrValue, 0) + 1);
}        

If the inner for loop time complexity is О(n^2logn) and the Outer loop has approximate

time complexity of O(n), so we can say that the total time complexity is :

Space Complexity:

HashMaps:

  • hmAnagrams stores a HashMap for each substring length with all unique sorted substrings as keys.
  • The number of unique substrings of any length is at most n(n+1)/2

Example :


Substrings and Sorted substrings:

  • Each sorted substring is stored in the inner HashMap, leading to O(n^2) space for substrings
  • The space complexity for all substrings combined is O(n^3) in the worst case because we store O(n^2) substrings each of length up to n


Auxiliary Space:

  • Temp variables, counters, and all other small variables contribute negligible space compared to the HashMaps.


I hope you enjoyed this one as well.


















要查看或添加评论,请登录

Yovo Manolov的更多文章

社区洞察

其他会员也浏览了