登录查看更多内容

Sherlock and anagrams (hacker rank) from the Dictionaries and Hashmaps section + extensive time and space complexity analysis.

Yovo Manolov

Software Engineer

发布日期: 2024年6月8日

+ 关注

Hi, guys, today we will discuss the problem "Sherlock and anagrams"

The link to the problem is the following:

https://www.hackerrank.com/challenges/sherlock-and-anagrams/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=dictionaries-hashmaps

Short description of the problem:

Anagram definition:

Two strings are anagrams of each other if the letters of one string can be rearranged to form the other string.

The task is to:

Find the number of pairs of substrings of a given string that are anagrams of each other.

Example 1:

string = mom

The list of all anagrammatic pairs is: [m,m], [mo, om] at positions [[0],[2]], [[0.1], [1,2]] respectivly. The result will be 2, as we have 2 anagrammatic pairs.

We have a function sherlockAndAnagrams that has the following parameter(s):

string s: the given string

The function should return the number of unordered anagrammatic pairs of substrings in s

Example 2:

string: abba

anagrammatic pairs: ( a, a | b, b | ab, ba| abb, bba ) which are 4 in total.

So, the important thing for me is the thought process when I solve those tasks.

What I was initially thinking when I read the problem description was the following:

Those are dictionary problems, so we need to use dictionary data structures.
We need to convert the string to a char array to manipulate it easier.
We need to store the substrings with equal length in separate hashmaps, but in order to

make the processing easy we may use embedded hashmap. The outer hashmap will store the length of the substrings in the inner hashmap, and the inner hashmap will store the strings themselves as a key and the occurrence count as a value.

The hardest thing for me to figure out was that, to count the pairs, they need to be equal as a value. Otherwise, I couldn't do it, so, before adding the strings to the HashMap with correspondent length, we need to use sorting on the substring, to make the anagram pairs look alike. If we represent the substrings as a char array, the sorting could be easily done in Java using the method java.util.Arrays.sort(string) .
And the last thing is that if we keep the count of equal substrings with the same length we need a way to calculate the pairs. This can be achieved using this formula :

pairsCount = (numberOfEqualSubstrings * (numberOfEqualSubstrings - 1) )/2

//pc = (n*(n-1))/2

let's see how this works:

if we have the substrings - "a" "a" "a" "a" in the inner HashMap<String, Integer>

with values HashMap<"a", 6> for example then we shoud have 6 anagram pairs

(4*3)/2 = 6

the 6 anagrammatic pairs are

//letter(index)

a(0) a(1)

a(1) a(2)

a(2) a(3)

a(0) a(2)

a(1) a(3)

a(0) a(3)

Which are 6 pairs

So an example solution to the problem is the following:

public static int sherlockAndAnagrams(String s) {

    // HashMap<AnagramLength, HashMap<SortedSubstring, Counter>>
    HashMap<Integer, HashMap<String, Integer>> hmAnagrams = new HashMap<>();


    //mininum length of substring is 1 and max is length-1     
    for (int substrLength = 1; substrLength < s.length(); substrLength++) {

        //inner HashMap<equalSubstrings, equalSubstringsCounter>
        HashMap<String, Integer> stringCounterHashMap = new HashMap<>();
        
        // from index 0 to index (stringLength - substringLength) 
        //so we don't get IndexOutOfBounds exception.
        for (int start = 0; start <= s.length() - substrLength; start++) {

            String substrValue = s.substring(start, start + substrLength);
            
            String sortedSubstrValue = sortString(substrValue);
            
            stringCounterHashMap.put(sortedSubstrValue, 
                                    stringCounterHashMap.
                                        getOrDefault(sortedSubstrValue, 0) + 1);
        }

        hmAnagrams.put(substrLength, stringCounterHashMap);
    }

    // Calculate the number of anagrammatic pairs
    int count = 0;
    for (HashMap<String, Integer> map : hmAnagrams.values()) {
        for (int value : map.values()) {
            // if value is 1  we don't have a pair.
            if (value > 1) { 
                // If there are n occurrences of a substring,
                // we can form n * (n - 1) / 2 pairs
                count += (value * (value - 1)) / 2;
            }
        }
    }

    return count;
}

// Helper function to sort characters of a string
private static String sortString(String inputString) {
    char[] tempArray = inputString.toCharArray();
    java.util.Arrays.sort(tempArray);
    return new String(tempArray);
}

Time and Space complexity analysis

For this article, I'll dive a little deeper into time and space complexity analysis, so if you don't enjoy this part as much you're free to skip it.

Time complexity:

Outer loop over Substring lengths:

for (int substrLength = 1; substrLength < s.length(); substrLength++) {
...
}

This loop runs in n - 1 times where n is the length of the input string.

Inner loop over given string indexes:

领英推荐

Object State Management

Sridhar Raj P 3 周前

How to Handle Missing Data in Power BI?

Anurodh Kumar 3 个月前

Data Structure & Algorithm - Part 2

Padam Tripathi (Learner) 10 个月前

 for (int start = 0; start <= s.length() - substrLength; start++) {
...
}

For each substring length "substrLength", the inner loop runs n - substrLength +1 times.

Substring and sorting:

String substrValue = s.substring(start, start + substrLength);
String sortedSubstrValue = sortString(substrValue);

if( substringLength = n) then:

Extracting a substring of length substringLength takes O(n)
Sorting the substring takes O(nlogn)

Simple logarithm examples:

To refresh your log knowledge refer to the link in the image description above.

* for reference here is the algorithms time complexity chart:

HashMap put and getOrDefault operations take O(1) constant runtime complexity on average.

Combining all these steps, the time complexity for processing each substring can be approximated by:

* if we represent "substrLength" as n , then

O(n + nlogn) = O(nlogn)

So why is this expression true?

The expression O(n+nlogn) simplifies to O(n log n) due to the properties of the Big-O notation. 

Big O notation describes the upper bound of an algorithm's complexity, focusing on the dominant term(s) as input size n grows. Here is why O(n + nlogn) = (Onlogn): 

1. Dominant Term: In the expression O(n+ nlogn)  

* n grows learly 
*  nlogn grows faster than n for sufficiently large n because log n 
(though it grows slower than any polynomial ) 
is still an icreasing function. 

(definition of polynomial function: A polynomial function is a function such as a quadratic, a cubic, a quartic, and so on, involving only non-negative integer powers of x)

2. Asymptotic Behaviour: 

As n becomes very large, the nlogn term will dominate the n term. 
This is because the logarithmic factor makes nlogn grow faster than n alone. 

For example: 

When n = 10, log n ≈ 3.3, so nlog ≈ 33
When n = 100, log n  ≈ 6.6 so nlogn ≈ 66
When n = 1000, log n ≈ 1000 so nlogn ≈ 9900 

In each case nlogn grows significantly faster than n. 

3. Big O simplification 

In Big-O notation, we only consider the term with the highest growth rate because it dominates the overall complexity as n increases. Lower-order terms and constant factors are omitted. Therefor,  O(n+nlogn) simplifies to O(nlogn) because : 

n+nlogn <= c * nlogn for same constant c and sufficiently large n.

In conclusion, O(n+nlogn) = O(n logn) because the nlogn term dominates the liner n term as n grows larger.

If we only discuss the inner for loop that processes substrings of equal length then we can look at this logic:

for (int start = 0; start <= s.length() - substrLength; start++) {
       String substrValue = s.substring(start, start + substrLength);
            
            String sortedSubstrValue = sortString(substrValue);
            
            stringCounterHashMap.put(sortedSubstrValue, 
                                    stringCounterHashMap.
                                        getOrDefault(sortedSubstrValue, 0) + 1);
}

If the inner for loop time complexity is О(n^2logn) and the Outer loop has approximate

time complexity of O(n), so we can say that the total time complexity is :

Space Complexity:

HashMaps:

hmAnagrams stores a HashMap for each substring length with all unique sorted substrings as keys.
The number of unique substrings of any length is at most n(n+1)/2

Example :

Substrings and Sorted substrings:

Each sorted substring is stored in the inner HashMap, leading to O(n^2) space for substrings
The space complexity for all substrings combined is O(n^3) in the worst case because we store O(n^2) substrings each of length up to n

Auxiliary Space:

Temp variables, counters, and all other small variables contribute negligible space compared to the HashMaps.

I hope you enjoyed this one as well.

要查看或添加评论，请登录

Yovo Manolov的更多文章

Special string again (string manipulation) + time and space complexity analysis

2024年7月28日

Special string again (string manipulation) + time and space complexity analysis

Summary The problem is the following: Given a string s of length n, determine the number of special substrings within…

1 条评论
Fraudulent activity problem from Hackerrank - from the sorting problems. Analysis of three different solutions. ( min-max heap, counting sort)

2024年6月25日

Fraudulent activity problem from Hackerrank - from the sorting problems. Analysis of three different solutions. ( min-max heap, counting sort)

Reference link to the problem: https://www.hackerrank.
Array Manipulation Hackerrank problem analysis and solution optimization (prefix sum + difference array).

2024年6月6日

Array Manipulation Hackerrank problem analysis and solution optimization (prefix sum + difference array).

Hi, guys, today I want to analyze the Array Manipulation problem from Hackerrank. We will take a look at a suboptimal…
New Year Chaos - HackerRank algorithmic problem - clarifying comments

2024年6月4日

New Year Chaos - HackerRank algorithmic problem - clarifying comments

Hi, guys, I want to discuss an algorithmic problem that's probably easy for most of you, but initially, I found it hard…

Sherlock and anagrams (hacker rank) from the Dictionaries and Hashmaps section + extensive time and space complexity analysis.

Yovo Manolov

Software Engineer

Time and Space complexity analysis

领英推荐

Space Complexity:

Yovo Manolov的更多文章

社区洞察

其他会员也浏览了

How easy is it to switch from Quantum to MRDCL?

Rust data structures

A Deep Dive into Go Slices: Understanding Value vs. Reference, Modifying and Adding Elements

How to Build RAG application with Knowledge Graph

Understanding ListLen(var) vs. var.ListLen in ColdFusion

Discover and visualize the data to gain insights.

A Brief Survey of Data Structures (and a Little Digression about the Go Language)

Stars Cut Point Calculations (Clustering Method)

The Hidden Power of Bloom Filters: A Deep Dive into Probabilistic Data Structures

Time and Space complexity analysis

领英推荐

Space Complexity:

Yovo Manolov的更多文章

Special string again (string manipulation) + time and space complexity analysis

Fraudulent activity problem from Hackerrank - from the sorting problems. Analysis of three different solutions. ( min-max heap, counting sort)

Array Manipulation Hackerrank problem analysis and solution optimization (prefix sum + difference array).

New Year Chaos - HackerRank algorithmic problem - clarifying comments

社区洞察

其他会员也浏览了

How easy is it to switch from Quantum to MRDCL?

Rust data structures

A Deep Dive into Go Slices: Understanding Value vs. Reference, Modifying and Adding Elements

How to Build RAG application with Knowledge Graph

Understanding ListLen(var) vs. var.ListLen in ColdFusion

Discover and visualize the data to gain insights.

A Brief Survey of Data Structures (and a Little Digression about the Go Language)

Stars Cut Point Calculations (Clustering Method)

The Hidden Power of Bloom Filters: A Deep Dive into Probabilistic Data Structures