Sherlock and Anagrams (HackerRank) from the Dictionaries and Hashmaps section + extensive time and space complexity analysis.
Hi guys, today we will discuss the problem "Sherlock and Anagrams".
The link to the problem is the following:
Short description of the problem:
Anagram definition:
Two strings are anagrams of each other if the letters of one string can be rearranged to form the other string.
The task is to:
Find the number of pairs of substrings of a given string that are anagrams of each other.
Example 1:
string = mom
The list of all anagrammatic pairs is: [m, m] and [mo, om], at positions [[0], [2]] and [[0,1], [1,2]] respectively. The result will be 2, as we have 2 anagrammatic pairs.
We have a function sherlockAndAnagrams that has the following parameter: a string s.
The function should return the number of unordered anagrammatic pairs of substrings in s.
Example 2:
string: abba
anagrammatic pairs: [a, a], [b, b], [ab, ba], [abb, bba], which are 4 in total.
So, the important thing for me is the thought process when I solve those tasks.
What I was initially thinking when I read the problem description was the following:
To make the processing easy we may use a nested HashMap. The outer HashMap will use the substring length as a key and an inner HashMap as a value, and the inner HashMap will store the substrings themselves (sorted, so that anagrams map to the same key) as keys and their occurrence counts as values.
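For instance (my own illustration of the idea, using the "abba" string from Example 2), the nested map would end up looking like this:
length 1 -> { "a": 2, "b": 2 }
length 2 -> { "ab": 2, "bb": 1 } (both "ab" and "ba" sort to "ab")
length 3 -> { "abb": 2 } (both "abb" and "bba" sort to "abb")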
pairsCount = (numberOfEqualSubstrings * (numberOfEqualSubstrings - 1) )/2
//pc = (n*(n-1))/2
Let's see how this works:
If we have the substrings "a", "a", "a", "a" in the inner HashMap<String, Integer>
with the entry <"a", 4>, for example, then we should have 6 anagram pairs:
(4*3)/2 = 6
the 6 anagrammatic pairs are
//letter(index)
a(0) a(1)
a(1) a(2)
a(2) a(3)
a(0) a(2)
a(1) a(3)
a(0) a(3)
Which are 6 pairs
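To double-check that the formula matches the enumeration above, here is a tiny sketch I added (not part of the final solution) that counts the unordered index pairs directly and compares the result with n * (n - 1) / 2:

public class PairCountCheck {
    public static void main(String[] args) {
        int n = 4; // four equal substrings "a"
        int enumerated = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                enumerated++; // counts the pair (a(i), a(j)) exactly once
            }
        }
        System.out.println(enumerated);      // 6
        System.out.println(n * (n - 1) / 2); // 6, same value from the formula
    }
}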
So an example solution to the problem is the following:
// (requires java.util.HashMap to be imported)
public static int sherlockAndAnagrams(String s) {
    // HashMap<AnagramLength, HashMap<SortedSubstring, Counter>>
    HashMap<Integer, HashMap<String, Integer>> hmAnagrams = new HashMap<>();
    // minimum length of a substring is 1 and max is length - 1
    for (int substrLength = 1; substrLength < s.length(); substrLength++) {
        // inner HashMap<equalSubstrings, equalSubstringsCounter>
        HashMap<String, Integer> stringCounterHashMap = new HashMap<>();
        // start runs from index 0 to index (stringLength - substringLength)
        // so we don't get an IndexOutOfBounds exception
        for (int start = 0; start <= s.length() - substrLength; start++) {
            String substrValue = s.substring(start, start + substrLength);
            String sortedSubstrValue = sortString(substrValue);
            stringCounterHashMap.put(sortedSubstrValue,
                    stringCounterHashMap.getOrDefault(sortedSubstrValue, 0) + 1);
        }
        hmAnagrams.put(substrLength, stringCounterHashMap);
    }
    // Calculate the number of anagrammatic pairs
    int count = 0;
    for (HashMap<String, Integer> map : hmAnagrams.values()) {
        for (int value : map.values()) {
            // if value is 1 we don't have a pair
            if (value > 1) {
                // If there are n occurrences of a substring,
                // we can form n * (n - 1) / 2 pairs
                count += (value * (value - 1)) / 2;
            }
        }
    }
    return count;
}

// Helper function to sort the characters of a string
private static String sortString(String inputString) {
    char[] tempArray = inputString.toCharArray();
    java.util.Arrays.sort(tempArray);
    return new String(tempArray);
}
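For completeness, here is a small usage sketch I added (assuming the two methods above live in the same class) that runs the function on the examples from the beginning:

public static void main(String[] args) {
    System.out.println(sherlockAndAnagrams("mom"));  // 2 (Example 1)
    System.out.println(sherlockAndAnagrams("abba")); // 4 (Example 2)
    System.out.println(sherlockAndAnagrams("abcd")); // 0, no two substrings are anagrams
}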
Time and Space complexity analysis
For this article, I'll dive a little deeper into time and space complexity analysis, so if you don't enjoy this part as much you're free to skip it.
Time complexity:
for (int substrLength = 1; substrLength < s.length(); substrLength++) {
...
}
This loop runs n - 1 times, where n is the length of the input string.
for (int start = 0; start <= s.length() - substrLength; start++) {
...
}
For each substring length substrLength, the inner loop runs n - substrLength + 1 times (for example, with n = 4 and substrLength = 2 it runs 3 times).
String substrValue = s.substring(start, start + substrLength);
String sortedSubstrValue = sortString(substrValue);
Extracting the substring takes O(substrLength) time, and sorting it takes O(substrLength * log(substrLength)) time. In the worst case, when substrLength = n, these steps cost O(n) and O(n log n) respectively.
To refresh your knowledge of logarithms, see the simple logarithm examples and the link in the image description above.
* For reference, the images above also include a general algorithm time complexity chart.
Combining all these steps, the time complexity for processing each substring can be approximated by:
* if we represent "substrLength" as n, then
O(n + n log n) = O(n log n)
So why is this expression true?
The expression O(n + n log n) simplifies to O(n log n) due to the properties of Big-O notation.
Big-O notation describes the upper bound of an algorithm's complexity, focusing on the dominant term(s) as the input size n grows. Here is why O(n + n log n) = O(n log n):
1. Dominant Term: In the expression O(n + n log n):
* n grows linearly
* n log n grows faster than n for sufficiently large n, because log n (though it grows slower than any polynomial) is still an increasing function.
(Definition of a polynomial function: a polynomial function is a function such as a quadratic, a cubic, a quartic, and so on, involving only non-negative integer powers of x.)
2. Asymptotic Behaviour:
As n becomes very large, the n log n term will dominate the n term.
This is because the logarithmic factor makes n log n grow faster than n alone.
For example (using base-2 logarithms):
When n = 10, log n ≈ 3.3, so n log n ≈ 33
When n = 100, log n ≈ 6.6, so n log n ≈ 660
When n = 1000, log n ≈ 10, so n log n ≈ 10,000
In each case, n log n grows significantly faster than n.
3. Big-O simplification
In Big-O notation, we only consider the term with the highest growth rate because it dominates the overall complexity as n increases. Lower-order terms and constant factors are omitted. Therefore, O(n + n log n) simplifies to O(n log n) because:
n + n log n <= c * n log n for some constant c and sufficiently large n (for example, c = 2 works for all n >= 2).
In conclusion, O(n + n log n) = O(n log n) because the n log n term dominates the linear n term as n grows larger.
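If you want to reproduce those numbers yourself, here is a small sketch I added that prints n, log2(n) and n * log2(n) for the three sample sizes (using base-2 logarithms, as in the examples above):

public class GrowthDemo {
    public static void main(String[] args) {
        int[] sizes = {10, 100, 1000};
        for (int n : sizes) {
            double log2n = Math.log(n) / Math.log(2); // logarithm base 2
            System.out.printf("n = %4d, log2(n) = %5.1f, n * log2(n) = %8.0f%n",
                    n, log2n, n * log2n);
        }
    }
}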
If we only discuss the inner for loop that processes substrings of equal length, then we can look at this logic:
for (int start = 0; start <= s.length() - substrLength; start++) {
    String substrValue = s.substring(start, start + substrLength);
    String sortedSubstrValue = sortString(substrValue);
    stringCounterHashMap.put(sortedSubstrValue,
            stringCounterHashMap.getOrDefault(sortedSubstrValue, 0) + 1);
}
The inner for loop runs O(n) times, and each iteration does up to O(n log n) work (substring extraction plus sorting), so the inner for loop's time complexity is O(n^2 log n). The outer loop has an approximate time complexity of O(n), so we can say that the total time complexity is:
O(n) * O(n^2 log n) = O(n^3 log n)
Space Complexity:
HashMaps: the outer HashMap holds one inner HashMap per substring length (n - 1 of them). Across all lengths, the inner maps can store up to O(n^2) distinct sorted substrings as keys, and since each key can be up to n characters long, the maps take O(n^3) space in the worst case (counted in characters).
Example: for s = "abba" and substrLength = 2, the inner map is {"ab": 2, "bb": 1}, because the substrings "ab", "bb", "ba" sort to "ab", "bb", "ab".
Substrings and Sorted substrings: substrValue and sortedSubstrValue are temporary strings of length at most n, so they only add O(n) space at any given moment.
Auxiliary Space: sortString copies the substring into a char array before sorting it, which is another O(n).
Overall, the space complexity is dominated by the HashMaps: O(n^2) keys, or O(n^3) characters, in the worst case.
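To make these numbers a bit more concrete, here is a standalone sketch I added (my own illustration, using an all-distinct-characters string as a rough worst case) that rebuilds the per-length maps and reports how many keys and key characters end up stored:

import java.util.Arrays;
import java.util.HashMap;

public class SpaceDemo {
    public static void main(String[] args) {
        String s = "abcdefghij"; // n = 10, all characters distinct
        int totalKeys = 0;
        int totalKeyChars = 0;
        for (int substrLength = 1; substrLength < s.length(); substrLength++) {
            HashMap<String, Integer> counter = new HashMap<>();
            for (int start = 0; start <= s.length() - substrLength; start++) {
                char[] chars = s.substring(start, start + substrLength).toCharArray();
                Arrays.sort(chars);
                String sorted = new String(chars);
                counter.put(sorted, counter.getOrDefault(sorted, 0) + 1);
            }
            totalKeys += counter.size();
            for (String key : counter.keySet()) {
                totalKeyChars += key.length();
            }
        }
        System.out.println("distinct keys stored: " + totalKeys);      // 54 here, on the order of n^2
        System.out.println("key characters stored: " + totalKeyChars); // 210 here, on the order of n^3
    }
}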
I hope you enjoyed this one as well.