Best solutions for Microsoft interview tasks. Concatenated String Length with unique Characters.
Alexander Molchevskyi
Experienced Rust, C++, Web3, Blockchain, Embedded developer exploring new opportunities
Description:
Related problems:
Solution:
We have a given array of strings. We have to find all possible combinations of these strings. We have one limitation - we can't combine strings which have the same letters, in other words we can combine strings only with unique letters. It easy to see that we need to invent some way of fast check if two strings have the same letters. And it's easy to see that length of any result string cannot exceed length of the alphabet, in our case 26 letters.
We can make a map where given strings will be the values and each string will have a sorted array of letters which used in this string as a key.
For example array of strings "coco","dodo","interactive" will have following look as a map:
It is easy to compare sorted letters and find if there are common letters in two strings.
But may be there is a way to speedup this comparison?
If each letter can appears only one time in the result string then maximum result string is the alphabet and has length 26 letters. We can reflect the alphabet into a bitset. If a letter appears in the string the corresponding bit is set to 1.
Word "interactive" as a bitset will looks like this:
26 bits is less than a register of CPU, so in ideal case, we can compare two bitsets in one CPU command. Nothing can be faster.
Therefore the final algorithm should looks like this:
- Iterate the input strings, skipping the strings that have duplicate characters.
- For each string with unique chars, check if it has the same chars as the result string.
- If they have intersection of characters, we skip it.
- If not, we append this new combination to the result.
- return length of the result string.
We should not work with the strings directly. We can convert them into the bitsets and work with the bits only. It significantly decrease memory consumption and increase speed of the program.
C++ code:
int solution(const vector<string>& v) { // We will keep in this vector bitmaps of used letters // for the processed strings. // Add one empty bitset for comparison with the first // processed string. It makes the algorithm a bit shorter vector<bitset<26>> char_bits_vector = {bitset<26>()}; int result = 0; // for each string in the vector make a bitset where all // bits corresponding to characters in alphabet are set. for (auto& str : v) { bitset<26> char_bits; // set bits corresponding to chars in the string. for (char c : str) { char_bits.set(c - 'a'); } // How many bits were set. int bit_num = char_bits.count(); //the string contains duplicate characters so don't process it if (bit_num < str.size()) continue; // Check if current word has common letters with // already processed strings for (int i = char_bits_vector.size() - 1; i >= 0; --i) { auto& c_b = char_bits_vector[i]; // if two bitsets have common 1 bits i.e. // if two strings have common letters don't // process current string if ((c_b & char_bits).any()) continue; // if current string has unique letters add // to the vector a bitset where // all bits corresponding to letters of the current // string are set to 1. char_bits_vector.push_back(c_b | char_bits); // add length of the current string to the result result = max<int>(result, c_b.count() + bit_num); } } return result; }
Repository with the full project you can find here: https://github.com/jolly-fellow/microsoft/tree/master/concatenated_string_length_with_unique_characters
Return to the table of contents.