登录查看更多内容

Best solutions for Microsoft interview tasks. Pattern Recognition.

Alexander Molchevskyi

Experienced Rust, C++, Web3, Blockchain, Embedded developer exploring new opportunities

发布日期: 2020年11月15日

+ 关注

Description:

Solution:

It looks like this task is not about algorithms, this is just about writing a simple parser which breaking the given string to parts and writing a function which finds and counts all substrings in the given string.

The only interesting subject which can be discussed in this task is implementation of algorithm for finding substrings. But the task doesn’t require to implement this algorithm, the strings are short, we have no special requirements for search of substrings, so we can use implementation from the standard library. The only thing we should keep in mind is that complexity of this implementation may be very different. Simple algorithm like this one has average complexity about 2*N where N is length of the string and worse complexity O(N*P) where P is length of the pattern, and depends on implementation of std::string.find() method.

int count_substrings(const std::string & s, const std::string& sub, bool overlapped = true) {
    size_t sub_size = sub.size();
    if ( (sub_size == 0) || (sub_size > s.size()) ) return 0;
    int count = 0;
    for (size_t offset = s.find(sub); 
       offset != string::npos;
       offset = s.find(sub, overlapped ? offset + 1 : offset + sub.size()))

    {
        ++count;
    }
    return count;

For a string like “aaa...(1.000.000 more ‘a’)...aaaB” and overlapped substring “aaa...(999.000 more ‘a’)...aaaB” it will works VERY slow because we will need to compare 1.000.000 * 999.000 characters.

There are many much faster algorithms even with complexity O(N) but all of them are applicable in special cases or has additional requirements. This is the cause why there are so many algorithms were invented and people don’t use only one the best of all. Below I will show a couple of the most famous and general algorithms which are applicable in most of cases and don’t have too many additional requirements.

The most famous is the Knuth–Morris–Pratt algorithm

int KMP_count(const string &s, const string &sub) {
    int num_subs = 0;
    size_t sub_size = sub.size();
    size_t s_size = s.size();
    if ((sub_size == 0) || (sub_size > s.size())) { return 0; }
    vector<int> prefixes(sub_size);
    for (int k = 0, i = 1; i < sub_size; ++i) {
        while ((k > 0) && (sub[i] != sub[k])) {
            k = prefixes[k - 1];
        }
        if (sub[i] == sub[k]) { k++; }
        prefixes[i] = k;
    }

    for (int k = 0, i = 0; i < s_size; ++i) {
        while ((k > 0) && (sub[k] != s[i])) {
            k = prefixes[k - 1];
        }
        if (sub[k] == s[i]) { k++; }

        if (k == sub_size) {
            // to get position of substrings in the string
            // (i - sub_size + 1); 
            ++num_subs;
        }
    }
    return num_subs;
}

There is a very fast algorithm with linear complexity which works similar to the KMP algorithm but looks more understandable IMHO. It is so called Z-function. Like the previous algorithm it requires to allocate an additional array of integers with size S where S is length of string N + length of substring P. Then we save result of calculations of Z-function in the array, pass through this array and recognize positions of searched substrings. Implementation of this algorithm is here:

vector<int> z_function (const string & s) {
    int n = (int) s.size();
    vector<int> z (n);
    for (int i=1, l=0, r=0; i<n; ++i) {
        if (i <= r)
            z[i] = min (r-i+1, z[i-l]);
        while (i+z[i] < n && s[z[i]] == s[i+z[i]])
            ++z[i];
        if (i+z[i]-1 > r)
            l = i,  r = i+z[i]-1;
    }
    return z;
}

int z_count(const std::string & s, const std::string& sub) {
    size_t sub_size = sub.size();
    if ( (sub_size == 0) || (sub_size > s.size()) ) return 0;

    string working_string(sub + s);
    size_t ws_size = working_string.size();

    auto z = z_function(working_string);

    int number_subs = 0;

    for(int i = sub_size; i < ws_size; ++i) {
        if (z[i] >= sub_size) {
// to get exact position of substrings in the string
//            cout << i - sub_size << endl; 
            ++number_subs;
        }
    }
    return number_subs;
}

And finally we should implement the parser which will break the input string to the pattern and the blobs where we will search the patterns.

Unfortunately standard C++ still have no so good tokenizer as strtok from plain C so I will use it. Even if it looks not so safety as pure C++ solution but it allows to tokenize a string in one like instead of tens of lines in pure C++.

string parser(const string & s) {
    string output;
    int total_patterns = 0;

    size_t pos = s.find_first_of(';');

    string pattern = s.substr(0, pos);
    string blobs = s.substr(pos+1);

    for(char *token = strtok((char*)blobs.c_str(), "|"); 
        token != NULL; token = strtok(NULL, "|")) 
    {
        string stoken(token);
//        int num_patterns = count_substrings(stoken, pattern);
//        int num_patterns = z_count(stoken, pattern);
        int num_patterns = KMP_count(stoken, pattern);
        total_patterns += num_patterns;
        output.append(to_string(num_patterns));
        output.append("|");
    }
    output.append(to_string(total_patterns));
    return output;
}

Repository with the full project you can find here: https://github.com/jolly-fellow/microsoft/tree/master/pattern_recognition

Return to the table of contents.

要查看或添加评论，请登录

Alexander Molchevskyi的更多文章

Demystifying Dynamic Programming: Beyond the Template

2024年5月13日

Demystifying Dynamic Programming: Beyond the Template

Many resources offer fantastic explanations of Dynamic Programming (DP) for solving algorithmic problems. While I’ve…
Best solutions for Microsoft interview tasks. Partition array into N subsets with balanced sum.

2020年11月12日

Best solutions for Microsoft interview tasks. Partition array into N subsets with balanced sum.

Description: Give you one sorted array, please put them into n buckets, we need to ensure we get n sub array with…
Best solutions for Microsoft interview tasks. Particle Velocity.

2020年10月27日

Best solutions for Microsoft interview tasks. Particle Velocity.

Description: Related problems: https://leetcode.com/problems/arithmetic-slices/ Solution: This task is very simple.
Best solutions for Microsoft interview tasks. Min Deletions To Obtain String in Right Format.

2020年10月16日

Best solutions for Microsoft interview tasks. Min Deletions To Obtain String in Right Format.

Description: Solution: Actually we need to find an equilibrium point on this string where most of 'A' letters will be…
Best solutions for Microsoft interview tasks. Unique Integers That Sum Up To 0

2020年10月11日

Best solutions for Microsoft interview tasks. Unique Integers That Sum Up To 0

Description: Given an integer n, return any array containing n unique integers such that they add up to 0. Example 1:…
Best solutions for Microsoft interview tasks. Min Swaps to Group Red Balls.

2020年10月11日

Best solutions for Microsoft interview tasks. Min Swaps to Group Red Balls.

Description: https://algo.monster/problems/min_adj_swaps_to_group_red_balls Solution: Probably the first idea you get…
Best solutions for Microsoft interview tasks. Concatenated String Length with unique Characters.

2020年7月4日

Best solutions for Microsoft interview tasks. Concatenated String Length with unique Characters.

Description: Related problems:…
Best solutions for Microsoft interview tasks. Largest K such that both K and -K exist in array.

2020年7月4日

Best solutions for Microsoft interview tasks. Largest K such that both K and -K exist in array.

Description: Solution: This task is very easy. It is a bit more difficult than finding of a maximum value in given…
Best solutions for Microsoft interview tasks. Day of week that is K days later.

2020年7月2日

Best solutions for Microsoft interview tasks. Day of week that is K days later.

Description: Solution: By description we have as input parameters: string with name of day of week and distance in days…
Best solutions for Microsoft interview tasks. Max Inserts to Obtain String Without 3 Consecutive 'a'

2020年7月2日

Best solutions for Microsoft interview tasks. Max Inserts to Obtain String Without 3 Consecutive 'a'

Description: Solution: If N is number of characters in the given string and all characters are not A we can insert into…

See all articles

Best solutions for Microsoft interview tasks. Pattern Recognition.

Alexander Molchevskyi

Experienced Rust, C++, Web3, Blockchain, Embedded developer exploring new opportunities

Description:

Solution:

Alexander Molchevskyi的更多文章

社区洞察

其他会员也浏览了

Apidel Technologies: Navigating AI's Impact on Job Applications

Navigating the AI Revolution: The Pros and Cons of AI in the 2024 Job Search Process

Are You Missing Out on These AI Tools? Why Your Job Search Might Be Falling Behind

The Role of AI in Hiring: Balancing Efficiency with Human Expertise

How Will AI Change the Hiring Process in 2025

TribuESK - AI Recruiters: The Future of Recruitment with TribuESK

Job seekers & recruiters are engaged in an AI-driven race that's reshaping the job search—Here’s what the future of recruiting could look like.

4 Key AI-Driven Changes That Defined IT Hiring in 2024

Using ChatGPT Or AI Resumes Keeps You Unemployed Longer

AI Takes Over Recruitment:

Description:

Solution:

Alexander Molchevskyi的更多文章

Demystifying Dynamic Programming: Beyond the Template

Best solutions for Microsoft interview tasks. Partition array into N subsets with balanced sum.

Best solutions for Microsoft interview tasks. Particle Velocity.

Best solutions for Microsoft interview tasks. Min Deletions To Obtain String in Right Format.

Best solutions for Microsoft interview tasks. Unique Integers That Sum Up To 0

Best solutions for Microsoft interview tasks. Min Swaps to Group Red Balls.

Best solutions for Microsoft interview tasks. Concatenated String Length with unique Characters.

Best solutions for Microsoft interview tasks. Largest K such that both K and -K exist in array.

Best solutions for Microsoft interview tasks. Day of week that is K days later.

Best solutions for Microsoft interview tasks. Max Inserts to Obtain String Without 3 Consecutive 'a'

社区洞察

其他会员也浏览了

Apidel Technologies: Navigating AI's Impact on Job Applications

Navigating the AI Revolution: The Pros and Cons of AI in the 2024 Job Search Process

Are You Missing Out on These AI Tools? Why Your Job Search Might Be Falling Behind

The Role of AI in Hiring: Balancing Efficiency with Human Expertise

How Will AI Change the Hiring Process in 2025

TribuESK - AI Recruiters: The Future of Recruitment with TribuESK

Job seekers & recruiters are engaged in an AI-driven race that's reshaping the job search—Here’s what the future of recruiting could look like.

4 Key AI-Driven Changes That Defined IT Hiring in 2024

Using ChatGPT Or AI Resumes Keeps You Unemployed Longer

AI Takes Over Recruitment: