Interview Pearls: Maximum Bipartite Matching
From Introduction to Algorithms, 3rd Edition.

Interview Pearls: Maximum Bipartite Matching

This is a textbook problem, and I vaguely remember it from the school. Though, when I saw it on LeetCode, I had absolutely no clue how to solve it. This problem is behind the paywall, so I’ll explain it here.

It was one of the hardest problems for me on LeetCode. Not sure why it's marked as Medium... Maybe it is Medium and I am just stressing myself too much?

This problem goes like this: we have m workers and n tasks. For each worker, there is a list of tasks they are capable of performing. Our objective is to match workers to tasks (1-to-1) in a way that maximizes the number of matching.

So, my first idea was to use backtracking, but it’s too slow, and memoisation it too expensive. We can realize that workers and tasks form a bipartite graph: workers on the left side, tasks on the right side, and edges connect workers with tasks they can perform. The maximum bipartite matching can be efficiently found using the flow network. The best way, in my option, is to first understand the maximum flow problem and the Fold-Fulkerson method. After that, you should be able to reduce the maximum bipartite matching to the maximum flow problem.

As a refresher, flow networks are directed graphs, originated from the source node s, and terminated at the sink node t. In other words, every node in the graph lies on some path from the source to the sink. Flow networks are used to model road maps, assembly lines, electrical networks, and so on. Each edge in the graph is a conduit, and has a capacity (e.g. 20 amperes of electrical current). In the maximum flow problem, we wish to compute the greatest rate we can move material from the source to the sink. As a flip problem, we may wish to identify bottlenecks edges in our network (the min-cut problem).

The maximum flow problem can be solved using the Ford-Fulkerson method, which iteratively pushes and balances flow in “pipes” until it reaches the maximum capacity of the network. The balancing is performed using so-called residual connections: for each edge, we create an edge in the backward direction with initial zero capacity. When we send some material along an edge, we decrease its capacity and increase the capacity of the backward edge. That allows the algorithm to back out flows when the network overflows. The best way to understand how it works is to go through a step-by-step example.

You can read about it in textbooks (Cormen et al., Introduction to Algorithms, third edition, pages 708-736 and Skiena, The Algorithm Design Manual, second edition, pages 217-222). However, it was hard for me to get the intuition from books alone, so I resorted to YouTube videos (there are plenty of good ones on this subject).

Now, to apply the Ford-Fulkerson method to the maximum bipartite matching problem, we need to add the source (connected to workers) and target (connected from tasks) nodes to our bipartite graph. Then, we assign the same capacity of 1 to every edge.

I cannot do a full justice of the theory of the network flows here. My motivation is to outline the problem space and approach. Take a look at one of many videos available on YouTube. For your reference, below is the implementation of the Ford-Fulkerson method, simplified for the maximum bipartite matching problem (this is an image with a link to GitHub).

No alt text provided for this image

The grid parameter specifies workers to jobs matching. From that matching, we create the adjacency list for n + m + 1 nodes (adding the sink as the last node), connecting workers to jobs, and jobs to the sink. Note that each edge has the capacity of 1 in the forward direction.

When doing depth-first search, we track nodes we visited, and swap the capacity of the edge between forward and backward directions. For example, if we use edge ab, we set its capacity to zero, and bump the capacity of edge a b to one. We can do a simple swap because each edge has the capacity of 1. Finally, instead of adding the source node, we start the search directly from each worker node.

Abdur-Rahman S.

CS @ UWaterloo | SWE @ Chess.com

2 年

Very interesting read!

回复

Thanks for sharing Vlad Trubachov, what would be the time and space complexity of this algorithm ?

回复

要查看或添加评论,请登录

Vlad Trubachov的更多文章

  • We are Engineers

    We are Engineers

    I am indebted to community for getting better at coding. I learn a lot from those comments.

    6 条评论
  • Interview Pearls: In-Shuffle

    Interview Pearls: In-Shuffle

    This one comes with a story (and no, it wasn't in a casino). I was considering a team in Azure, and, as you may know…

    6 条评论
  • Choice Engineering

    Choice Engineering

    I was a smoker for seventeeeeeeeeeen years, people..

    5 条评论
  • Coding Interview: The Game Plan

    Coding Interview: The Game Plan

    I am doing interviews few times a year, and it's probably not enough[1]. Opportunities aside, I learn a valuable lesson…

    7 条评论

社区洞察

其他会员也浏览了