Interview Pearls: Maximum Bipartite Matching
This is a textbook problem, and I vaguely remember it from school. Yet when I saw it on LeetCode, I had absolutely no clue how to solve it. The problem is behind the paywall, so I’ll explain it here.
It was one of the hardest problems for me on LeetCode. Not sure why it's marked as Medium... Maybe it is Medium and I am just stressing too much?
The problem goes like this: we have m workers and n tasks. For each worker, there is a list of tasks they are capable of performing. Our objective is to match workers to tasks (1-to-1) in a way that maximizes the number of matches.
My first idea was to use backtracking, but it’s too slow, and memoisation is too expensive. The key observation is that workers and tasks form a bipartite graph: workers on the left side, tasks on the right side, and edges connecting workers with the tasks they can perform. The maximum bipartite matching can be found efficiently using a flow network. The best way, in my opinion, is to first understand the maximum flow problem and the Ford-Fulkerson method. After that, you should be able to reduce the maximum bipartite matching problem to the maximum flow problem.
As a refresher, flow networks are directed graphs originating at a source node s and terminating at a sink node t. In other words, every node in the graph lies on some path from the source to the sink. Flow networks are used to model road maps, assembly lines, electrical networks, and so on. Each edge in the graph is a conduit and has a capacity (e.g. 20 amperes of electrical current). In the maximum flow problem, we wish to compute the greatest rate at which we can move material from the source to the sink. The flip side of this problem is to identify the bottleneck edges in the network (the min-cut problem).
The maximum flow problem can be solved using the Ford-Fulkerson method, which iteratively pushes and rebalances flow in the “pipes” until it reaches the maximum capacity of the network. The rebalancing is performed using so-called residual edges: for each edge, we create an edge in the backward direction with an initial capacity of zero. When we send material along an edge, we decrease its capacity and increase the capacity of the backward edge. That allows the algorithm to back out flow that was pushed along a suboptimal path. The best way to understand how it works is to go through a step-by-step example.
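To make the mechanics concrete, here is a minimal sketch of the method on a capacity matrix (the function name `max_flow` and the matrix representation are my own choices for illustration; this variant uses BFS to find augmenting paths, which is the Edmonds-Karp flavor of Ford-Fulkerson):

```python
from collections import deque

def max_flow(cap, s, t):
    """Ford-Fulkerson (Edmonds-Karp) on a capacity matrix.
    cap[u][v] is the capacity of edge u -> v (0 if the edge is absent);
    the matrix doubles as the residual graph and is modified in place."""
    n = len(cap)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:          # no augmenting path left: done
            return flow
        # find the bottleneck capacity along the path
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # push flow: decrease forward capacities, increase backward ones
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

For example, on a 4-node network with edges 0→1 (capacity 3), 0→2 (2), 1→2 (1), 1→3 (2), 2→3 (2), `max_flow` pushes 2 units along 0→1→3 and 2 along 0→2→3.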
You can read about it in textbooks (Cormen et al., Introduction to Algorithms, third edition, pages 708-736 and Skiena, The Algorithm Design Manual, second edition, pages 217-222). However, it was hard for me to get the intuition from books alone, so I resorted to YouTube videos (there are plenty of good ones on this subject).
Now, to apply the Ford-Fulkerson method to the maximum bipartite matching problem, we add a source node (with edges to all workers) and a sink node (with edges from all tasks) to our bipartite graph. Then, we assign the same capacity of 1 to every edge.
I cannot do full justice to the theory of network flows here. My motivation is to outline the problem space and the approach. Take a look at one of the many videos available on YouTube. For your reference, below is an implementation of the Ford-Fulkerson method, simplified for the maximum bipartite matching problem (this is an image with a link to GitHub).
The grid parameter specifies which workers can perform which jobs. From that grid, we create an adjacency list for n + m + 1 nodes (adding the sink as the last node), connecting workers to jobs, and jobs to the sink. Note that each edge has a capacity of 1 in the forward direction.
During the depth-first search, we track the nodes we have visited and swap the capacity of an edge between the forward and backward directions. For example, if we use edge a → b, we set its capacity to zero and bump the capacity of edge a ← b to one. We can do a simple swap because each edge has a capacity of 1. Finally, instead of adding the source node, we start the search directly from each worker node.
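Since the linked image may not survive every reader, here is a sketch matching that description (the function name `maximum_matching` is my own; the `grid`, the n + m + 1 nodes with the sink last, the capacity swap, and starting the search from each worker all follow the text above):

```python
from typing import List

def maximum_matching(grid: List[List[int]]) -> int:
    """Maximum bipartite matching via simplified Ford-Fulkerson.
    grid[w][j] == 1 means worker w can perform job j."""
    m, n = len(grid), len(grid[0])
    sink = m + n                      # sink is the last of the n + m + 1 nodes
    adj = [[] for _ in range(m + n + 1)]
    cap = {}                          # residual capacity of each directed edge
    for w in range(m):
        for j in range(n):
            if grid[w][j]:
                job = m + j
                adj[w].append(job)
                adj[job].append(w)    # backward edge for the residual graph
                cap[(w, job)], cap[(job, w)] = 1, 0
    for j in range(n):
        job = m + j
        adj[job].append(sink)
        adj[sink].append(job)
        cap[(job, sink)], cap[(sink, job)] = 1, 0

    def dfs(u: int, visited: set) -> bool:
        if u == sink:
            return True
        visited.add(u)
        for v in adj[u]:
            if v not in visited and cap[(u, v)] == 1 and dfs(v, visited):
                # augmenting path found: swap forward/backward capacities
                cap[(u, v)], cap[(v, u)] = 0, 1
                return True
        return False

    # no explicit source node: start one augmenting search per worker
    return sum(dfs(w, set()) for w in range(m))
```

Each worker is searched once, which stands in for the unit-capacity source edge; the backward edges let a later search steal a job from an earlier match and re-route the displaced worker elsewhere.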