Algorithm For Counting Inversions
Harith Bakhrani
Backend Engineer | DevOps Engineer | Cloud | GCP | Kubernetes | Docker | AWS | Python | Remote
Pre-requisites:
- Good knowledge of programming
- Familiarity with recursion
- Familiarity with algorithms
What are Inversions?
Inversion in a list of numbers indicates how far a list is from being sorted. Let us say we have a variable: arr = [1,2,4,3,5,6], in the variable arr we have one inversion of the numbers 4 and 3 because they are in an unsorted order. Thus, the formal definition will be:
Given an array arr, an inversion would occur if arr[i] > arr[j] and i < j.
Just to cement what an inversion is, I would re-define it using other words:
If an element on the left is greater than an element on the right, then that is an inversion.
Why Count Inversions?
At this point, you might be wondering what are some real-life implications of counting inversions, why do we need to count inversions?
There are a couple of scenarios where this might help. One would be to have a numerical similarity measure that quantifies how close two lists are to each other. So, for example, I can create two lists that contain some products that two different people have ranked, beginning from the top-ranked to the least-ranked. From these two lists, I can then calculate inversions which would quantify how similar/dissimilar these two people are to one another. Finding people with similar preferences helps some websites to implement collaborative filtering.
The Worst-Case Scenario
What will the worst-case scenario look like for this algorithm?
In the worst-case scenario, the numbers would be totally inverted, so instead of having a list of numbers as: [ 1 , 2 , 3 , 4 , 5 , 6 ], we would have it as [ 6 , 5 , 4 , 3 , 2 , 1 ]. The mathematical formula for calculating the number of inversions given a list of length n and assuming that the numbers are totally inverted is: n(n-1) / 2.
To Code
Using Brute Force
Using nested loops, we can loop through the list two times. Below is pseudocode for counting the number of inversions in a list:
initialize arr to [1,2,4,3,5,6] initialize i to 0 initialize inversions to 0 while i is less than length of arr initialize j to i+1 while j is less than length of arr if arr[i] is greater than arr[j] then increment inversions by 1 print out inversions
As you can see we are going through the entire list two times, thus our time complexity for our “brute force” algorithm would be O(n2).
As Software Engineers, the question we need to ask is, can we do better? YES, we can do better!
Divide & Conquer!
If you are familiar with merge sort, then this would be easy for you, if you are not, then worry not, things would make sense as you progress!
Divide and conquer algorithms are used to continuously divide the list into two and then do whatever you would like to do. In the case of merge sort, we recursively divide the list into two and sort it out. Here is the pseudocode for merge sort:
function mergeSort(array) { if length of array is equal to 1 return array initialize firstHalf to mergeSort(first half of the array) initialize secondHalf to mergeSort(second half of the array) initialize sortedArray to an empty array initialize i to 0 initialize j to 0 while i < length of firstHalf and j < length of secondHalf if firstHalf[i] > secondHalf[j] append secondHalf[j] to sortedArray else append firstHalf[i] to sortedArray append the remaining elements of firstHalf to sortedArray append the remaining elements of secondHalf to sortedArray return sortedArray }
Let me give you a visual tour of what happens during the merge sort. At first, the elements get recursively divided into two until only one element is left:
When only one element is left, it is returned (it does not need sorting, of course). The elements are then merged in sorted order and thus we get our sorted list.
Wait, hold on a second! Where did our count inversions algorithm go???
Having known merge sort, we can now count inversions by keeping track of inversions variable:
function countInversions(array) { if length of array is equal to 1 return array midPoint = length of array \ 2 firstHalf, inversions1 = mergeSort(first half of the array) secondHalf, inversions2 = mergeSort(second half of the array) initialize sortedArray to an empty array initialize i to 0 initialize j to 0 initialize inversions to 0 while i < length of firstHalf and j < length of secondHalf if firstHalf[i] > secondHalf[j] append secondHalf[j] to sortedArray inversions = inversions + (midPoint - i) else append firstHalf[i] to sortedArray append the remaining elements of firstHalf to sortedArray append the remaining elements of secondHalf to sortedArray totalInversions = inversions1 + inversions2 + inversions return sortedArray, totalInversions }
Voila! There you go! We have a countInversions function that takes in an array as its parameter and spits out a sorted array together with the total number of inversions.
You might be having one question left though: why are we adding the length of the remaining elements in the firstHalf array to the inversions variable (inversions = inversions + (midPoint — i))? Let us assume we have reached a step where we have two lists:
The first check would be between 2 and 1, 2 is greater than 1 and so 1 would come first before 2. The number of inversions is two because the numbers 2 and 3 are both greater than 1. Thus, all the remaining elements in the first list would be less than 1 and their count would add up to the number of inversions we would have!
Practice!
Phew! Too much of theory and pseudocode there! How about you go and try to implement it your self on HackerRank: https://www.hackerrank.com/challenges/ctci-merge-sort/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=sorting
Enjoy ??
Founder & CEO @ J3D.AI (Jedi) | McK | Building the Decentralized Global Brain | TedX Speaker | IDG & SDG | Hydrogen | Longevity | Meditation ??
3 年??
Sr. Software Developer | Engineer
5 年This beautifully written! Thank you!