VLOOKUP with a Binary Search - VLOOKUP(..., ..., ..., TRUE)
As we all know VLOOKUP is a pretty useful function and many of us are also made aware by others that using TRUE in the Range_Lookup up section is lunacy as it simply doesn’t work. See here https://www.settingbox.com/excel-formulas-not-working/ for the proof. Section 10, but to save you the time looking here’s the snippet:
DO NOT STOP READING HERE!
Right, to clarify, I know that’s nonsense and I’m going to TRY to explain to you why.
Firstly, I’m going to give you a quick reminder of how the FALSE variant works. If I hand you an non-ordered telephone directory and I asked you to find Michael Blackman what’s the first thing you do? Simple, you open the directory at the first page and go through each and every entry until you find a match – good news, it’s the second item and we’ve got our MATCH, beautiful, two small processes for our computer. I now ask you to find Alan Allen, 124,517 processes later I have my match. Crap on toast that was boring and more importantly took forever.
Ok, the TRUE variant. Some essential knowledge:
- The first column of your data (the one we’re searching for our match) needs to be sorted in Ascending order.
- The Lookup Value MUST be in our Table_Array to produce an accurate result.
I now hand you an ordered telephone directory and ask you to find Michael Blackman. Our approach here is slightly different, we open the directory in the middle (computers are more exact in this regard) and we check there to see what the names on the sheet begin with. For this example, our middle item begins with L and from this we derive that Michael must fall in the second half of the Directory and can completely disregard the first half. That’s one process and I’ve dismissed half of my records in one hit (the same as looking at the first name in an non-ordered directory). Next I split what’s left exactly in half again and do the same check, it’s an S, lovely I can completely ignore the latter part of the directory and my RecordSet is now down to a quarter (in 2 processes – that’s the first two records checked with the first method). We then repeat the process writing off half of the remaining items until we’re left with a single record – this is our approximate match.
Now, please note I have used the term approximate match, the reason for this is that this method returns the nearest match (last man standing) and doesn’t require an exact match which is the source of people’s belief a Binary search doesn’t work. If the Lookup value is missing, we can return erroneous data but this knowledge alone arms us to avoid problems.
VLOOKUP("Michael Blackman",A:B,2,TRUE)
To use TRUE on data that may not contain a match without the concerns of producing incorrect results you can use something like this:
=IF(ISERROR(MATCH("Michael Blackman",A:A,0)),
"",
VLOOKUP("Michael Blackman",A:B,2,TRUE))
EDIT - WITH THANKS TO Jan Karel Pieterse USE THIS METHOD IF YOU ARE NOT 100% CERTAIN YOU WILL FIND A MATCH
=IF(VLOOKUP("Michael Blackman",A:B,1,TRUE)="Michael Blackman",
VLOOKUP("Michael Blackman",A:B,2,TRUE),
NA())
BACK TO ORIGINAL ARTICLE
To drive home why we would use TRUE is easy, it’s simply 000’s of times more efficient due to the number of Boolean checks (A1=”Michael Blackman”) it needs to make compared to the iterative search. EVEN WITH THE ADDITIONAL FUNCTION CALLS.
If anything above isn’t clear please feel free to ask away and I’ll do my best to answer any questions as best I can.
Expert BI functional analyst
8 年very well explained!
IT Manager @Eiffage Construction BELUX
8 年I will test this for sure! tks!
Data Analyst COG P&C at Chubb
8 年Hi Florian, don't apologise, the grammar in my article was probably far worse :) When you say sorting issues do you mean because different formula will be using different index columns and therefore not both can be sorted as required? If so we could get around this with some VBA and a calculation event and still get the benefit of the binary search. Not tested but something like this could yield some serious efficiency gains vs using Exact Matches if you have a sizable dataset. Dim wksData As Worksheet, wksSummary As Worksheet Set wksData = ThisWorkbook.Sheets("MyData") Set wksSummary = ThisWorkbook.Sheets("MySummary") With wksData .UsedRange.Sort Key1:=.Cells(1, 4), Header:=xlYes End With With wksSummary .Range("B1:B10") = Application.VLookup(.Range("A1:A10"), Range(wksData.Columns(4), wksData.Columns(6)), 3, True) End With With wksData .UsedRange.Sort Key1:=.Cells(1, 6), Header:=xlYes End With With wksSummary .Range("C1:C10") = Application.VLookup(.Range("B1:B10"), Range(wksData.Columns(6), wksData.Columns(12)), 6, True) End With My understanding of the two is that INDEX MATCH is marginally quicker but the difference is negligible when you compare the speeds of Binary vs Iterative searching. Take a look at the link Andrea Thehos shared in these comments.
First of all, sorry for Capital letter mistakes. Currently using a german iPad... Great explanation! I use this Option to calculate tarifs on weight classes for example. The approximate result is a very Strong and easy Tool Here. but for the shown use case this practice is problematic, because you often need vlookup on several items of a dataset. Causing Sorting issues. One question about the Match Function. Is the exact Match so much more efficient than exact vlookup so it s useful for error handling of the approximate vlookup? Then i may use Index match combination in Future.
Data Analyst COG P&C at Chubb
8 年Never underestimate the power of the Excel function library. Did you look at the calculations speeds using a binary function in the link Andreas Thehos kindly shared? I'm not sure Power Query would get a million records into the data model let alone report in 0.02 seconds. Power BI is powerful and I have been telling people for years it's the next big thing in data visualisation, VBA is incredible iwhen written correctly but Excel is a beautiful and extremely powerful thing in its own right, when used correctly.