VLOOKUP with a Binary Search - VLOOKUP(..., ..., ..., TRUE)

VLOOKUP with a Binary Search - VLOOKUP(..., ..., ..., TRUE)

As we all know VLOOKUP is a pretty useful function and many of us are also made aware by others that using TRUE in the Range_Lookup up section is lunacy as it simply doesn’t work. See here https://www.settingbox.com/excel-formulas-not-working/ for the proof. Section 10, but to save you the time looking here’s the snippet:

DO NOT STOP READING HERE!

Right, to clarify, I know that’s nonsense and I’m going to TRY to explain to you why.

Firstly, I’m going to give you a quick reminder of how the FALSE variant works. If I hand you an non-ordered telephone directory and I asked you to find Michael Blackman what’s the first thing you do? Simple, you open the directory at the first page and go through each and every entry until you find a match – good news, it’s the second item and we’ve got our MATCH, beautiful, two small processes for our computer. I now ask you to find Alan Allen, 124,517 processes later I have my match. Crap on toast that was boring and more importantly took forever.

Ok, the TRUE variant. Some essential knowledge:

  • The first column of your data (the one we’re searching for our match) needs to be sorted in Ascending order.
  • The Lookup Value MUST be in our Table_Array to produce an accurate result.

I now hand you an ordered telephone directory and ask you to find Michael Blackman. Our approach here is slightly different, we open the directory in the middle (computers are more exact in this regard) and we check there to see what the names on the sheet begin with. For this example, our middle item begins with L and from this we derive that Michael must fall in the second half of the Directory and can completely disregard the first half. That’s one process and I’ve dismissed half of my records in one hit (the same as looking at the first name in an non-ordered directory). Next I split what’s left exactly in half again and do the same check, it’s an S, lovely I can completely ignore the latter part of the directory and my RecordSet is now down to a quarter (in 2 processes – that’s the first two records checked with the first method). We then repeat the process writing off half of the remaining items until we’re left with a single record – this is our approximate match.

Now, please note I have used the term approximate match, the reason for this is that this method returns the nearest match (last man standing) and doesn’t require an exact match which is the source of people’s belief a Binary search doesn’t work. If the Lookup value is missing, we can return erroneous data but this knowledge alone arms us to avoid problems.

VLOOKUP("Michael Blackman",A:B,2,TRUE)

To use TRUE on data that may not contain a match without the concerns of producing incorrect results you can use something like this:

=IF(ISERROR(MATCH("Michael Blackman",A:A,0)),
    "",
    VLOOKUP("Michael Blackman",A:B,2,TRUE))

EDIT - WITH THANKS TO Jan Karel Pieterse USE THIS METHOD IF YOU ARE NOT 100% CERTAIN YOU WILL FIND A MATCH

=IF(VLOOKUP("Michael Blackman",A:B,1,TRUE)="Michael Blackman",
    VLOOKUP("Michael Blackman",A:B,2,TRUE),
    NA())

BACK TO ORIGINAL ARTICLE

To drive home why we would use TRUE is easy, it’s simply 000’s of times more efficient due to the number of Boolean checks (A1=”Michael Blackman”) it needs to make compared to the iterative search. EVEN WITH THE ADDITIONAL FUNCTION CALLS.

If anything above isn’t clear please feel free to ask away and I’ll do my best to answer any questions as best I can.

Fabien Morel

Expert BI functional analyst

8 年

very well explained!

Fran?ois Vandecan

IT Manager @Eiffage Construction BELUX

8 年

I will test this for sure! tks!

Michael Blackman

Data Analyst COG P&C at Chubb

8 年

Hi Florian, don't apologise, the grammar in my article was probably far worse :) When you say sorting issues do you mean because different formula will be using different index columns and therefore not both can be sorted as required? If so we could get around this with some VBA and a calculation event and still get the benefit of the binary search. Not tested but something like this could yield some serious efficiency gains vs using Exact Matches if you have a sizable dataset. Dim wksData As Worksheet, wksSummary As Worksheet Set wksData = ThisWorkbook.Sheets("MyData") Set wksSummary = ThisWorkbook.Sheets("MySummary") With wksData .UsedRange.Sort Key1:=.Cells(1, 4), Header:=xlYes End With With wksSummary .Range("B1:B10") = Application.VLookup(.Range("A1:A10"), Range(wksData.Columns(4), wksData.Columns(6)), 3, True) End With With wksData .UsedRange.Sort Key1:=.Cells(1, 6), Header:=xlYes End With With wksSummary .Range("C1:C10") = Application.VLookup(.Range("B1:B10"), Range(wksData.Columns(6), wksData.Columns(12)), 6, True) End With My understanding of the two is that INDEX MATCH is marginally quicker but the difference is negligible when you compare the speeds of Binary vs Iterative searching. Take a look at the link Andrea Thehos shared in these comments.

First of all, sorry for Capital letter mistakes. Currently using a german iPad... Great explanation! I use this Option to calculate tarifs on weight classes for example. The approximate result is a very Strong and easy Tool Here. but for the shown use case this practice is problematic, because you often need vlookup on several items of a dataset. Causing Sorting issues. One question about the Match Function. Is the exact Match so much more efficient than exact vlookup so it s useful for error handling of the approximate vlookup? Then i may use Index match combination in Future.

Michael Blackman

Data Analyst COG P&C at Chubb

8 年

Never underestimate the power of the Excel function library. Did you look at the calculations speeds using a binary function in the link Andreas Thehos kindly shared? I'm not sure Power Query would get a million records into the data model let alone report in 0.02 seconds. Power BI is powerful and I have been telling people for years it's the next big thing in data visualisation, VBA is incredible iwhen written correctly but Excel is a beautiful and extremely powerful thing in its own right, when used correctly.

回复

要查看或添加评论,请登录

Michael Blackman的更多文章

  • A folder is not a Database, probably!

    A folder is not a Database, probably!

    As many of you probably know by now I'm a technological dinosaur. Whilst many others are sharing information on new and…

    11 条评论
  • My Array Formula is Slow...

    My Array Formula is Slow...

    First things first. Yes I know Power BI exists.

    9 条评论
  • Using Names and Arrays to avoid nasty nested IF's

    Using Names and Arrays to avoid nasty nested IF's

    Hands up who remembers writing the longest nested IF statement in the world and thinking you've conquered the world? I…

    6 条评论
  • Saving User Settings locally in .ini files with VBA

    Saving User Settings locally in .ini files with VBA

    I’ve had a busy few months at Tax Automation and I’m delighted to say I’m learning new things again after stagnating to…

    2 条评论
  • ADO Function to Query most of the common Data Sources

    ADO Function to Query most of the common Data Sources

    I've lost count of the number of variants of this I've written on clients machines and have never remembered to keep a…

    3 条评论
  • Working with multiple Ranges without a Loop (Union)

    Working with multiple Ranges without a Loop (Union)

    Just wanted to quickly put something together to highlight a very powerful function built into the Excel Application…

    12 条评论
  • Function to Validate Headers

    Function to Validate Headers

    We've all been there, you've been downloading the same csv for the last 6 months every single damn day and blindly…

    10 条评论
  • Injecting pre-written Functions into Workbook Projects via VBA

    Injecting pre-written Functions into Workbook Projects via VBA

    During a recent interview I was asked to talk through a VBA Function that sets many of the application settings (Screen…

    4 条评论
  • Dynamic Named Ranges & Index Match

    Dynamic Named Ranges & Index Match

    Back in the old days before Lists received its revamp to Tables we had another way to ensure our formula was efficient…

    9 条评论

社区洞察