A folder is not a Database, probably!

A folder is not a Database, probably!

As many of you probably know by now I'm a technological dinosaur. Whilst many others are sharing information on new and shiny tools such as Power BI i see it as my duty to keep some of the older approaches alive.

Why?

Well that's simple, we are all on a personal journey and if you're anything like me you want to be as good as you can be at what you do for as many people as possible. If you're at the start of your journey as an Excel developer you're going to be inundated with articles on Power BI and M which makes perfect sense but what happens when you rock up to a new gig and you're informed they are still locked into legacy Excel versions and Power Query isn't an Option?

The answer should probably be ADO.

Mashing csv's together from several data sources

If you read my previous article My Array Formula is Slow... and played around with the example file shared you would have come across (an older iteration of) my ADO class where I demonstrated leveraging SQL's Pivot & Transform functionality to work efficiently with a 600k+ dataset. There was also an example of using Advanced Filter to achieve the same results but that's slightly less relevant.

When we work with a Database we can use SELECT INTO statements that allow us to create brand new tables by joining up data from numerous other tables but obviously if you've only got csv extracts to work with the Database is no longer of any use.

The good news is that if you house one or more csvs or text files into a folder you can create a connection to that folder with ADO and employ any of your SQL methods that work in your Database with your folder, do research the subtle differences between Jet SQL & one of the more main stream flavours - ie CASE statements are replaced with IIF([logical test],[True],[False])

You can also use ADO within an Excel file to create new sheets in the same method but you lose some functionality as you are unable to use DROP TABLE of DELETE queries.

Walkthrough

We're going to start out with just our initial raw data file.

Next we're going to create a new mapping table to redefine our various Relationship statuses.

And collect some US State level statistics to bring through both population & size per state from good old Wiki

And we'll save these as csv files in the same folder as our data.

Now for the ADO & SQL, as mentioned above I'm using my ADO class again (which is exactly why we write re-usable classes) and I'm going to add and execute various SQL strings as seen below after connecting to my folder.

Query 1 will drop (delete) any pre-existing AggData.csv files

Query 2 joins and aggregates our data file with our new relationship mapping table and our totally reliable Wiki data

Query 3 & 4 then will query our newly created AggData.csv file and pull our data into our workbook as declared by setting our .Range properties.

And Voila, running the above code (along with our ADO class and certainly after adding in the missing PathSeperator :) proceeds to force our simple Windows folder to behave like a database.

A working example can be downloaded from my Google Drive here, simply extract the file and the Data folder to a location on your machine and fire up the 600K+ Records Blog 2.xlsb and run the JoinDataSetsAndAggregate Subroutine to test it. If you're SQL is decent feel free to play around to your hearts content.

As always any questions and feedback are more than welcome in the comments of the article below.

Balaji Venktesh

Principal Data Analyst at Walmart Global Tech India

7 年

Good article...Happy to learn

Gemma Pryke

Chartered Accountant

7 年

Some great info in there, thanks for sharing. I also can't help myself but like anything which leads with a Dinosaur photo.

Matthieu Liatard

Infrastructure Capital Markets VP at Crédit Agricole CIB

7 年

Great article! Inspiring way of managing csv files!

Rishi Sapra ACA, MCT, Microsoft MVP

Group Manager at Avanade | Microsoft Most Valuable Professional (MVP) | Fast Track Recognised Solution Architect (FTRSA) | Chartered Accountant (ACA) | Microsoft Certified Trainer (MCT) | Quantic Executive MBA (Hons)

7 年

This is awesome Michael! I used VBA and ADO for years and never knew it could connect to a folder data source! Thanks for sharing!

要查看或添加评论,请登录

Michael Blackman的更多文章

  • My Array Formula is Slow...

    My Array Formula is Slow...

    First things first. Yes I know Power BI exists.

    9 条评论
  • Using Names and Arrays to avoid nasty nested IF's

    Using Names and Arrays to avoid nasty nested IF's

    Hands up who remembers writing the longest nested IF statement in the world and thinking you've conquered the world? I…

    6 条评论
  • Saving User Settings locally in .ini files with VBA

    Saving User Settings locally in .ini files with VBA

    I’ve had a busy few months at Tax Automation and I’m delighted to say I’m learning new things again after stagnating to…

    2 条评论
  • ADO Function to Query most of the common Data Sources

    ADO Function to Query most of the common Data Sources

    I've lost count of the number of variants of this I've written on clients machines and have never remembered to keep a…

    3 条评论
  • Working with multiple Ranges without a Loop (Union)

    Working with multiple Ranges without a Loop (Union)

    Just wanted to quickly put something together to highlight a very powerful function built into the Excel Application…

    12 条评论
  • Function to Validate Headers

    Function to Validate Headers

    We've all been there, you've been downloading the same csv for the last 6 months every single damn day and blindly…

    10 条评论
  • Injecting pre-written Functions into Workbook Projects via VBA

    Injecting pre-written Functions into Workbook Projects via VBA

    During a recent interview I was asked to talk through a VBA Function that sets many of the application settings (Screen…

    4 条评论
  • VLOOKUP with a Binary Search - VLOOKUP(..., ..., ..., TRUE)

    VLOOKUP with a Binary Search - VLOOKUP(..., ..., ..., TRUE)

    As we all know VLOOKUP is a pretty useful function and many of us are also made aware by others that using TRUE in the…

    18 条评论
  • Dynamic Named Ranges & Index Match

    Dynamic Named Ranges & Index Match

    Back in the old days before Lists received its revamp to Tables we had another way to ensure our formula was efficient…

    9 条评论

社区洞察

其他会员也浏览了