A folder is not a Database, probably!
As many of you probably know by now I'm a technological dinosaur. Whilst many others are sharing information on new and shiny tools such as Power BI i see it as my duty to keep some of the older approaches alive.
Why?
Well that's simple, we are all on a personal journey and if you're anything like me you want to be as good as you can be at what you do for as many people as possible. If you're at the start of your journey as an Excel developer you're going to be inundated with articles on Power BI and M which makes perfect sense but what happens when you rock up to a new gig and you're informed they are still locked into legacy Excel versions and Power Query isn't an Option?
The answer should probably be ADO.
Mashing csv's together from several data sources
If you read my previous article My Array Formula is Slow... and played around with the example file shared you would have come across (an older iteration of) my ADO class where I demonstrated leveraging SQL's Pivot & Transform functionality to work efficiently with a 600k+ dataset. There was also an example of using Advanced Filter to achieve the same results but that's slightly less relevant.
When we work with a Database we can use SELECT INTO statements that allow us to create brand new tables by joining up data from numerous other tables but obviously if you've only got csv extracts to work with the Database is no longer of any use.
The good news is that if you house one or more csvs or text files into a folder you can create a connection to that folder with ADO and employ any of your SQL methods that work in your Database with your folder, do research the subtle differences between Jet SQL & one of the more main stream flavours - ie CASE statements are replaced with IIF([logical test],[True],[False])
You can also use ADO within an Excel file to create new sheets in the same method but you lose some functionality as you are unable to use DROP TABLE of DELETE queries.
Walkthrough
We're going to start out with just our initial raw data file.
Next we're going to create a new mapping table to redefine our various Relationship statuses.
And collect some US State level statistics to bring through both population & size per state from good old Wiki
And we'll save these as csv files in the same folder as our data.
Now for the ADO & SQL, as mentioned above I'm using my ADO class again (which is exactly why we write re-usable classes) and I'm going to add and execute various SQL strings as seen below after connecting to my folder.
Query 1 will drop (delete) any pre-existing AggData.csv files
Query 2 joins and aggregates our data file with our new relationship mapping table and our totally reliable Wiki data
Query 3 & 4 then will query our newly created AggData.csv file and pull our data into our workbook as declared by setting our .Range properties.
And Voila, running the above code (along with our ADO class and certainly after adding in the missing PathSeperator :) proceeds to force our simple Windows folder to behave like a database.
A working example can be downloaded from my Google Drive here, simply extract the file and the Data folder to a location on your machine and fire up the 600K+ Records Blog 2.xlsb and run the JoinDataSetsAndAggregate Subroutine to test it. If you're SQL is decent feel free to play around to your hearts content.
As always any questions and feedback are more than welcome in the comments of the article below.
Principal Data Analyst at Walmart Global Tech India
7 年Good article...Happy to learn
Chartered Accountant
7 年Some great info in there, thanks for sharing. I also can't help myself but like anything which leads with a Dinosaur photo.
Infrastructure Capital Markets VP at Crédit Agricole CIB
7 年Great article! Inspiring way of managing csv files!
Group Manager at Avanade | Microsoft Most Valuable Professional (MVP) | Fast Track Recognised Solution Architect (FTRSA) | Chartered Accountant (ACA) | Microsoft Certified Trainer (MCT) | Quantic Executive MBA (Hons)
7 年This is awesome Michael! I used VBA and ADO for years and never knew it could connect to a folder data source! Thanks for sharing!