High performance DW rule #6: simplify, simplify, then simplify some more
Colin Chapman was an interesting guy. For those who don't know him, he founded Lotus and designed its cars at a time when most in racing were adding horsepower to win races. More horsepower brings the need for bigger brakes, heavier transmissions, wider tires, and so on. Chapman took a very different approach: "Adding power makes you faster on the straights. Subtracting weight makes you faster everywhere."
It was completely contrary to the ethos of the time, but it worked: Lotus won races consistently. His Lotus 7 design eventually passed to other builders and was widely copied, and examples are still being built, and winning, in sports car clubs around the world decades later.
The same holds true for your data warehouse environment. Adding lightness will make it quicker, both from a machine-utilization standpoint and, more importantly, from a manpower standpoint.
Consider the architecture diagram above: it's fairly common today and hasn't changed much in quite a few years. Data flows from source systems to an ODS and a data warehouse, from there to numerous datamarts and caching layers, and eventually to reports and dashboards.
It's a tried-and-true methodology that has been in use for a long time. But do we really need all of it? As with the challenges Chapman faced back in the day, simplifying this architecture requires a lot of product knowledge. "What materials and manufacturing processes are available for making wheel uprights?" translates to "What data products are available for simplifying my architecture?"
Today, there are amazing products that make many of the blocks on that diagram vanish. Why do datamarts exist? Well, because many data warehouse products didn't efficiently support aggregations and joins. That caching layer? Again, newer platforms integrate caching into the data warehouse itself, so there's no need for Analysis Services or extracts for your presentation layer.
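A minimal sketch of the datamart point, assuming a platform that can join and aggregate efficiently at query time. SQLite stands in for the warehouse only so the example runs anywhere, and the table and view names (`fact_sales`, `dim_product`, `sales_by_category`) are hypothetical. The aggregate a datamart would have materialized is defined as a view, so there is no separate load process to build or maintain.

```python
import sqlite3

# SQLite stands in for the warehouse so the sketch runs anywhere.
# Table, view, and category names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'bikes'), (2, 'helmets');
INSERT INTO fact_sales VALUES (1, 500.0), (1, 300.0), (2, 50.0);

-- The aggregate a datamart would have materialized, defined as a view:
-- the join and GROUP BY run at query time, so there is no load job.
CREATE VIEW sales_by_category AS
SELECT d.category, SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_product AS d ON d.product_id = f.product_id
GROUP BY d.category;
""")


def sales_by_category(conn):
    """Return {category: total_amount} read straight from the view."""
    rows = conn.execute(
        "SELECT category, total_amount FROM sales_by_category ORDER BY category"
    ).fetchall()
    return dict(rows)


print(sales_by_category(conn))  # {'bikes': 800.0, 'helmets': 50.0}

# New facts show up on the next query; no mart refresh step is needed.
conn.execute("INSERT INTO fact_sales VALUES (2, 25.0)")
print(sales_by_category(conn))  # {'bikes': 800.0, 'helmets': 75.0}
```

Whether a view like this performs well enough at production scale depends entirely on the engine underneath, which is exactly why simplifying requires product knowledge.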
Every block and arrow on that diagram signifies manpower. Someone has to create the processes that make data flow. Someone has to manage every system that's signified as a block.
There are 10 blocks in the DW/analytics portion of that diagram. If I can make 7 of them unnecessary, my staff can concentrate on providing value rather than on the unnecessary labor of maintaining an inefficient architecture.
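To make the block-and-arrow accounting concrete, here is a toy sketch that counts systems (blocks) and data flows (arrows) in two hypothetical architectures. The component names are made up to loosely follow the flow described earlier. They are not a transcription of the original diagram, so the counts here will not match its 10 blocks.

```python
# Hypothetical architectures expressed as lists of arrows (source, target).
# Names loosely follow the flow described in the post; they are
# illustrative only, not a copy of the original diagram.
TRADITIONAL = [
    ("sources", "ods"),
    ("sources", "dw"),
    ("dw", "datamart_finance"),
    ("dw", "datamart_sales"),
    ("datamart_finance", "cache"),
    ("datamart_sales", "cache"),
    ("cache", "dashboards"),
]

SIMPLIFIED = [
    ("sources", "dw"),
    ("dw", "dashboards"),
]


def maintenance_surface(arrows):
    """Return (blocks, arrows): systems someone must manage and
    data flows someone must build and keep running."""
    blocks = {node for arrow in arrows for node in arrow}
    return len(blocks), len(arrows)


for name, arrows in [("traditional", TRADITIONAL), ("simplified", SIMPLIFIED)]:
    blocks, flows = maintenance_surface(arrows)
    print(f"{name}: {blocks} blocks, {flows} arrows")
# traditional: 7 blocks, 7 arrows
# simplified: 3 blocks, 2 arrows
```

Each block removed also takes its arrows with it, so the maintenance savings compound.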
Simplify, then add lightness.
Top Voice Data Architect and Top Voice Data Governance!! DV2.0, Principal Data & Governance Architect, Led team of Data Engineers & Data Analysts, BI Architect, Data Architect
Simple but effective diagram, Robert! Many new #dwh / #lakehouse implementations skip the ODS, which misses the long-term vision for analytics #performance and simplicity. I recently implemented an end-to-end analytics #dataproduct for a banking product, and I explicitly added an #ods with a #canonical model. It is working wonders!
Data Architect at Tata Steel BV, designer of data constructs.
Rule number one: less data is faster. Reminds me of that 64K computer I started off with.
Business Analytics Software Engineering Expert
Caching after the data marts could (or should) be replaced by an analytics solution such as IBM Planning Analytics (my preferred choice), Jedox, or Pigment.
BI & Analytics / Data engineering / Statistics & Data Science / MITAA Affiliate
Maybe one day even dashboards will be obsolete. You will have an AI assistant trained on source data in real time. The AI assistant will generate beautiful dashboards to show you the best insights found in the data and tell you the story with a nice pitch. Ultimately, it will tell you what you need to change to make the business better! No doubt it will come in the future.
Your Next Data & AI Project Manager
Love it!