Data Vault 2.0 and Data Science
Sanjay Pande
Chief of Marketing and Product Strategy at Data Vault Alliance, Data Vault 2.0 Authorized Instructor, DV 2.0 Certified Master
What is Data Vault 2.0?
An enabler within your organization, Data Vault 2.0 is the only prescriptive industry standard methodology for you to turn raw data into actionable business intelligence, leading to tangible business outcomes. It gives you a proactive, proven recipe to rapidly produce results for your business intelligence endeavours.
On this front, nothing else comes close. There are various architectures, modelling styles and frameworks, but none of them gives you a step-by-step methodology to follow across people, process and technology, from soup to nuts, using a solution approach.
Note: Data Vault 2.0 enthusiasts already know this, so they can skip this section and move to the next section. If you haven’t heard of it, then here’s what you get:
There’s a lot more to it, but what’s crucial to understand is that it is a paint-by-numbers approach to building analytics solutions and includes everything from requirements to information delivery to the business.
The key is to understand and then adhere to the prescribed standards and recommended best practices. To ensure that the business gets the maximum value out of this solution, there are certifications, which start with people. But first, let’s talk about how …
You Can Derive Maximum Value from Data Science with Data Vault 2.0
Assume you have a data engineering team and a data science team. As you're aware, it's not possible to do any data science without a reasonably good data set. Data scientists are far more valuable to the organization when they don’t have to deal with the data engineering aspects (something they rarely receive enough training in) and can instead apply their talents to using the data and building data science models.
Let's now look at the synergistic aspects of Data Vault 2.0 and Data Science that have led so many successful teams to present Data Science success stories at the annual World Wide Data Vault Consortium.
Synergy #1 - Delivering Data to Data Science Teams - Data engineering teams have deep expertise in delivering required data sets to different levels of the organization. They’ve been doing this for decades and you cannot discount that expertise. The time it saves the data science teams is argument enough for doing this, which is why many thought leaders have discussed and recommended it.
One of the unique features of the Data Vault 2.0 methodology is that it includes all data (within scope) in the Data Warehouse without any “soft” business rules. This enables a team using a Data Vault to deliver raw or cleansed sets, whichever is needed.
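To make the raw-versus-cleansed point concrete, here is a minimal Python sketch. It is only an illustration, not part of the Data Vault 2.0 standard: the row layout, the names `RAW_CUSTOMER_ROWS`, `deliver_raw` and `deliver_cleansed`, and the cleansing rules themselves are all hypothetical. The idea it shows is that the stored data stays untouched, and any "soft" rule is applied only at delivery time.

```python
# Hypothetical rows as they arrived from a source system, stored unmodified.
RAW_CUSTOMER_ROWS = [
    {"customer_id": "C1", "country": " us ", "revenue": "1200.50"},
    {"customer_id": "C2", "country": "DE", "revenue": None},
]


def deliver_raw(rows):
    """Return copies of the stored rows with no business rules applied."""
    return [dict(row) for row in rows]


def deliver_cleansed(rows):
    """Apply illustrative 'soft' rules on the way out, leaving storage untouched."""
    cleansed = []
    for row in rows:
        cleansed.append({
            "customer_id": row["customer_id"],
            # Assumed rule: normalise country codes to trimmed upper case.
            "country": row["country"].strip().upper(),
            # Assumed rule: treat a missing revenue as 0.0.
            "revenue": float(row["revenue"]) if row["revenue"] is not None else 0.0,
        })
    return cleansed
```

Because the rules live in the delivery step, a data science team that disagrees with one of them (for example, treating missing revenue as zero) can ask for the raw set instead and apply its own treatment.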
Synergy #2 - The Data Vault 2.0 Exploration Link - The DV 2.0 model has a construct called the exploration link. This is where the Data Vault actually becomes a client to data science teams. The exploration link is an extremely powerful construct that can revolutionize the use of AI and ML to find relationships in existing data sets and use them in an already enriched analytics environment.
In fact, it’s such an important topic, there’s an entire section dedicated to it in the Data Vault 2.0 certification bootcamp.
Synergy #3 - Temporal Data Archival of Data Science Results - Any data science effort is only as good as the data its models are built from. But while the models are being trained, where do you store the learnings? The DV 2.0 model gives your teams a place to store and record findings over time. Best of all, because this is a data warehouse construct where temporal sets are kept, the results are stored with their history. If you know predictive analytics, think about the power of having stored sets with history, after the fact of the prediction, to compare the predicted to the actual. You can then fine-tune the data, the models, and your way of working against the reality in the data.
The teams can demonstrate the variance between predicted and actual values with data, and keep improving the sets and models over time.
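As a rough illustration of comparing predicted to actual once history is available, the Python sketch below matches stored predictions to the actuals that arrive later and computes a mean absolute error per model version. Everything in it is an assumption made for the example: the record layouts, the field names (`business_key`, `load_date`, `target_date`, `model_version`) and the choice of mean absolute error as the measure. It is not a prescribed Data Vault 2.0 structure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PredictionRecord:
    """A stored prediction (hypothetical layout)."""
    business_key: str
    load_date: date      # when the prediction was recorded
    target_date: date    # the date the prediction is about
    model_version: str
    predicted: float


@dataclass(frozen=True)
class ActualRecord:
    """The value observed after the fact (hypothetical layout)."""
    business_key: str
    target_date: date
    actual: float


def prediction_errors(predictions, actuals):
    """Pair each prediction with its actual; skip predictions with no actual yet."""
    actual_by_key = {(a.business_key, a.target_date): a.actual for a in actuals}
    rows = []
    for p in predictions:
        key = (p.business_key, p.target_date)
        if key in actual_by_key:
            rows.append((p.model_version, p.business_key, p.target_date,
                         p.predicted - actual_by_key[key]))
    return rows


def mean_absolute_error_by_model(rows):
    """Average the absolute error per model version."""
    totals = {}
    for model_version, _key, _target, error in rows:
        error_sum, count = totals.get(model_version, (0.0, 0))
        totals[model_version] = (error_sum + abs(error), count + 1)
    return {model: error_sum / count
            for model, (error_sum, count) in totals.items()}
```

Keeping the load date and model version on every stored prediction is what makes it possible to compare successive model versions against the same actuals later on.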
There’s a lot more we can talk about here, but I’ll leave these to your imagination for now as these three points are …
Important enough to consider leveraging a Data Vault 2.0 solution
As far as I know, the Data Vault 2.0 System of Business Intelligence is the only solution that has a consideration for Data Science teams in its systems architecture with synergy between these teams so the organization can actually benefit from its data assets.
Data Vault Alliance (DVA) is globally recognized by forward thinking information-driven leaders as the trusted authority of the Data Vault standards, resources, compliant implementations, and measurable, predictable and valuable business outcomes.
The standardized methodology of Data Vault 2.0 dramatically reduces risk by enabling compliant solutions and ensuring predictable ROI.
The Data Vault 2.0 methodology works across people, process and technology. DVA as an organization focuses on ensuring that Data Vaults are done right, everywhere, every time. That starts with Data Vault 2.0 certified people, through the CDVP2 certification offered by DVA and its Authorized Training Partners (ATPs) worldwide.
There are several success stories of Data Science and Data Vault 2.0 teams working together, but any successful effort requires trained and certified people who adhere to the standards and follow the best practices.
Since the foundation of any successful solution is a team of trained people, it’s important to get the entire team trained and certified before building out a solution and, if required, to get some guidance during the initial phase.
There are several scheduled Data Vault 2.0 bootcamps by DVA and its authorized training partners across the world that can be found here -> https://learn.datavaultalliance.com/event-directory/
Chief Operating Officer at DataVaultAlliance Holdings and President of DataRebels
(2 years ago) Sanjay, you nailed this at a high level, easily consumed and understood. I know a number of data science teams who have benefited from DV2. I particularly appreciated your comment, "Data scientists are far more valuable to the organization when they don’t have to mess with the data engineering aspects ...". As I've said on podcasts and teach in DataRebels CDVP2 classes, Data Scientists are the most expensive ETL programmers a company will ever hire. Kudos!