Data Vault 2.0 and Data Science

Audience(s):

  1. Data Vault 2.0 enthusiasts working with AI/ML (Data Science) teams
  2. AI/ML teams working with data engineering teams
  3. Business sponsors of data management efforts related to BI/AI/ML

What is Data Vault 2.0?

An enabler within your organization, Data Vault 2.0 is the only prescriptive industry standard methodology for you to turn raw data into actionable business intelligence, leading to tangible business outcomes. It gives you a proactive, proven recipe to rapidly produce results for your business intelligence endeavours.

On this particular front, there’s nothing out there that comes close. There are various architectures, modelling styles and frameworks, but none of them gives you a step-by-step methodology to follow across people, process and technology that goes all the way from soup to nuts using a solution approach.

Note: Data Vault 2.0 enthusiasts already know this, so they can skip this section and move to the next section. If you haven’t heard of it, then here’s what you get:

  • A flexible data model architecture capable of dealing with cross-platform data persistence, multi-latency, multi-structured data, and massively parallel platforms.
  • An agile methodology called Disciplined Agile Delivery (DAD) within it, which is both automation and load friendly.
  • Several architectures across people, process, data and technology, which include a scalable systems architecture for dealing with almost any type of data.
  • Implementation guidelines with patterns and templates for all aspects of the solution.

There’s a lot more to it, but what’s crucial to understand is that it is a paint-by-numbers approach to building analytics solutions and includes everything from requirements to information delivery to the business.

The key is to understand and then adhere to the prescribed standards and recommended best practices. To ensure that the business gets the maximum value out of this solution, there are certifications which start with people. But first, let’s talk about how …

You Can Derive Maximum Value from Data Science with Data Vault 2.0

Assume you have a data engineering team and a data science team. As you're aware, it's not possible to do any data science without a reasonably good data set. Data scientists are far more valuable to the organization when they don’t have to mess with the data engineering aspects (something that they don’t actually receive enough training in), and can instead apply their talents to using the data and building data science models.

Let's now look at the synergistic aspects of Data Vault 2.0 and Data Science that have led to so many successful teams presenting Data Science success stories at the annual World Wide Data Vault Consortium (WWDVC).

Synergy #1 - Delivering Data to Data Science Teams - Data engineering teams have deep expertise in delivering required data sets to different levels of the organization. They’ve been doing this for decades and you cannot discount that expertise. The time it saves the data science teams is argument enough on its own, which is why many thought leaders have discussed and recommended this division of work.

One of the unique features of the Data Vault 2.0 methodology is that it includes all data (within scope) in the Data Warehouse without any “soft” business rules. This enables a team using a Data Vault to deliver raw or cleansed sets - whatever is needed.
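To make the raw-versus-cleansed point concrete, here is a minimal sketch in Python. It is not part of the Data Vault 2.0 standard: the `RawRecord` fields, the `cleanse_country` rule and the `deliver` helper are all hypothetical names chosen for illustration. The only idea taken from the text is that stored data stays free of soft business rules, and any such rule is applied on the way out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class RawRecord:
    # Hypothetical columns; a real model would follow your own sources.
    customer_id: str
    country: str
    record_source: str


Rule = Callable[[RawRecord], RawRecord]


def cleanse_country(record: RawRecord) -> RawRecord:
    """Hypothetical soft business rule: normalize the country code."""
    normalized = record.country.strip().upper()
    return RawRecord(record.customer_id, normalized, record.record_source)


def deliver(records: Iterable[RawRecord], rules: Sequence[Rule] = ()) -> List[RawRecord]:
    """Return the stored records as-is, or with soft rules applied at delivery."""
    delivered = []
    for record in records:
        for rule in rules:
            record = rule(record)  # builds a new record; the stored one is untouched
        delivered.append(record)
    return delivered


# The stored data carries no soft business rules.
raw_store = [
    RawRecord("C1", " us ", "crm"),
    RawRecord("C2", "De", "web"),
]

raw_set = deliver(raw_store)                                # for teams that want raw data
cleansed_set = deliver(raw_store, rules=[cleanse_country])  # for teams that want cleansed data
```

Because the rule runs only at delivery time, the same stored rows can serve a data scientist who wants untouched source values and a report that needs normalized ones.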

Synergy #2 - The Data Vault 2.0 Exploration Link - The DV 2.0 model has a construct called the exploration link. This is where the Data Vault actually becomes a client to data science teams. The exploration link is an extremely powerful construct with the potential to revolutionize the use of AI and ML to find relationships in existing data sets and use them in an already enriched analytics environment.

In fact, it’s such an important topic, there’s an entire section dedicated to it in the Data Vault 2.0 certification bootcamp.
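As a rough illustration of a data science process writing discovered relationships back into link-style rows, consider the sketch below. Everything in it is an assumption made for the example: the column names, the use of MD5 over concatenated business keys, and the simple co-occurrence "support" measure stand in for whatever model and standards your team actually uses. It does not reproduce the exploration link pattern taught in the bootcamp.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone
from itertools import combinations
from typing import Dict, List, Sequence


def hash_key(*business_keys: str) -> str:
    """Hash the concatenated business keys (MD5 and '||' are illustrative choices)."""
    joined = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def discover_candidate_links(
    baskets: Sequence[Sequence[str]], min_support: float
) -> List[Dict[str, object]]:
    """Find product pairs that co-occur often and emit them as link-style rows."""
    if not baskets:
        return []

    # Count each unordered pair once per basket.
    pair_counts: Counter = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1

    load_date = datetime.now(timezone.utc)
    rows = []
    for (a, b), count in pair_counts.items():
        support = count / len(baskets)
        if support >= min_support:
            rows.append({
                "link_hash_key": hash_key(a, b),
                "product_a": a,
                "product_b": b,
                "support": support,                # how strong the discovered relationship is
                "load_date": load_date,            # when the finding was recorded
                "record_source": "ds.cooccurrence" # hypothetical source label
            })
    return rows


# Hypothetical purchase baskets keyed by product business key.
baskets = [["P1", "P2"], ["P1", "P2", "P3"], ["P2", "P3"], ["P1", "P2"]]
candidate_links = discover_candidate_links(baskets, min_support=0.5)
```

Rows like these can be loaded, reviewed and dropped again without touching the rest of the model, which is the property that makes a discovered relationship cheap to try out.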

Synergy #3 - Temporal Data Archival of Data Science Results - Any data science effort is only as good as the data it uses to build models from. But while the models are being trained, where do you store the learnings? The DV 2.0 model gives your teams a place to store and record findings over time. The best part is that, because it is a data warehouse construct where temporal sets are kept, you can keep the results with their full history. If you know about predictive analytics, just think about the power of having stored sets with history - after the fact of the prediction - to compare the predicted to the actual. You can then fine-tune the data, the models, and your way of working against what actually happened.

The teams can demonstrate, with data, the variance between predicted and actual values, and keep improving the sets and models so they get better and better.
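The predicted-versus-actual comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed structure: the `Prediction` fields, the model version labels and the sample figures are invented for the example, and mean absolute error is just one possible way to summarize the variance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Prediction:
    # Hypothetical columns for a stored prediction.
    business_key: str
    target_date: date
    model_version: str
    predicted: float
    load_date: date  # when the prediction was recorded


def variance_rows(
    predictions: List[Prediction],
    actuals: Dict[Tuple[str, date], float],
) -> List[Tuple[str, float, float, float]]:
    """Pair each stored prediction with its actual value, once that value is known."""
    rows = []
    for p in predictions:
        actual = actuals.get((p.business_key, p.target_date))
        if actual is None:
            continue  # outcome not observed yet; the prediction stays on record
        rows.append((p.model_version, p.predicted, actual, p.predicted - actual))
    return rows


def mean_absolute_error_by_model(rows) -> Dict[str, float]:
    """Summarize the predicted-versus-actual variance per model version."""
    totals: Dict[str, Tuple[float, int]] = {}
    for model_version, _predicted, _actual, error in rows:
        running_sum, count = totals.get(model_version, (0.0, 0))
        totals[model_version] = (running_sum + abs(error), count + 1)
    return {model: s / n for model, (s, n) in totals.items()}


# Invented sample data: two model versions predicting month-end sales.
month_end = date(2024, 1, 31)
predictions = [
    Prediction("store-1", month_end, "v1", 100.0, date(2024, 1, 2)),
    Prediction("store-2", month_end, "v1", 50.0, date(2024, 1, 2)),
    Prediction("store-1", month_end, "v2", 108.0, date(2024, 1, 16)),
    Prediction("store-2", month_end, "v2", 46.0, date(2024, 1, 16)),
    Prediction("store-3", month_end, "v2", 70.0, date(2024, 1, 16)),  # no actual yet
]
actuals = {("store-1", month_end): 110.0, ("store-2", month_end): 45.0}

rows = variance_rows(predictions, actuals)
mae = mean_absolute_error_by_model(rows)
```

Keeping every prediction with its load date, rather than overwriting it, is what allows an older model version to be compared against a newer one after the actuals arrive.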

There’s a lot more we can talk about here, but I’ll leave these to your imagination for now as these three points are …

Important enough to consider leveraging a Data Vault 2.0 solution

As far as I know, the Data Vault 2.0 System of Business Intelligence is the only solution that has a consideration for Data Science teams in its systems architecture with synergy between these teams so the organization can actually benefit from its data assets.

Data Vault Alliance (DVA) is globally recognized by forward-thinking, information-driven leaders as the trusted authority on Data Vault standards, resources, compliant implementations, and measurable, predictable and valuable business outcomes.

The standardized methodology of Data Vault 2.0 dramatically reduces risk by enabling compliant solutions and ensuring predictable ROI.

The Data Vault 2.0 methodology works across people, process and technology, and DVA as an organization focuses on ensuring that Data Vaults are done right, everywhere, every time. That starts with Data Vault 2.0 certified people, which means the CDVP2 certification offered by DVA and its Authorized Training Partners (ATPs) worldwide.

There are several success stories of Data Science and Data Vault 2.0 teams working together, but any successful effort requires trained and certified people who adhere to the standards and follow the best practices.

Since the foundation of any successful solution is a team of trained people, it’s important to get an entire team trained and certified before building out a solution and, if required, even get some guidance during the initial phase.

There are several scheduled Data Vault 2.0 bootcamps by DVA and its authorized training partners across the world that can be found here -> https://learn.datavaultalliance.com/event-directory/

Cindi Meyersohn

Chief Operating Officer at DataVaultAlliance Holdings and President of DataRebels

2 years ago

Sanjay, you nailed this at a high level, easily consumed and understood. I know a number of data science teams who have benefited from DV2. I particularly appreciated your comment, "Data scientists are far more valuable to the organization when they don’t have to mess with the data engineering aspects ...". As I've said on podcasts and teach in DataRebels CDVP2 classes, Data Scientists are the most expensive ETL programmers a company will ever hire. Kudos!
