What are the differences between a Data Warehouse and a Transactional Database?

What are the differences between a Data Warehouse and a Transactional Database?

Over the past 24 - 36 months, the majority of our long term client engagements have started with our client wanting to address what they perceived to be a “data science” or big data problem. In nearly every case, we found that there was a lack of data management infrastructure necessary for doing serious analytical work and they would be better served and able to address a wider set of stakeholders in the organization by tying together their various data sources into an integrated data lake/data warehouse as a first step. This approach not only provides a richer data set for data science work, it automates a ton of manual processes required to cleanse and stitch the data together, and allows for operational Business Intelligence (BI) and self service reporting. Finance/Accounting, Operations, Sales/Marketing, C-Suite, etc., are now able to produce reports and analytics from a single version of the truth.??

At the heart of these solutions is the data lake/data warehouse. While the concept of a data warehouse is not new….I’ve been implementing them for nearly 30 years now….It still blows me away that we often find ourselves in a position where we are helping our client build consensus in the organization around the concept of a data warehouse and the overall business case. For many folks, they just can’t understand why they need to build yet another database and start feeding it data.????

In the context of business intelligence, a data warehouse is a core repository that serves as a “single version of truth” that integrates and rolls up data across various data sources within an organization. The stored data is both historical and current, and supports analytical reporting, executive dashboards, “self service BI”, and data science. Many organizations don’t have a clear understanding of how a data warehouse is different from the various transactional databases that power things like CRM, ERP, Accounting, and other core infrastructure within an organization’s data landscape. The purpose of this post is to help the reader understand the differences between a data warehouse versus a traditional, transactional database.

The primary difference between a data warehouse and a transactional database is that the underlying table structures for a transactional database are designed for fast and efficient data inserts and updates (it’s all about getting data into the database). For a data warehouse, the underlying table structures are designed for efficient data reads, allowing users to develop simple and powerful queries for retrieving data for analytical purposes.

The rest of this post goes into greater detail on the differences between these two database designs.?

Database Architecture - 3NF vs. Dimensional Modeling

Transactional databases are designed and built using a data modeling technique called third normal form (3NF). The purpose of using this technique is to build a database that is optimized for the transactional applications that are running on top of the database. The overarching goals of this design technique is to reduce data duplication/redundancy and follows a database design consisting of a set of atomic tables that are optimized for fast inserts and updates.The disadvantage of this design technique is that creating a database of smaller, normalized tables means that when you want to start performing analytics and business intelligence, you experience poor query performance because you have to join a bunch of smaller tables to answer the questions you are seeking to address. The overhead of these “joins” greatly impacts database performance and results in long query times and sometimes the queries timeout all together and don’t finish. The other disadvantage with this design technique is that your database overwrites previous values with the most current version of a transaction, and thus it is hard to track changes over time which is important for analytics. As mentioned earlier, companies without data warehouses often attempt to perform analytics and business intelligence on copies of their transactional databases. The poor query performance, the lack of historical perspective, and the lack of integrated datasets across multiple applications and data sources is the primary reason that companies adopt the use of a data warehouse for powering their analytics and providing a single version of truth within the organization.

In contrast to 3NF data modeling, Data Warehouses use a data modeling concept called Dimensional Modeling. This design technique calls for organizing your data into facts and dimensions. Fact tables record measurements or metrics for a specific event (i.e. sales). Dimension tables contain the data by which you would slice a fact table. For example you would slice a fact table containing sales numbers by Region, Time, Product, Employee, etc. In this example Region, Time, Product, and Employee are all Dimensions.? Organizing the data in this manner allows for business users to easily slice and dice data for analytical purposes and helps support organizational goals such as enabling business users with “self service BI” capabilities. Organizing data in this manner is critical for unlocking the power of BI tools such as Tableau, Looker, and Microsoft BI.

Summary

??? Having a data warehouse is a critical component of a modern analytics environment for an organization. It is different from existing transaction database systems in that it is organized for integrated reporting across ALL of your transactional systems and data sources. A data warehouse is designed using a different database modeling technique referred to as Dimensional Modeling. Application developers are typically more focused on third normal form modeling which is why it is important to have a Data Warehouse Architect who is skilled in Dimensional Modeling to design and develop your data warehouse. In doing so, the organization ensures that it has a solid foundation for realizing the full value of various BI tools such as Tableau, Microsoft BI, Looker, as well as the ability to produce clean, integrated data for data science purposes.

???

Jim Frayer

Enterprise Data Architect | Certified Data Vault 2.0 Practitioner with years of experience leading and coaching data architecture teams to deliver scalable, business-aligned data solutions across industries.

3 年

Your Dimensional modeling concept or methodology is outdated and can have many detrimental effects on getting the client an efficient BI platform. Have you investigated Data Vault 2.0? It solves the brittleness that dimensional modeling runs into when the data warehouse grows to a point where SCD can no longer be maintained.

回复
Srivathsan S.

Threat, identity and behavior analytics using Stats/ML/Graphs

3 年

A new thing is coming up, called LakeHouse which borrows the best of DataLake (stores all kinds of data - structured, semi, un-..) and Datawarehouse (for precise, structured data, querying etc).

要查看或添加评论,请登录

Matt Brown的更多文章

  • How to Improve Teamwork Under Pressure

    How to Improve Teamwork Under Pressure

    Stress is the rule, not the exception in the American workplace. Ninety-four percent of American workers feel stressed…

    3 条评论
  • Build Your A-Team With These Tips

    Build Your A-Team With These Tips

    Whether you are trying to assemble the perfect team for your own business or within your organization, it can be hard…

    1 条评论
  • 5 Things to Consider When Choosing a Mentor

    5 Things to Consider When Choosing a Mentor

    Eighty percent of CEOs say they had mentors throughout their careers, and studies show that good mentoring leads to…

    1 条评论
  • Building A Personal Brand Beyond Social Media

    Building A Personal Brand Beyond Social Media

    Before the age of social media, personal branding wasn’t a thing — at least not for the masses. Your reputation was…

    1 条评论
  • The Link Between Curiosity and Success

    The Link Between Curiosity and Success

    “What qualities do you see most often in those who succeed?” Adam Bryant asked over 700 CEOs he interviewed for his…

    2 条评论
  • Embedded Analytics for Fun and Profit

    Embedded Analytics for Fun and Profit

    Embedded analytics refers to the practice of bolting on a reporting or analytics package into an existing application…

  • Excel - The Dirty Little Secret

    Excel - The Dirty Little Secret

    In spite of ALL the hype in the data analytics space, the dirty little secret is that Excel remains the most powerful…

  • FiveTran

    FiveTran

    New Entrants to the Data Integration Market The last 10 years have seen a proliferation of new data integration tools…

  • Better Reporting and Analytics from Credit Card Processors

    Better Reporting and Analytics from Credit Card Processors

    Over the past few years we have worked with two clients in particular who offer payment processing services to their…

  • JasperSoft and Talend for Embedded Analytics

    JasperSoft and Talend for Embedded Analytics

    Background: A leading financial services SaaS vendor recently contacted us to solve a critical problem for them. Their…

    3 条评论

社区洞察

其他会员也浏览了