Data warehouse vs database

Data warehouse vs database


What’s a Database

Businesses have been using databases for almost as long as they’ve been storing data electronically. Conceptually, a database management system (DBMS) is just a way to make data accessible quickly.?

When we talk about databases, we generally mean relational database management systems (RDBMS), because relational databases have had an overwhelming share of the market for several decades. Businesses use them because storing and retrieving data from an RDBMS is faster than other alternatives.

A database is an organized collection of data. In relational databases, data is organized in tables, which group together related objects. You can think of a table as a grid with rows and columns.?

Each row is an instance of the object the table holds — a customer record, for instance, or transportation data.?

Each column is a field of information — a customer number, a name, an address, and so on.?

Tables, rows, and columns are defined by a schema, which is a definition of all the components in the database.

Databases often serve as the back end of online transaction processing applications (OLTP), or transactional databases, in which data is added, modified, and deleted one record at a time. Table data is accessed a row at a time, which means the most efficient way to store records is by row, with indexes on key fields to make it efficient to retrieve any given record.

Not all systems are transaction-based, however. Sometimes you want to see trends in data over time. To do that, you don’t need to know the values of individual records. You need aggregated information — how many sales were made, how many trips did passengers take. And, again, you want to get that information quickly.?

There’s a tool for that: a data warehouse.

What’s a Data Warehouse?

Like a database, a data warehouse has a relational structure, in that data is organized into tables, rows, and columns — but there’s one key difference.

While the data in a database is organized and stored by row, the data in a data warehouse is stored by columns, to facilitate online analytical processing (OLAP). Business intelligence consists of reports that aggregate many of the same kinds of records — purchases per month, or travel by destination and cost, for example. You don’t care about individual rows, you care about whole columns.

Data warehouses are columnar databases, which are organized, stored, and indexed according to column values, in contrast to the row-oriented storage used by databases. They use columnstore indexes, which, while complex to create, are simple to understand: They make it efficient to pull information from across all aggregated rows in a table at once for reporting.

Getting From Point A to Point B

You may wonder: If you already have your data in a database, should you duplicate it by copying it to a data warehouse? Isn’t it unwise to keep multiple copies of the same data?

But that’s not exactly what you’re doing when you maintain a data warehouse. Data in a database is updated frequently, one record at a time, and represents transactions and events in the real world. Data in a data warehouse is updated only in batches as new data comes in for analysis, and represents systems as a whole.?

If you’re comparing data warehouses vs. databases, think of it like this: Databases show the current state of a system; data warehouses can provide a historical perspective useful for data analysis.

You could think of the data in your OLTP systems as a kind of living organism. In that analogy, the data in your OLAP system, in your data warehouse, is an X-ray — a near real-time image of a particular subset of the organism.

It’s always worth using the right tool for the job. If the job is running data analytics in the most efficient manner to get results quickly, then a data warehouse is the right tool for the job.

There are other types of creatures in this data ecosystem as well, including?data lakes and data marts, as well as a newer hybrid, the?data lakehouse.

Can You Use a Database as a Data Warehouse?

Since both databases and data warehouses use relational data structures, you might consider using one where you really ought to use the other. Most data professionals become familiar with databases before they work with data warehouses, and it’s always tempting to stick with familiar tools.

In the best of all possible worlds, you’d?never use a database as a data warehouse. A row-based database simply won’t give you the performance you need for data analytics, especially when you have a high volume of data.?

However, small organizations, or developers doing prototyping, sometimes do use a database in place of a data warehouse. If you’ve done that, we’re not going to make you renounce membership in the Association of Data Analytics Professionals, but we will encourage you not to do it anymore.?

With cloud-based,?modern data warehouses?so cheap these days, and the fact that you pay only for the resources you use (storage and processor time, depending on the data warehouse you choose) there’s really no excuse not to use a data warehouse right from day one of prototyping. Amazon Redshift, Google BigQuery, Snowflake, and Microsoft Synapse are all excellent cloud data warehouses. One of them is probably right for your use case, and our?guide to enterprise data warehouses?can help you evaluate the differences.

Regardless of which cloud data warehouse you choose, you should use Fivetran to replicate data from your OLTP systems — both databases and SaaS applications — into your cloud data warehouse.?Sign up today?to try Fivetran for free.?        

要查看或添加评论,请登录

Darshika Srivastava的更多文章

  • LGD Model

    LGD Model

    Loss Given Default (LGD) models play a crucial role in credit risk measurement. These models estimate the potential…

  • CCAR ROLE

    CCAR ROLE

    What is the Opportunity? The CCAR and Capital Adequacy role will be responsible for supporting the company’s capital…

  • End User

    End User

    What Is End User? In product development, an end user (sometimes end-user)[a] is a person who ultimately uses or is…

  • METADATA

    METADATA

    WHAT IS METADATA? Often referred to as data that describes other data, metadata is structured reference data that helps…

  • SSL

    SSL

    What is SSL? SSL, or Secure Sockets Layer, is an encryption-based Internet security protocol. It was first developed by…

  • BLOATWARE

    BLOATWARE

    What is bloatware? How to identify and remove it Unwanted pre-installed software -- also known as bloatware -- has long…

  • Data Democratization

    Data Democratization

    What is Data Democratization? Unlocking the Power of Data Cultures For Businesses Data is a vital asset in today's…

  • Rooting

    Rooting

    What is Rooting? Rooting is the process by which users of Android devices can attain privileged control (known as root…

  • Data Strategy

    Data Strategy

    What is a Data Strategy? A data strategy is a long-term plan that defines the technology, processes, people, and rules…

  • Product

    Product

    What is the Definition of Product? Ask a few people that question, and their specific answers will vary, but they’ll…

社区洞察

其他会员也浏览了