Database Vs Data Warehouse Vs Data Lake

Utkarsh Sharma

SME & Manager | SAP Certified Application Associate | Certified Data Scientist | Intel Certified Machine Learning Instructor | Mentor

Published: February 17, 2022

In this article, we discuss the differences between databases, data warehouses, and data lakes. To understand how these ways of organizing data differ, one should first know the difference between structured and unstructured data.

In simple words, structured data has a known schema and a fixed, neat structure and, most importantly, fits into a table of fixed fields; data stored in Excel files is an example. Unstructured data, on the other hand, has no fixed schema or structure. Take a newsletter that contains images along with text: a traditional DBMS has difficulty accommodating such data in a fixed schema.

So, what is a database, then? Databases are typically structured data stores with a defined schema. In a database, items are organized as a set of tables with columns and rows, where a column represents an attribute of an object and a row contains the entire property set of one object. Examples of databases are MySQL, Oracle, and PostgreSQL. Databases are designed to store transactional data, which may or may not have any analytical importance. They are used by organizations that only need to store frequent transactional data.

A data warehouse, in contrast to a database, is designed for analytical purposes. It sits on top of several databases, draws data from all of them, and creates a unified schema on which to perform data analytics.
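The table, column, and row organization of a database described above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is defined up front: each column is one attribute of an order.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    )
    """
)

# Each row holds the full property set of one order (one transaction).
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5)],
)
conn.commit()

rows = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(rows)

# Data that violates the declared schema is rejected by the database.
try:
    conn.execute(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", (None, 5.0)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("row without customer rejected:", rejected)
```

The point of the sketch is that the structure is fixed before any data arrives, which is what makes such a store a good fit for frequent, uniform transactional records.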

A data warehouse transforms the data collected from several databases and keeps only the information that is crucial for data analysis. Its design revolves around facilitating management decision-making. Data in a data warehouse is carefully related to all the other data in the warehouse. In addition, data in a data warehouse tends to be highly standardized and cleaned.
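As a rough sketch of this idea, the example below pulls data from two source databases with different schemas, standardizes it, and loads it into one unified table for analysis. It again uses Python's `sqlite3` module; the source tables (`orders`, `purchases`), the warehouse table `fact_sales`, and all sample values are assumptions made for illustration, not a description of any particular warehouse product.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create a small in-memory source database with some sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two operational databases that store similar facts in different shapes.
store_db = make_source(
    "CREATE TABLE orders (customer TEXT, amount_usd REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5)],
)
web_db = make_source(
    "CREATE TABLE purchases (buyer_name TEXT, total_cents INTEGER)",
    "INSERT INTO purchases VALUES (?, ?)",
    [("Alice ", 1250), ("carol", 800)],
)

# The warehouse defines one unified schema for analysis.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (source TEXT, customer TEXT, amount_usd REAL)"
)


def load_warehouse(target, store, web):
    """Extract from both sources, clean and standardize, then load."""
    for customer, amount in store.execute(
        "SELECT customer, amount_usd FROM orders"
    ):
        target.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            ("store", customer.strip().lower(), amount),
        )
    for buyer, cents in web.execute(
        "SELECT buyer_name, total_cents FROM purchases"
    ):
        # Standardize names and convert cents to dollars.
        target.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            ("web", buyer.strip().lower(), cents / 100),
        )
    target.commit()


load_warehouse(warehouse, store_db, web_db)

# An analytical query across data that originated in separate databases.
totals = warehouse.execute(
    "SELECT customer, SUM(amount_usd) FROM fact_sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Only the fields needed for the analysis are carried over, and they are cleaned on the way in, so the same customer from both sources ends up under one name.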

A data lake is a centralized repository for storing both structured and unstructured data. Data lakes came into use because of the growth in unstructured data generated by big data applications. Unstructured data cannot simply be stored in a data warehouse, because a data warehouse needs a unified structure for efficient data analysis. A data lake keeps data in its raw format until it is needed. There is no need to transform the data before storing it in a data lake; processing can be done on export, so that the schema is defined when the data is read (schema-on-read).
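The schema-on-read behavior can be sketched in a few lines of Python. Here a temporary directory stands in for the data lake, and the file names, field names, and the helper functions `ingest` and `read_clicks` are all hypothetical; real data lakes typically sit on object storage or distributed file systems rather than a local folder.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_dir, name, raw_bytes):
    """Store the data exactly as received: no parsing, no transformation."""
    path = Path(lake_dir) / name
    path.write_bytes(raw_bytes)
    return path


def read_clicks(path):
    """Apply a schema only at read time: user is text, clicks is an integer."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        records.append(
            {
                "user": str(event.get("user", "unknown")),
                "clicks": int(event.get("clicks", 0)),
            }
        )
    return records


with tempfile.TemporaryDirectory() as lake:
    # Structured-ish and unstructured files live side by side in raw form.
    clicks_path = ingest(
        lake,
        "clicks.jsonl",
        b'{"user": "alice", "clicks": "3"}\n{"user": "bob"}\n',
    )
    ingest(lake, "newsletter.html", b"<h1>News</h1><img src='a.png'>")

    # The schema is imposed here, when the data is actually needed.
    clicks = read_clicks(clicks_path)

print(clicks)
```

Note that the raw file contains a string where a number is expected and a record with a missing field; nothing complains at ingestion time, and the reader decides how to interpret them.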

So, the decision on which service to use depends entirely on your data storage needs. If you just need to store daily transactional data with little analysis, go for a DBMS. If you need to serve purely analytical purposes, opt for a data warehouse. And if you need to perform analytical operations on unstructured data, your solution is a data lake.
