What is a Data Fabric?
Richard Schreiber, MPA
Operations Research Analyst @ GSA | Data Science, Federal Contracting
Data Fabric is a data strategy and management concept. It is less tangible than a Data Lake or Data Warehouse. It is a system that makes data storage, extraction, access, and analysis more efficient. A Data Fabric integrates an organization's ingestion and storage processes, adds machine learning to improve its performance and insight gathering, and delivers everything for easy access and consumption.
Background
First, you have to understand what a Data Lake and a Data Warehouse are. Both are ways to store big data. (Industry characterizes Big Data by its large volume, high frequency, and many sources.) However, they have significant differences, the most notable being data structure: Lakes hold raw data, while Warehouses hold processed data. (Learn more about Lakes and Warehouses here.)
Then you have to understand the concept of metadata. The prefix "meta" comes from Greek and, in modern usage, means "about the thing itself." Metadata, therefore, is data that describes data: the names, sources, types, and sizes of your data files. A book title is another example. The title describes what is within.
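To make that concrete, here is a minimal sketch (in Python) of what a metadata record for a single file might look like. The field names and values are illustrative, not drawn from any standard.

```python
# Illustrative only: a minimal metadata record describing one data file.
# The field names and values are hypothetical, not from any particular standard.
file_metadata = {
    "name": "contract_awards_2023.csv",    # what the file is called
    "source": "procurement_system",        # where it came from
    "type": "csv",                         # file format
    "size_bytes": 48_230_114,              # how big it is
    "ingested_at": "2024-01-15T09:30:00Z"  # when it entered the lake
}

# The metadata tells you about the file without opening it,
# the same way a book title tells you what is inside the book.
print(file_metadata["name"], file_metadata["size_bytes"])
```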
The Problem?
The underlying need for a Data Fabric strategy derives from data's innate tendency to be "dirty."
Dirty data is hard to manage. Many organizations aren't prepared for how dirty their data can get when they expand, add more data sources, and adopt more products. They end up filling their Data Lakes and Warehouses with data full of errors, inconsistent input types, and null values. They turn their Data Lake into the dreaded... Data "Swamp."
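As a small illustration of what "dirty" means in practice, here is a rough pandas sketch that surfaces nulls, mixed types, and inconsistent formatting in a toy table. The column names and values are invented for the example.

```python
import pandas as pd

# Toy example of "dirty" data: nulls, mixed types, and inconsistent formatting
# in the same columns. Column names and values are invented for illustration.
df = pd.DataFrame({
    "award_amount": ["1000", 2500.0, None, "N/A"],  # mixed strings and numbers
    "agency_code": ["GSA", "gsa", "GSA ", None],    # inconsistent formatting
})

# Count missing values per column
print(df.isna().sum())

# Coerce award_amount to numeric; anything unparseable becomes NaN,
# which surfaces the "N/A" string as a data-quality problem.
df["award_amount"] = pd.to_numeric(df["award_amount"], errors="coerce")
print(df["award_amount"].isna().sum(), "values failed numeric conversion")

# Normalize agency_code so "GSA", "gsa", and "GSA " count as one value
df["agency_code"] = df["agency_code"].str.strip().str.upper()
print(df["agency_code"].value_counts(dropna=False))
```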
Additionally, data can be difficult to access. There can be many gates, logins, and permissions needed just to touch the sweet, sweet data. One thing I hate is needing a different login for every application. So many passwords!
Finally, it is time-intensive to analyze all the various structures, forms, and systems an organization might have around its data storage. The time investment is massive, and the likelihood of human error is high with a problem as complex as Big Data management.
The Solution
Enter, stage left: the Data Fabric. I see it as a two-pronged approach.
First, you must create your systems and standard operating procedures. This includes creating a naming system, ingestion process, cleaning process, and universal storage rules for the metadata that enters your data lake. Having organized data will lead to identifiable habits, workflows, and outcomes and make life a lot easier.
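Here is a minimal sketch of what such rules could look like in code, assuming a hypothetical naming convention and a hypothetical set of required metadata fields. Your organization's actual rules will differ; the point is that they are written down and enforced at ingestion.

```python
import re

# Hypothetical ingestion rules: a naming convention and the metadata fields
# every file must carry before it is allowed into the lake.
NAME_PATTERN = re.compile(r"^[a-z0-9_]+_\d{4}\.(csv|parquet|json)$")  # e.g. contract_awards_2023.csv
REQUIRED_FIELDS = {"name", "source", "type", "size_bytes", "ingested_at"}

def validate_ingestion(metadata: dict) -> list:
    """Return a list of rule violations; an empty list means the file may be ingested."""
    problems = []
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        problems.append(f"missing metadata fields: {sorted(missing)}")
    if "name" in metadata and not NAME_PATTERN.match(metadata["name"]):
        problems.append(f"file name '{metadata['name']}' breaks the naming convention")
    return problems

# Usage: reject or quarantine files that break the rules instead of
# letting them quietly turn the lake into a swamp.
print(validate_ingestion({"name": "Contract Awards.xlsx", "source": "email"}))
```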
The second prong is adding machine learning to the management system. You've gone in and manually laid the road. Now, you can let a machine drive down it. Look at your processes, automate as much as possible, and apply models that predict decisions on storing and analyzing ingested data.
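As one small illustration of this second prong, the sketch below trains a toy classifier on metadata features to predict which storage zone a newly ingested file should land in. The features, labels, and choice of scikit-learn are assumptions made for the example, not part of any particular product.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy training data: metadata features for past files and the storage zone
# a human chose for each one. Features and labels are invented for illustration.
# Features: [size_mb, has_nulls (0/1), is_structured (0/1)]
X = [
    [5,    0, 1],
    [800,  1, 0],
    [12,   0, 1],
    [1500, 1, 0],
    [3,    0, 1],
    [950,  0, 0],
]
y = ["warehouse", "lake", "warehouse", "lake", "warehouse", "lake"]

# Learn the routing decisions humans have been making by hand
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Predict where a new file's metadata says it should land
new_file = [[250, 1, 0]]  # 250 MB, has nulls, unstructured
print(model.predict(new_file))
```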
The goal is to build machines that can discover new insights from your metadata.
There may be correlations your organization hasn't noticed. A proper Data Fabric will recognize these correlations and surface the insights for decision-making. The machine learning threads ensure your organization consistently enhances and empowers its analytical capabilities and professionals.
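To give a flavor of what surfacing correlations from metadata could look like, the sketch below computes a simple correlation matrix over a few made-up metadata attributes and flags strongly related pairs. Real fabric tooling does this continuously and at far larger scale.

```python
import pandas as pd

# Made-up metadata attributes collected across many datasets
meta = pd.DataFrame({
    "size_mb":     [5, 800, 12, 1500, 3, 950, 40, 600],
    "null_rate":   [0.0, 0.3, 0.01, 0.4, 0.0, 0.25, 0.02, 0.2],
    "query_count": [120, 4, 300, 2, 210, 6, 90, 10],
})

# Correlation matrix over the metadata itself
corr = meta.corr()

# Flag strongly correlated attribute pairs so an analyst can investigate,
# e.g. "the datasets nobody queries are also the dirtiest ones."
threshold = 0.8
for a in corr.columns:
    for b in corr.columns:
        if a < b and abs(corr.loc[a, b]) > threshold:
            print(f"{a} and {b} are strongly correlated ({corr.loc[a, b]:.2f})")
```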
What you have now is an automated, continuously improving, and manageable system to capture, clean, store, and access big data from any point in the organization. What a mouthful! But that mouthful is exactly what a Data Fabric is.
It's called a Data "Fabric" because it integrates all the parts of a data ecosystem like woven fabric. Every piece and every person connects through threads that span the entire system.
The Data Journey
Flowcharts and process maps help me understand complex subjects. The one below, from TIBCO, helped me visualize how a data fabric works, and I hope it helps you too.
Below are the stages that I believe data goes through when your organization uses a data fabric. These could be more specific but suffice for basic comprehension.
Quotes and Helpful Links
"A data fabric is a data management architecture that can optimize access to distributed data and intelligently curate and orchestrate it for self-service delivery to data consumers. It automates data discovery, governance, and consumption, delivering business-ready data for analytics and AI."
"Gartner defines data fabric as a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms."