What is a Data Fabric?


Data Fabric is a data strategy and management concept. It is less tangible than a Data Lake or Data Warehouse. It is a system that makes data storage, extraction, access, and analysis more efficient. A Data Fabric integrates an organization's ingestion and storage processes, adds machine learning to improve its performance and insight gathering, and delivers everything for easy access and consumption.

Background

First, you have to understand what a Data Lake and a Data Warehouse are. Both are ways to store Big Data. (The industry characterizes Big Data by its large volumes, high frequencies, and many sources.) However, they have significant differences, the most notable being data structure: Lakes are for raw data, and Warehouses are for processed data. (Learn more about Lakes and Warehouses here.)

Then you have to understand the concept of metadata. Meta is a Greek prefix that, in this usage, means "about itself." Metadata, therefore, is data that describes data: the names, sources, types, and sizes of your data files. The title of a book is another example. A title describes what is within.
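
To make that concrete, here is a minimal sketch of what a metadata record for a single file might look like. The field names are illustrative, not from any particular catalog product:

```python
import os
import mimetypes
from datetime import datetime, timezone

def describe_file(path):
    """Return a small metadata record describing a data file."""
    stat = os.stat(path)
    return {
        "name": os.path.basename(path),
        "source": os.path.abspath(path),        # where the data lives
        "type": mimetypes.guess_type(path)[0],  # best-guess content type
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

Notice that the record says nothing about the file's contents; it only describes the file, which is exactly what metadata does.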

The Problem

The underlying need for a Data Fabric strategy derives from the innate characteristic of data to be "dirty."


Dirty Data is hard to manage. Many organizations aren't prepared for how dirty data can get when they expand, add more data sources, and adopt more products. They end up filling their data lakes and warehouses with data that has errors, varying input types, and null values. They turn their Data Lake into the dreaded... Data "Swamp."
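
As a toy illustration of what "dirty" means in practice, here is a hedged sketch of the kind of cleaning a single record might need: stray whitespace, inconsistent column names, the many spellings of "missing," and numbers arriving as text. The rules here are assumptions for the example, not a standard:

```python
def clean_record(raw):
    """Normalize one raw record: trim text, coerce numbers, unify missing values."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "" or value.lower() in {"null", "n/a", "na"}:
                value = None  # unify the many spellings of "missing"
            elif value.replace(".", "", 1).lstrip("-").isdigit():
                value = float(value)  # varying input types -> one numeric type
        cleaned[key.strip().lower()] = value  # inconsistent column names -> one case
    return cleaned
```

Multiply this by millions of rows and dozens of sources, and you can see how a lake turns into a swamp without agreed-upon rules.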

Additionally, data can be difficult to access. There can be many gates, logins, and permissions needed just to touch the sweet, sweet data. One thing I hate is needing a different login for every application. So many passwords!

Finally, it is time-intensive to analyze all the various structures, forms, and systems an organization might have around its data storage. The time investment is massive, and the likelihood of human error is high with a problem as complex as Big Data management.

The Solution

Enter, stage left: Data Fabrics. I see it as a two-pronged approach.

First, you must create your systems and standard operating procedures. This includes creating a naming system, ingestion process, cleaning process, and universal storage rules for the metadata that enters your data lake. Having organized data will lead to identifiable habits, workflows, and outcomes and make life a lot easier.
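The naming system is the easiest of these to show. Here is a minimal sketch of an ingestion gate that enforces a naming convention; the convention itself (`<domain>_<dataset>_<YYYYMMDD>.<ext>`) is a hypothetical example, not a standard:

```python
import re

# Hypothetical convention: <domain>_<dataset>_<YYYYMMDD>.<ext>
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z0-9]+_\d{8}\.(csv|json|parquet)$")

def validate_name(filename):
    """Gate for the ingestion process: reject files that break the naming rules."""
    return bool(NAME_PATTERN.match(filename))
```

A gate like this, run before anything lands in the lake, is what turns "universal storage rules" from a wish into a workflow.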

The second prong is adding machine learning to the management system. You've gone in and manually laid the road. Now, you can let a machine drive down it. Look at your processes, automate as much as possible, and apply models that predict decisions on storing and analyzing ingested data.

The goal is to build machines that can discover new insights from your metadata.

There may be correlations that your organization hadn't noticed. Proper data fabric will recognize these correlations and alert you to the insights for decision-making. The machine learning threads will ensure your organization consistently enhances and empowers its analytical capabilities and professionals.

What you have now is an automated, continuously improving, and manageable system to capture, clean, store, and access big data from any point in the organization. What a mouthful! But that mouthful is exactly what a Data Fabric is.

It's called a Data "Fabric" because it integrates all the parts of a data ecosystem like woven fabric. Every piece and every person connects through threads that span the entire system.

The Data Journey

Flowcharts and process maps help me understand complex subjects better. The one below, from TIBCO, helped me visualize how a data fabric works, and I hope it helps you too.

(Diagram: data sources funnel their data through the layers of the data fabric, ending in the applications and products delivered to consumers.)

Below are the stages I believe data goes through when your organization uses a data fabric. They could be more specific, but they suffice for basic comprehension.

  1. Data sources send or create data.
  2. The data is filed according to the rules of the metadata catalog.
  3. The data is ingested into the management and storage systems.
  4. While in the system, built-in ML and AI can identify new insights.
  5. The data is delivered to end-users in whatever form, application, or access point is appropriate for their job function and technical level.
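
The five stages above can be sketched as a single pipeline. Every name here is illustrative, not from any particular product; the point is how the stages chain together:

```python
def data_fabric_pipeline(raw_items, catalog_rules, insight_model, deliver):
    """Walk raw data through the stages: catalog, ingest, analyze, deliver."""
    store = []
    for item in raw_items:             # 1. sources send/create data
        record = catalog_rules(item)   # 2. filed per the metadata catalog
        store.append(record)           # 3. ingested into storage
    insights = insight_model(store)    # 4. ML/AI surfaces new insights
    return deliver(store, insights)    # 5. delivered to end-users
```

Each argument is a pluggable piece, which mirrors the fabric idea: the stages stay fixed while the tools behind them can change.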

(For more on how to build a Data Fabric, click here.)

Quotes and Helpful Links

IBM

"A data fabric is a data management architecture that can optimize access to distributed data and intelligently curate and orchestrate it for self-service delivery to data consumers. It automates data discovery, governance, and consumption, delivering business-ready data for analytics and AI."

Gartner

"Gartner defines data fabric as a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms."

