
Forrester changed the way they think about data catalogs. Here’s what you need to know.

Prukalpa

Co-Founder at Atlan – Home for Data Teams | Forbes30 & Fortune40 lists | TED Speaker

Published: October 7, 2022

As we predicted at the beginning of this year, metadata is hot in 2022, and it’s only getting hotter. But this isn’t the old-school idea of metadata we all know and hate.

The data industry is in the middle of a fundamental shift in how we think about metadata. Now, in the latest sign of this shift, Forrester scrapped its Wave report on “Machine Learning Data Catalogs” to make way for one on “Enterprise Data Catalogs for DataOps”.

Here’s what you need to know about where this change came from, why it happened, and what it means for modern metadata.

Spotlight: What are Enterprise Data Catalogs for DataOps, and why should you care?

One of the biggest challenges with Data Catalog 2.0 was adoption — no matter how it was set up, companies found that people rarely used their expensive data catalog. For a while, the data world thought that machine learning was the solution. That’s why, until recently, Forrester’s reports focused on evaluating machine learning data catalogs.

However, in early 2022, Forrester dropped machine learning from the focus of its Now Tech report. It explained that even as ML-based systems became ubiquitous, the problems they were meant to solve persisted. Although machine learning gave data architects a clearer picture of the data within their organization, it didn’t fully address modern challenges around data management and provisioning.

The key change: “Data engineers need a data catalog that does more than generate a wiki about data and metadata.” Instead, data teams need a catalog built to enable DataOps, one that gives them in-depth information about and control over their data so they can “build data-driven applications and address data flow and performance”.

So what actually is an enterprise data catalog for DataOps (EDC)? According to Forrester, “[enterprise] data catalogs create data transparency and enable data engineers to implement DataOps activities that develop, coordinate, and orchestrate the provisioning of data policies and controls and manage the data and analytics product portfolio.”

There are three key ideas that distinguish EDCs from the earlier Machine Learning Data Catalogs.

Handles the diversity and granularity of modern data and metadata

Today a company’s data isn’t just simple tables and charts. It’s a wide range of data products and associated assets, such as databases, pipelines, services, policies, code, and models, each with its own metadata. EDCs are built for this complex portfolio of data and metadata.

Rather than just storing a “wiki” of this data, EDCs act as a “system of record” to automatically capture and manage all of a company’s data through the data product lifecycle. This includes syncing context and enabling delivery across data engineers, data scientists, and application developers.

Provides deep transparency into data flow and delivery

A key idea in DataOps is CI/CD, a software engineering principle that improves collaboration, productivity, and speed through continuous integration and delivery. For data, implementing CI/CD practices relies on understanding exactly how data is moved and transformed across the company.

EDCs provide granular data visibility and governance with features like column-level lineage, impact analysis, root cause analysis, and data policy compliance. These should be programmatic, rather than manual, with automated flags, alerts, and/or suggestions to help users keep on top of complex, fast-moving data flows.
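To make impact analysis concrete, here is a minimal Python sketch under stated assumptions: column-level lineage is held as a plain dictionary mapping each upstream column to the columns derived directly from it, and a schema change is detected by comparing two snapshots of column names. The table and column names, and the helpers `removed_columns` and `downstream_impact`, are hypothetical illustrations, not any catalog's actual API. A real catalog would build the lineage graph from its connectors; here it is hard-coded.

```python
from collections import deque

# Hypothetical column-level lineage: each key is an upstream column,
# each value lists the columns derived directly from it.
LINEAGE: dict[str, list[str]] = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "raw.orders.customer_id": ["staging.orders.customer_id"],
    "staging.orders.amount_usd": ["marts.revenue.daily_total", "marts.finance.order_value"],
    "staging.orders.customer_id": ["marts.customers.order_count"],
    "marts.revenue.daily_total": ["bi.dashboard.revenue_chart"],
}


def removed_columns(old_schema: set[str], new_schema: set[str]) -> set[str]:
    """Columns present in the old snapshot but missing from the new one."""
    return old_schema - new_schema


def downstream_impact(changed: set[str], lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk collecting every column that directly or
    transitively depends on one of the changed columns."""
    impacted: set[str] = set()
    queue = deque(changed)
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in impacted:  # skip columns already collected
                impacted.add(child)
                queue.append(child)
    return impacted


# Example: an upstream table drops "amount" and adds "amount_cents".
old = {"raw.orders.amount", "raw.orders.customer_id", "raw.orders.created_at"}
new = {"raw.orders.customer_id", "raw.orders.created_at", "raw.orders.amount_cents"}

dropped = removed_columns(old, new)
for column in sorted(downstream_impact(dropped, LINEAGE)):
    print(f"ALERT: {column} depends on a removed column")
```

The `impacted` set keeps the walk from revisiting columns, so it also terminates if the lineage graph contains a cycle. In practice, alerts like these would be routed to the owners of the affected assets rather than printed.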

Designed around modern DataOps and engineering best practices

With data growing beyond the IT team, data engineering tools can no longer just focus on the data warehouse and lake. DataOps merges the best practices and learnings from the data and developer worlds to help diverse data people work together better.

EDCs are a critical way to connect the “data and developer environments”. Features like bidirectional communication, collaboration, and two-way workflows lead to simpler, faster data delivery across teams and functions.

The future of metadata is active

All of these shifts, from Forrester championing data catalogs for DataOps to Gartner scrapping its Magic Quadrant for Metadata Solutions, point to the importance of active metadata. We first wrote about this idea in January 2021, and we’ve seen it explode since then.

From DataOps to the data mesh, modern data concepts are fundamentally based on the ability to collect, store, and analyze metadata. However, data catalogs lagged behind for years, acting as static, siloed systems in a world of fast-moving, interconnected data. Now that metadata itself is approaching the scale of “big data” and is critical for a range of modern use cases, the standard way of storing metadata is no longer enough. As Forrester said, we need more than a wiki for our data.

The solution is “active metadata”, which is a key component of modern data catalogs. Instead of just collecting metadata from the rest of the data stack and bringing it back into a passive data catalog, active metadata makes a two-way movement of metadata possible. It sends enriched metadata and unified context back into every tool in the data stack, and enables powerful programmatic use cases through automation.
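The two-way flow described above can be sketched in a few lines of Python. This is a hypothetical, in-memory model, not the design of any particular product: tools report metadata into a hub (inbound), the hub merges it into one record per asset, and every subscribed tool receives the enriched record back (outbound). `ActiveMetadataHub`, `AssetMetadata`, and the BI-tool callback are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AssetMetadata:
    """Unified, enriched metadata record for one data asset."""
    name: str
    description: str = ""
    owner: str = ""
    tags: set[str] = field(default_factory=set)


# A sink is any tool that wants enriched context pushed back into it.
Sink = Callable[[AssetMetadata], None]


class ActiveMetadataHub:
    """Hypothetical hub: collects metadata from tools (inbound) and pushes
    the merged record back to every subscribed tool (outbound)."""

    def __init__(self) -> None:
        self.assets: dict[str, AssetMetadata] = {}
        self.sinks: list[Sink] = []

    def subscribe(self, sink: Sink) -> None:
        self.sinks.append(sink)

    def ingest(self, name: str, **updates) -> AssetMetadata:
        """Inbound: merge metadata reported by one tool into the unified record."""
        asset = self.assets.setdefault(name, AssetMetadata(name))
        for key, value in updates.items():
            if key == "tags":
                asset.tags |= set(value)  # tags accumulate across tools
            elif key in ("description", "owner"):
                setattr(asset, key, value)  # latest report wins
            else:
                raise ValueError(f"unknown metadata field: {key}")
        # Outbound: push the enriched record back into every connected tool.
        for sink in self.sinks:
            sink(asset)
        return asset


bi_tooltips: dict[str, str] = {}


def bi_tool_sink(asset: AssetMetadata) -> None:
    """Stand-in for a BI integration that shows catalog context inline."""
    owner = asset.owner or "unknown"
    bi_tooltips[asset.name] = f"{asset.description} (owner: {owner})"


hub = ActiveMetadataHub()
hub.subscribe(bi_tool_sink)
hub.ingest("marts.revenue", owner="finance-team", tags=["certified"])  # e.g. from a warehouse connector
hub.ingest("marts.revenue", description="Daily revenue by region")     # e.g. from a catalog user
print(bi_tooltips["marts.revenue"])  # Daily revenue by region (owner: finance-team)
```

Here a description typed into the catalog reaches the BI tool without anyone switching tools. In a production system the callbacks would be API calls or events rather than in-process Python functions.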

Here are a few examples of what active metadata looks like in action:

Purge stale or unused assets: Use active metadata to periodically calculate when each data asset was last used and how many people used it, and then flag or purge neglected assets.
Allocate compute resources dynamically: Imagine that 90% of users log in to a BI tool during the last week of a financial quarter. Automatically scale up compute resources just before that week and scale them back down afterward.
Enrich user experience in BI tools: Instead of making business users switch between a BI tool and data catalog, push important metadata (like business terms, descriptions, owners, and lineage) directly into the BI tool.
Notify downstream consumers: Check data pipelines for issues when a data store changes and notify downstream data users about potential breaking changes (e.g. the addition or removal of a column).
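The first use case above, flagging stale or unused assets, reduces to a small aggregation over usage logs. The following Python sketch assumes a hypothetical `QueryEvent` record (asset name, user, timestamp) exported from query or BI logs; the 90-day idle threshold and two-user minimum are arbitrary example values, not recommendations. It only flags assets, leaving any purge as a separate, reviewed step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class QueryEvent:
    """One hypothetical usage record: who touched which asset, and when."""
    asset: str
    user: str
    at: datetime


def find_stale_assets(
    assets: list[str],
    events: list[QueryEvent],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),  # example threshold only
    min_users: int = 2,                        # example threshold only
) -> dict[str, str]:
    """Return {asset: reason} for every asset that looks neglected."""
    last_used: dict[str, datetime] = {}
    users: dict[str, set[str]] = {asset: set() for asset in assets}

    # Aggregate the usage log: latest access time and distinct users per asset.
    for event in events:
        if event.asset not in users:
            continue  # ignore events for assets outside this inventory
        users[event.asset].add(event.user)
        if event.asset not in last_used or event.at > last_used[event.asset]:
            last_used[event.asset] = event.at

    stale: dict[str, str] = {}
    for asset in assets:
        seen = last_used.get(asset)
        if seen is None:
            stale[asset] = "never used"
        elif now - seen > max_idle:
            stale[asset] = f"idle for {(now - seen).days} days"
        elif len(users[asset]) < min_users:
            stale[asset] = f"only {len(users[asset])} user(s)"
    return stale


now = datetime(2022, 10, 7)
inventory = ["marts.revenue", "marts.legacy_kpis", "tmp.backfill_2021"]
log = [
    QueryEvent("marts.revenue", "ana", now - timedelta(days=1)),
    QueryEvent("marts.revenue", "ben", now - timedelta(days=3)),
    QueryEvent("marts.legacy_kpis", "ana", now - timedelta(days=200)),
]

for asset, reason in find_stale_assets(inventory, log, now).items():
    print(f"FLAG {asset}: {reason}")
```

Run periodically (for example, nightly), the same function gives the "calculate last use and user count, then flag" loop described in the list; the thresholds would be tuned per organization.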

Learn more about active metadata here.

More from my reading list

From Business Problem To Data Science Experiment by Vin Vashishta
What is Data Engineering Part 1 and Part 2 by Gergely Orosz
What We Are Missing in Data CI/CD Pipelines? by Ivan
Why Does Self-Service BI Fail and What Could Enterprises Do to Turn the Tide? by Anh Tran
What Open Source Can Do For Your Data Career by Mehdi Ouazza

I’ve also added more resources to my data stack reading list. If you haven’t checked out the list yet, you can find and bookmark it here.

See you next week!

P.S. Liked reading this edition of the newsletter? We'd love it if you could take a moment and share it with your friends on social.

Metadata Weekly

9,154 followers

