What the recent Forrester Wave means for data catalogs

A massive transformation — data cataloging now includes governance, quality, security, monitoring, and more


Quick announcement: Metadata Weekly now has over 11,000 subscribers across Substack and LinkedIn! I’m so thankful for all of you who read and support this newsletter, and I’m excited to keep writing about all things data and metadata.


In the last issue, I talked about why data catalogs are falling short today: the modern data ecosystem and its users are more diverse than ever before, and metadata is itself evolving into big data. Whether they’re technical or universal, even the best data catalogs just can’t keep up, and companies still end up with widespread confusion and silos.

I’m not trying to be a Captain Negative, so what’s the solution? This is where some major news comes into play.

The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 was just released. In this report, Forrester examined today’s most significant catalogs and emerged with plenty of thoughts about what it means to be a modern data catalog.

In today’s issue, let’s examine what we believe this Forrester Wave means for not just data cataloging, but also the data governance, quality, security, and observability categories.


A “massive transformation” in the data cataloging space

Most data people have firsthand experience with the “data wiki”, a data catalog that aims to inventory and document all of a company’s data. It’s expensive to buy, slow to set up, a pain to populate… and ultimately people just don’t want to use it.

For the last few years, analysts have focused on what to add to these traditional data catalogs to make them successful. Forrester talked about machine learning data catalogs in 2018 and 2020, then focused on data catalogs for DataOps in 2022. Meanwhile, Gartner moved from traditional “metadata management” in 2020 to focusing on active metadata in 2022.

And yet, none of these additions seem to have fixed the problem with data catalogs. That’s why Forrester just announced a major transformation in the way it thinks about Enterprise Data Catalogs.

“Like other data management sectors, enterprise data catalogs (EDCs) are witnessing a transformation driven by AI advancements, fragmented and complex data estates, accessibility needs, and strategic imperatives to harness data for competitive advantage. The exponential surge in the velocity, variety, veracity, and volume of data demands solutions that transcend traditional metadata repositories and technical user bases. Customers seek solutions that can bridge the gap between complex datasets, governance, business insights, and AI enablement. Vendors are offering intelligent solutions, integrated AI and ML to automate and enhance data discovery, semantics curation, impact analysis, quality assessment, among other catalog functionalities. They are also improving user experiences to cater to both technical and nontechnical users, thereby supporting the goal of data democratization and self-service.” – The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 (emphasis added)

Let me highlight that: Forrester said that EDCs today need to “transcend traditional metadata repositories and technical user bases”. In other words, catalogs can’t just be data wikis for technical data people any more.

So what should a modern data catalog look like?

First, Forrester talked about how basic cataloging is no longer enough. Instead, EDCs need to automatically catalog, analyze, and govern your entire data ecosystem, from traditional databases to SaaS platforms, unstructured data, AI/ML repositories, and more.

“Advanced solutions offer features like automated metadata harvesting, cross-platform semantic mapping, policy enforcement, quality validation, and end-to-end lineage. This holistic approach ensures a complete view of all data assets, including AI/ML models, to enhance governance, compliance, and use across the organization.”

Second, this holistic approach can’t be powered by data stewards doing manual work. Instead, AI and automation are key to quickly rolling out catalogs and creating value with them. Note that this isn’t just about cataloging — it’s also about powering data governance and quality efforts, all within the catalog rather than in separate governance and quality tools.

“Modern solutions… offer advanced capabilities, including AI-assisted data discovery, generative AI (genAI) augmentation, ML-driven profiling, automated anomaly detection, predictive tagging, and proactive compliance reporting. These technologies are crucial for streamlining data governance, enhancing data quality, and unlocking actionable insights.”

Forrester then evaluated various cataloging tools based on what they deemed to be the key capabilities of a modern EDC. But instead of focusing only on the standard aspects of a data catalog (e.g. metadata management, data discovery, data lineage), they also expected capabilities from what we often think of as separate spaces and tools, such as data governance, security, and privacy. Here’s the list of evaluation criteria under “Current Offering” (emphasis is my own):

  1. Metadata management
  2. Data products
  3. Data discovery and profiling
  4. Data lineage
  5. Governance, risk, and compliance
  6. User interface and user experience (UI/UX)
  7. Deployment and time to value
  8. Data quality and observability
  9. Monitoring and alerts
  10. Data privacy and security
  11. Workflow and task management
  12. Integration
  13. Collaboration capabilities
  14. Marketplace and exchange
  15. Real time, IoT, and edge
  16. Advanced capabilities

In short, Forrester is drawing a line in the sand, arguing that we are now witnessing a “transformation” in the data cataloging space, driven by GenAI, fragmented data estates, diverse user needs, and business-critical use cases. As a result, the best data catalogs can’t just be catalogs anymore. Instead, they should use AI and automation to take over other metadata-driven capabilities like data governance, security, observability, and monitoring.

This is a huge shift but I think it’s ultimately a good one. The data space is incredibly fragmented these days, so if we can merge several different spaces and tools into one, it’s ultimately better for users.

I personally think of this new idea of the EDC, the catalog that’s more than just a catalog, as a unified control plane: a comprehensive layer that can manage context, governance, and compliance across diverse tools and for diverse users.


Recognition of the impact customers have with Atlan

Not to bury the lead but… we were named a Leader in the Forrester Wave™: Enterprise Data Catalogs, Q3 2024, with the highest scores across all vendors in the “Current Offering” and “Strategy” categories!

Click here to view a complimentary copy of the report, including the Wave graphic.

Atlan got the highest score possible in 15 criteria, including Data lineage; Governance, risk, and compliance (where we were the only company to score a 5/5); Adoption; and Deployment and time to value. The report recognized us as “an unparalleled partner” for organizations “aiming for democratization and AI-enhanced self-service to governed data”.

“Atlan differentiates itself with a personalized, AI-driven catalog, providing quick value… Atlan’s Third-Gen Data Catalog is quickly outpacing established players by adeptly anticipating and addressing strategic customer needs through automation. Atlan is a visionary player with a clear, ambitious goal: to become the data and AI control plane enabling complex business use cases.”

With the highest possible scores in criteria like Vision, Innovation, and Roadmap, we’re more confident than ever about our vision of building a data and AI control plane, powered by active metadata, with complete configurability, interoperability, and openness to power every data team in every industry, however unique and complex their need.

Read more and download the full report


More from my reading list

Top links from last week:

