Want to trust your numbers and prepare your company for AI? Start with a Data Catalog
Priscila J. Papazissis Paolinelli
Head of Data Analytics Vallourec | Qlik Luminary 2021-2025 and Educator Ambassador 2024 | Professor at PUC-MG and DataSchool | LinkedIn Top Voice | Data Culture | BI | Analytics | Gen + AI | Data Literacy | Speaker
The other day, a colleague reached out for help. She needed a sales report for a meeting and wanted to know which version was the most up-to-date. The problem? There were at least three different versions of the same data: a dashboard in the BI tool, a spreadsheet shared via email, and a report someone had exported to PowerPoint. To make matters worse, the numbers didn’t match. She spent hours trying to figure out which source was the most reliable and ultimately had to ask multiple people before finding the right answer.
If this has ever happened to you, your company is probably struggling with a lack of data visibility and data governance. The data exists, but it is scattered across different systems, lacks context, and offers no easy way to determine which version is correct. This leads to wasted time, decisions based on inconsistent information, and an overreliance on key people who "know the data" in their heads. The solution is not to create more reports or to ask the IT team to document everything manually. What truly solves this type of issue is a data catalog.
Where does a Data Catalog fit in the company's data ecosystem?
Within a company, data goes through a complete cycle: from its origin in various sources (databases, ERPs, CRMs, transactional systems) to its final consumption in dashboards and reports. A data catalog is not a storage or visualization tool, but rather a layer of intelligence and organization that connects all these components and facilitates access to information.
It sits between data sources and analytical tools, ensuring that users can find, understand, and use data correctly. In other words, it doesn’t create new data but provides context to what already exists.
In practice, a data catalog allows you to:
Without a catalog, different teams create their own versions of reports and dashboards, leading to a "data war," where different areas within the company reach conflicting conclusions because they rely on different sources.
Is your Data Catalog AI-ready?
With the growing use of artificial intelligence in businesses, data catalogs are becoming even more strategic. According to BARC, AI can only deliver real value when it has access to organized, well-documented, and accessible data. A well-structured data catalog not only improves governance and data discovery but is also essential for feeding AI models with trustworthy, traceable data.
Companies without a robust catalog risk using outdated or inconsistent data, directly compromising the quality of AI-generated insights. If your company is investing in AI without a well-implemented data catalog, the truth is that you might just be accelerating bad decision-making. The first step to a successful AI strategy is ensuring that data is well-organized and documented, and the catalog is the key to making that happen.
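One way to picture "AI-ready" is as a simple check on the catalog's metadata before a dataset is handed to a model. The sketch below is an assumption of mine, not a standard: the function name `ai_readiness_issues`, the dictionary keys (`description`, `owner`, `lineage`, `last_updated`), and the 30-day freshness threshold are all hypothetical and would need to be adapted to your own governance rules.

```python
from datetime import date, timedelta


def ai_readiness_issues(entry: dict, today: date, max_age_days: int = 30) -> list[str]:
    """List the reasons a catalog entry should not yet feed an AI model."""
    issues = []
    if not entry.get("description", "").strip():
        issues.append("missing description")        # not documented
    if not entry.get("owner", "").strip():
        issues.append("no owner assigned")          # nobody accountable
    if not entry.get("lineage"):
        issues.append("no lineage recorded")        # not traceable to a source
    last = entry.get("last_updated")
    if last is None or today - last > timedelta(days=max_age_days):
        issues.append("stale or unknown refresh date")
    return issues


# Illustrative entries only.
documented = {
    "name": "sales_orders",
    "description": "Confirmed sales orders, one row per order line",
    "owner": "Sales Ops",
    "lineage": ["ERP"],
    "last_updated": date(2025, 3, 1),
}
undocumented = {"name": "sales_export_final_v2", "last_updated": date(2024, 1, 15)}

check_date = date(2025, 3, 10)
print(ai_readiness_issues(documented, check_date))
print(ai_readiness_issues(undocumented, check_date))
```

An empty list means the entry passes these particular checks. Any other result is a list of gaps to close in the catalog before the data is used for AI.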
What are the most common Data Catalog tools?
The data catalog market has grown significantly in recent years, with several specialized tools available. Some of the most well-known include:
Some companies also create internal catalogs using wikis, SharePoint, or simple databases. While these may not be as powerful as dedicated tools, they can serve as a starting point for organizing data documentation.
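As an illustration of the "simple database" starting point, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. The table name `catalog`, its columns, and the sample rows are hypothetical choices for this example. A real internal catalog would likely need more fields, such as refresh dates and access rules.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE catalog (
        name          TEXT PRIMARY KEY,
        source_system TEXT NOT NULL,
        owner         TEXT NOT NULL,
        description   TEXT NOT NULL,
        certified     INTEGER NOT NULL DEFAULT 0  -- 1 = trusted version
    )
    """
)

# Illustrative rows only.
rows = [
    ("sales_dashboard", "BI tool", "Sales Ops", "Monthly sales by region", 1),
    ("sales_extract", "Spreadsheet", "Unassigned", "Ad hoc sales export shared by email", 0),
    ("hr_headcount", "ERP", "HR", "Headcount by department", 0),
]
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def search(conn: sqlite3.Connection, keyword: str) -> list[tuple]:
    """Find assets by keyword, listing certified ones first."""
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT name, owner, certified FROM catalog "
        "WHERE name LIKE ? OR description LIKE ? "
        "ORDER BY certified DESC, name",
        (pattern, pattern),
    )
    return cur.fetchall()


for name, owner, certified in search(conn, "sales"):
    label = "certified" if certified else "not certified"
    print(f"{name} (owner: {owner}, {label})")
```

Even a table this small already answers the two questions that cost my colleague hours: who owns this data, and which version is the trusted one.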
Data Catalog vs. Data Marketplace: what's the difference?
Many people confuse data catalogs with data marketplaces, but they serve different purposes. A data catalog is an internal tool used to organize and document data within a company. It helps teams quickly find the right data, ensuring standardization, governance, and reliability. A data marketplace, on the other hand, goes beyond organization: it creates an environment where data can be shared, exchanged, or monetized, either internally between departments or externally with partners and clients. While a catalog answers the question "Where is the data, and how can I use it?", a data marketplace answers "How can I access or distribute this data?"
A simple way to picture this difference is to think of a data catalog as an internal library, where data is organized and documented for easy access and proper usage. Meanwhile, a data marketplace functions like a store, where data can be requested, made available under certain conditions, or even sold. Companies looking to maximize data utilization typically start with a catalog to structure their information before advancing to a marketplace, ensuring that shared data is trustworthy and well-documented.
How does your company handle this today? Are your data sources well-organized, or have you experienced situations similar to this? Let me know in the comments! If this newsletter was useful for you, follow my profile and share it with others who might be facing the same challenges!
Financial Services Dev Senior Manager
1 week ago: Perfect! Thanks for sharing!
Great dad | Inspired Risk Management and Security | Cybersecurity | AI Governance & Security | Data Science & Analytics | My posts and comments are my personal views and perspectives but not those of my employer
1 week ago: Priscila, thanks for highlighting the difference between data catalogs and data marketplaces, as well as sharing your knowledge of the importance of data catalogs. IMO, data catalogs are essential for a successful data governance strategy, and many organizations have adopted or are in the process of adopting one, driven by digital transformation and AI efforts. Where 80% to 90% of organizations go wrong in their data governance projects is in forgetting that a data catalog is not a silver bullet for their data. Most organizations fail on two aspects: 1) sustaining the data catalog (maintaining it and adding new sources), and 2) communicating effectively to reinforce the benefits, so that end users rely on the catalog as the trusted source. Otherwise, the benefits are lost because stubborn or unaware people keep using new, untrusted sources for data analysis and consumption.