Vision and convictions for a successful data catalog take-off

Metadata management is a central discipline in data management in general, and in data modeling in particular.

The tool of choice for this discipline is increasingly the data catalog.

A data catalog hosts several types of metadata.

It complements the IT department's service offering with a tool that centralizes knowledge usually dispersed across technical tools (ETL, modeling tools, databases, data lakes, GitHub repositories, etc.) and across documents in various formats, stored either on network drives (internal file servers or cloud drives) or in the company's document management and collaboration tools (SharePoint, Teams, Slack, etc.).

A data catalog is a metadata management tool that lists and documents the metadata describing the data stored in a company's production environments, making that data easier for business and IT staff, as well as applications, to find and reuse. The benefits of a data catalog include:

  • Better collaboration and more informed decision-making: Employees can more easily access the data they need to do their jobs. This improves collaboration between business and IT teams, and within each of these two communities, notably through workflows (for example, for validating the glossary definitions of business terms), which in turn supports more relevant decisions;

Example of validation workflow settings for business terms


  • Reduced search time: The data catalog makes it possible to find useful data quickly, rather than spending time searching in different places or asking colleagues for help. This improved "findability" of the right information maximizes the use and reuse of the company's most valuable data;


Example of advanced metadata search functions
Example of advanced filtering and export of search results

  • Better data control: The data catalog enables better control of data access and use, by clearly managing access levels and classifying data according to its level of security and confidentiality;
  • Better regulatory compliance: The data catalog can help ensure compliance with regulations on data confidentiality, security and quality by making it easier for business functions to manage and control data in line with corporate policies;
  • Better data quality: The data catalog enables data to be tracked and managed to ensure its quality and accuracy throughout its lifecycle and across the entire production chain, from the primary endpoint where it originates (application, data provider, etc.) to the final endpoint where it is consumed (report, dashboard, file, API, etc.). In this role, the catalog becomes the first line of defense in data quality management and an ideal place for monitoring. An incorrect field format can, at best, create misalignments with business needs and generate technical debt and, at worst, have disastrous consequences for the data the field holds throughout its lifecycle (e.g., a date stored in a text field rather than a date field);


Example of a technical data lineage including a data quality indicator
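To make the text-versus-date example above more concrete, here is a minimal Python sketch of how such a quality indicator could work: a format check flags a column declared as text whose sampled values all look like ISO dates, and the warning is then propagated to every downstream node in the lineage. The lineage graph, the node names, the ISO date pattern and both functions are hypothetical illustrations, not the API of any particular catalog.

```python
import re
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes that consume it.
LINEAGE = {
    "crm.customers.birth_date": ["dwh.dim_customer.birth_date"],
    "dwh.dim_customer.birth_date": ["report.customer_age", "api.customer_profile"],
    "report.customer_age": [],
    "api.customer_profile": [],
}

# Assumption: dates are expected in ISO 8601 form (YYYY-MM-DD).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def looks_like_text_date(declared_type: str, samples: list[str]) -> bool:
    """Flag a column declared as text whose sampled values are all ISO dates."""
    is_text_type = declared_type.lower() in {"text", "varchar", "string"}
    return is_text_type and bool(samples) and all(ISO_DATE.match(s) for s in samples)


def propagate_warning(source: str, lineage: dict[str, list[str]]) -> dict[str, str]:
    """Breadth-first walk downstream from source, marking every reachable node."""
    status = {node: "ok" for node in lineage}
    queue = deque([source])
    seen = {source}
    while queue:
        node = queue.popleft()
        status[node] = "warning"
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return status


samples = ["1984-03-12", "1990-11-02"]
if looks_like_text_date("VARCHAR", samples):
    indicators = propagate_warning("crm.customers.birth_date", LINEAGE)
    for node, flag in indicators.items():
        print(f"{flag:8} {node}")
```

Propagating downstream is what turns a single format check into an impact indicator: every report or API fed by the faulty field is flagged, not just the source column.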


However, implementing a data catalog can pose challenges, particularly with regard to:

  • The intrinsic complexity of data: Data can be stored in different formats and across different sources, which can make it difficult to categorize and organize. The versatility and richness of the solution's connectors will be key criteria in selecting the solution best suited to your company's current and future ecosystem;
  • Involvement of business and IT users: Users must be involved in implementing the data catalog to ensure that it meets their needs, and implementation must not happen without them. Each community (Business or IT) is interested in this type of tool, but for different use cases: functional lineage for some, technical lineage for others. Establishing data governance is one of the prerequisites for generating demand for a tool that will make life easier for the employees concerned, and that governance must, once again, be both IT- and business-oriented to be effective and sustainable. To keep momentum from fading and data governance active, the CDO can monitor it by defining the objectives and KPIs needed for the data governance program to succeed. Some tools encourage adoption through customizable dashboards that track these KPIs (e.g., the number of "orphaned" business terms, i.e., terms not linked to any technical metadata);

Example of a data governance dashboard embedded in a data catalog

  • Catalog maintenance: Regular maintenance of the data catalog is essential to keep it up to date, relevant and continuously enriched, in terms of both business glossaries and ingested sources. This burden is eased, however, by the increasingly automated technical metadata ingestion capabilities of these solutions, whose market is currently very dynamic (April 2023). The ability to automatically associate business glossary terms with technical metadata (using artificial intelligence) is another important selection criterion;
  • Costs: Setting up a data catalog can be costly in terms of time and resources. Managing the business glossary is undoubtedly the most important task in this type of program and must be planned ahead, so that the tool gets off to a flying start, loaded with enough relevant business terms and consistent from the outset. Finally, cloud consumption costs need attention right from launch, which means quickly defining a refresh policy for the technical dictionaries ingested into the solution.
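The "orphaned business terms" KPI mentioned above is simple to compute once glossary terms and their links to technical metadata are available. The following Python sketch assumes a hypothetical in-memory glossary model (the `BusinessTerm` class and the asset names are illustrations only); a real catalog would expose these links through its own API or export.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    name: str
    # Hypothetical: technical metadata (tables/columns) this term is linked to.
    linked_assets: list[str] = field(default_factory=list)


def orphaned_terms(glossary: list[BusinessTerm]) -> list[str]:
    """Return the names of business terms linked to no technical metadata."""
    return [t.name for t in glossary if not t.linked_assets]


def orphan_rate(glossary: list[BusinessTerm]) -> float:
    """Share of glossary terms that are orphaned (0.0 for an empty glossary)."""
    return len(orphaned_terms(glossary)) / len(glossary) if glossary else 0.0


glossary = [
    BusinessTerm("Customer", ["crm.customers", "dwh.dim_customer"]),
    BusinessTerm("Net Premium", ["dwh.fact_policy.net_premium"]),
    BusinessTerm("Churn Rate"),
    BusinessTerm("Beneficial Owner"),
]

print(orphaned_terms(glossary))        # ['Churn Rate', 'Beneficial Owner']
print(f"{orphan_rate(glossary):.0%}")  # 50%
```

Tracking the rate rather than the raw count keeps the KPI comparable as the glossary grows.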


To sum up, a data catalog can bring many benefits to a company, but the challenges of integrating it into the company's own ecosystem and maintaining the business glossary must be taken into account. It is also important to think carefully about the company's priority use cases and to involve data governance stakeholders in setting up the catalog to ensure its success.

Typical metadata management use cases include:

  • Embodying data governance roles and responsibilities in a common tool used by both Business and IT;
  • Making data easier to find for non-specialist users;
  • Facilitating IT impact analyses through the visualization of graphical data lineages;
  • Linking technical metadata with corporate policies to manage compliance with regulatory constraints (GDPR, BCBS 239, Solvency II, IFRS 17, KYC, FATCA, KIIDs, PRIIPs, etc.).
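Embodying governance roles in the tool often comes down to a validation workflow for business terms, such as the one mentioned earlier for glossary definitions. As a minimal sketch, assuming a hypothetical two-role model (a data steward proposes a definition, a data owner approves or rejects it), the workflow can be expressed as a small state machine in Python. The role names, statuses and transitions are assumptions, not the configuration of any specific catalog.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical role model: (current status, action) -> (role allowed, next status)
TRANSITIONS = {
    (Status.DRAFT, "submit"): ("data_steward", Status.IN_REVIEW),
    (Status.IN_REVIEW, "approve"): ("data_owner", Status.APPROVED),
    (Status.IN_REVIEW, "reject"): ("data_owner", Status.REJECTED),
    (Status.REJECTED, "revise"): ("data_steward", Status.DRAFT),
}


class GlossaryTerm:
    def __init__(self, name: str, definition: str):
        self.name = name
        self.definition = definition
        self.status = Status.DRAFT
        # Audit trail of (role, action, resulting status).
        self.history: list[tuple[str, str, Status]] = []

    def apply(self, action: str, role: str) -> Status:
        """Apply an action if it is allowed from the current status for this role."""
        key = (self.status, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not allowed from {self.status.value}")
        allowed_role, next_status = TRANSITIONS[key]
        if role != allowed_role:
            raise PermissionError(f"'{action}' requires role {allowed_role}, got {role}")
        self.history.append((role, action, next_status))
        self.status = next_status
        return self.status


term = GlossaryTerm("Active Customer", "Customer with at least one open contract.")
term.apply("submit", role="data_steward")
term.apply("approve", role="data_owner")
print(term.status)  # Status.APPROVED
```

Keeping the allowed transitions in one table makes the governance rules easy to audit: changing who approves a term means editing a single entry, and the history list records who did what.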

But today, more "active" metadata management can also support:

  • Data observability, i.e., the proactive identification of potential incidents affecting the integrity of data ingested through increasingly complex data pipelines;
  • Financial optimization (FinOps) of data consumption, particularly in cloud environments;
  • The creation of data marketplaces offering data products to internal or even external consumers.
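Observability tools typically watch signals such as data freshness and volume. The Python sketch below illustrates two such checks under stated assumptions: a z-score test on daily row counts (the threshold of 3 is an arbitrary assumption) and a freshness check against a hypothetical 24-hour service level. Neither reflects a specific product's detection logic.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def volume_anomaly(history: list[int], latest: int, z_threshold: float = 3.0) -> bool:
    """Flag the latest load if its row count deviates from history by more than
    z_threshold sample standard deviations (threshold is an assumption)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change is suspicious
    return abs(latest - mu) / sigma > z_threshold


def is_stale(last_loaded: datetime, now: datetime, max_age: timedelta) -> bool:
    """Flag a dataset whose last successful load is older than the agreed max_age."""
    return now - last_loaded > max_age


daily_rows = [10_120, 10_340, 9_980, 10_210, 10_055]
print(volume_anomaly(daily_rows, latest=4_200))   # True
print(volume_anomaly(daily_rows, latest=10_150))  # False

last_load = datetime(2023, 4, 18, 23, 0)
check_time = datetime(2023, 4, 20, 9, 0)
print(is_stale(last_load, check_time, max_age=timedelta(hours=24)))  # True
```

Such checks are only useful when attached to lineage: a volume drop on one source table matters because of the reports and data products that depend on it.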

What about you?

Can you think of other uses for data catalogs?
