How VISA Uses DataHub to Scale Data Governance
Acryl Data
For Visa, builders of the massive network responsible for moving money globally, seamlessness, security, and trust are paramount. They drive the company’s pursuit of excellence, ensuring every transaction is executed with utmost integrity and reliability. At the heart of this commitment lies effective data governance.
As you can imagine, for Visa, data governance is not merely a matter of compliance; it is a fundamental aspect of responsible data management. This requires a robust infrastructure for managing data both efficiently and ethically.
In DataHub’s March Town Hall, Jean-Pierre Dijcks, Senior Director of Product Management at Visa, joined us to share how DataHub is helping them implement data governance at scale.
Understanding Visa’s Data & AI Platform
At the core of Visa’s operations lies its robust Data and AI platform, which is responsible for processing and analyzing the vast array of transactions coursing through its network.
Key components of Visa’s setup include Kafka for messaging, Spark for large-scale data processing, multiple Hadoop clusters and databases, common BI tools, and advanced AI tools.
As you can imagine, AI plays a pivotal role in various aspects of Visa’s operations, particularly in fraud detection, where AI-driven monitoring systems are key.
This brings us to a key point raised by Jean-Pierre.
AI and Metadata: A Case for Investing in Metadata Cataloging
There’s a strong case to be made for investing more in cataloging and metadata for better AI: AI systems are only as good as the data that powers them.
As Jean-Pierre Dijcks aptly puts it, “It’s time to move beyond the usual data flows and to start thinking about connecting to and cataloging our AI systems.”
Given the current legal landscape and the excitement surrounding AI, Jean-Pierre strongly urges teams to invest more in metadata. This investment will help address inevitable questions about the source and use of data within emerging AI technologies. Visa has recognized the crucial role of metadata in improving data quality, making cataloging and metadata management essential for the company.
Visa’s Data Catalog Journey
Nearly five years ago, Visa embarked on its data catalog journey to build a catalog from the ground up. They aimed to prioritize a metadata-first approach and gain deeper insights into data usage across the organization.
While this custom-built approach offered the flexibility to tailor the system to their specific needs, it also required constant attention and upkeep, diverting resources from value-driving activities.
This realization brought them to DataHub, which has since become a key component of Visa’s metadata management strategy.
What Made DataHub the Right Data Catalog for Visa
DataHub appealed to Visa for several reasons. Firstly, it offers essential features such as messaging and alerting, alleviating Visa’s need to develop and maintain these functionalities internally.
However, what truly set DataHub apart was its robust API platform. This platform enabled seamless integration with Visa’s existing tools and facilitated a more user-friendly experience for data engineers and other stakeholders.
The emphasis on accessibility through APIs struck a chord with Visa’s user base, many of whom preferred interacting with the catalog via APIs rather than through the user interface.
As Jean-Pierre Dijcks explained, “We found DataHub to provide excellent coverage for our needs. What we appreciate most about DataHub is its powerful API platform.”
Using a Data Catalog to Solve Challenges in Scaling Data Governance
Data catalogs are fast becoming indispensable tools in the pursuit of scalable data governance. By centralizing metadata management, facilitating data discovery, and enhancing data quality, they address the core challenges of modern data governance.
Here’s how Visa is using data cataloging with DataHub to solve some of its pressing governance challenges:
Challenge #1: Manage classifications, definitions, etc. at Scale
The Visa team set out to scalably manage common classifications, definitions, access policies, etc., for attributes across a sprawling system without shifting ownership of business metadata to data engineers.
Solution: Built and contributed a logical model called Business Attributes to the DataHub Project
“Business Attribute” is a logical model designed by Visa to centralize and maintain crucial business information owned and maintained by data stewards and subject matter experts. It aggregates various business-related metadata, including terms and definitions, classifications, and data access policies, streamlining data management across thousands of datasets and millions of columns.
By defining these logical attributes once and mapping them to table columns, end-users accessing the catalog are provided with curated information directly linked to the business attributes, ensuring real-time updates reflect accuracy and relevance.
Note: Business Attributes are available as of DataHub v0.13.3!
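The "define once, map to many columns" idea can be illustrated with a small Python sketch. This is not DataHub's API or Visa's implementation; the class names, fields, and sample column names are all hypothetical and exist only to show why a steward's single edit reaches every mapped column.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BusinessAttribute:
    """Hypothetical stand-in for a steward-owned logical attribute."""
    name: str
    definition: str
    classification: str


@dataclass
class AttributeRegistry:
    """Hypothetical registry: attributes are defined once, columns point at them."""
    attributes: Dict[str, BusinessAttribute] = field(default_factory=dict)
    # Maps a physical column ("schema.table.column") to an attribute name.
    column_map: Dict[str, str] = field(default_factory=dict)

    def define(self, attr: BusinessAttribute) -> None:
        self.attributes[attr.name] = attr

    def map_column(self, column: str, attr_name: str) -> None:
        if attr_name not in self.attributes:
            raise KeyError(f"unknown business attribute: {attr_name}")
        self.column_map[column] = attr_name

    def describe(self, column: str) -> BusinessAttribute:
        # Resolved at read time, so steward edits reach every mapped column.
        return self.attributes[self.column_map[column]]


registry = AttributeRegistry()
registry.define(BusinessAttribute("card_number", "Primary account number", "restricted"))
registry.map_column("payments.txn.pan", "card_number")
registry.map_column("risk.alerts.card_no", "card_number")

# A steward updates the definition in one place only.
registry.attributes["card_number"].definition = "Primary account number (PAN) of the card"
```

Because columns store only a reference to the attribute, data engineers never have to own or copy the business definition.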
Challenge #2: Capturing high-quality, validated metadata
The Visa team wanted a way to encourage data owners to provide high-quality and validated annotations of data assets.
Solution: DataHub’s Structured Properties
DataHub’s Structured Properties approach is helping Visa’s data platform team a) streamline metadata management and b) improve the developer experience and facilitate more efficient integration of API data into applications.
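To make "validated annotations" concrete, here is a minimal Python sketch of a typed property with a fixed set of allowed values. It does not use DataHub's Structured Properties API; `PropertyDefinition`, `annotate`, and the `retention_days` property are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class PropertyDefinition:
    """Hypothetical typed property with an optional set of allowed values."""
    name: str
    value_type: type
    allowed_values: Optional[Tuple[Any, ...]] = None

    def validate(self, value: Any) -> Any:
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.name} expects {self.value_type.__name__}")
        if self.allowed_values is not None and value not in self.allowed_values:
            raise ValueError(f"{self.name} must be one of {self.allowed_values}")
        return value


def annotate(asset: dict, definition: PropertyDefinition, value: Any) -> dict:
    """Return a copy of the asset with a validated property attached."""
    props = dict(asset.get("properties", {}))
    props[definition.name] = definition.validate(value)
    return {**asset, "properties": props}


retention = PropertyDefinition("retention_days", int, (30, 90, 365))
dataset = {"name": "payments.txn"}
annotated = annotate(dataset, retention, 90)
```

Rejecting bad values at write time is what keeps downstream consumers from having to defend against free-text metadata.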
Challenge #3: Managing copies of datasets across environments
Within Visa’s data ecosystem, it’s common for a dataset to be copied across multiple physical environments. It’s important to them that data stewards and data consumers have an easy way to discover and manage datasets across these environments.
Solution: Defining and implementing “Logical Datasets”
Where is the data? Where does it live in a physical context? What does it mean in the business context?
These are the kinds of questions Logical Datasets can help answer.
Visa is currently developing a Logical Dataset capability to connect replicated tables across physical environments, streamlining and scaling governance efforts while simplifying navigation for users and stewards.
With Logical Datasets, the aim is to establish a more scalable and efficient governance model, where data products, contracts, and definitions are interconnected seamlessly — for better clarity on data location, business context, and overall data management.
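As a rough illustration of linking physical copies to one logical entry, the Python sketch below groups DataHub-style dataset URNs by table name. The grouping rule (same name means same logical dataset) is an assumption made for this example, not Visa's design, and the sample URNs are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Dataset URNs follow the shape:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<ENV>)
URN_PATTERN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([A-Z]+)\)$"
)


def parse_dataset_urn(urn: str) -> Tuple[str, str, str]:
    """Split a dataset URN into (platform, name, environment)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn}")
    platform, name, env = match.groups()
    return platform, name, env


def group_by_logical_name(urns: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Assumed rule: physical copies sharing a name form one logical dataset."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for urn in urns:
        platform, name, env = parse_dataset_urn(urn)
        groups[name].append((platform, env))
    return dict(groups)


physical = [
    "urn:li:dataset:(urn:li:dataPlatform:hive,payments.txn,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:hive,payments.txn,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:kafka,risk.alerts,PROD)",
]
logical = group_by_logical_name(physical)
```

A real implementation would likely need an explicit mapping rather than name matching, since copies are often renamed between environments.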
Here’s Jean-Pierre breaking this down:
Learnings from Visa’s 5-year Metadata Journey
1. Start with an ‘Invisible Catalog’
Dictating tooling across a company is hard. An ‘invisible catalog’ is a great way to start building and fostering a metadata-first environment without mandating a specific tool.
An API-based approach is the best way to get a foot in the door.
The Visa team was able to leverage DataHub’s API capabilities to embed the data catalog into existing tools and processes, effectively integrating metadata capabilities where needed. With this approach, Visa could enable self-service data access through custom-built tools designed to interact directly with the catalog APIs.
2. Build a closed-loop system for driving a metadata-first approach
Implementing change management and ensuring accountability can be challenging despite advocating for the right principles. A more pragmatic approach involves implementing a closed-loop system that measures and incentivizes actions rather than prescribing methods.
For example, you could ensure that any new dataset deployed undergoes ingestion and that relevant stakeholders are informed about it and any subsequent changes. This would empower data owners to take ownership of their datasets, maintain metadata compliance, and ensure data integrity.
This approach streamlines the data governance process and ensures that the catalog remains refreshed, up-to-date, and seamlessly integrated within organizational workflows.
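The "measure rather than prescribe" idea can be sketched as a simple compliance report. The Python below is illustrative only: the required fields, the `DatasetRecord` shape, and the scoring rule are assumptions, not something described in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed minimum metadata every deployed dataset should carry.
REQUIRED_FIELDS = ("owner", "description", "classification")


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry produced by ingestion."""
    name: str
    owner: Optional[str] = None
    description: Optional[str] = None
    classification: Optional[str] = None


def missing_fields(record: DatasetRecord) -> List[str]:
    """List the required fields that are empty on this record."""
    return [f for f in REQUIRED_FIELDS if not getattr(record, f)]


def compliance_report(records: List[DatasetRecord]) -> Tuple[float, Dict[str, List[str]]]:
    """Return (share of fully documented datasets, gaps per dataset)."""
    gaps = {r.name: missing_fields(r) for r in records}
    gaps = {name: missing for name, missing in gaps.items() if missing}
    score = (len(records) - len(gaps)) / len(records) if records else 1.0
    return score, gaps


records = [
    DatasetRecord("payments.txn", "team-payments", "Card transactions", "restricted"),
    DatasetRecord("risk.alerts", owner="team-risk"),
]
score, gaps = compliance_report(records)
```

The gaps dictionary is the part that closes the loop: it tells each owner exactly what to fix, while the score gives the platform team a number to track over time.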
3. Carefully consider the effort needed to maintain data catalog table stakes (Build vs. Buy)
The decision between building and buying a solution involves a tradeoff between control and effort. While building in-house offers maximum control, it requires significant time and resources. Buying reduces effort but often limits customization and control.
As Jean-Pierre shares, open-source solutions like DataHub “offer a bit of the best of both build and buy.” Open source offers the ability to influence development, engage with and leverage the community, and contribute to ongoing improvements.
If your organization needs additional support and stability, a SaaS solution built on an open-source foundation, like Acryl Data, might be just what you need. It combines the benefits of vendor support with the robustness of an active open-source community, providing dependability and reliability.
Want to know more about Visa’s experience with DataHub? Watch the full video here: