Data Quality Belongs in the Data Catalog
Acryl Data
Hear John Joyce (Founding Engineer, Acryl Data) break it down on The Ravit Show
As data increasingly becomes a product, maintaining high standards of quality and ensuring accessibility for all stakeholders are more crucial, and more challenging, than ever.
At the Snowflake Summit 2024, John Joyce, Founding Engineer at Acryl Data, met with Ravit from The Ravit Show to discuss how teams use DataHub to tackle data quality challenges by unifying data discovery, quality, and governance under one platform.
Watch the full interview below and read on for some of the highlights from Ravit Jain and John’s conversation!
Ravit: Can you tell us about Acryl Data and what you do there?
John: Absolutely. I’m John, a co-founder at Acryl, where we aim to build a central control plane for your data.
Snowflake excels at storing, querying, and processing data, while we focus on the human component: the who, what, when, where, and why of the data.
Ravit: Why is data quality so important in today’s organizations, especially given the Gen AI hype?
John: I think the reason is that data is becoming a product. What you just mentioned is one type of data product, which is an AI model. But, of course, there are internal reporting dashboards and recommendation features that feed back into the platform and are statistics-based. So, I think because data is becoming a product, it’s more important than ever that the quality of those products is maintained over time. That’s a trend we’ll continue to see over the next ten to twenty years, and it’s going to explode.
Ravit: I can’t wait to see all the solutions that you bring to the table. I’m also curious to hear the problems you foresee in the future.
John: I think the big one we’re seeing, especially around data quality right now, is the accessibility and visibility of quality context. Folks are quickly adopting tools for data engineers to monitor the health of their data assets. But the problem is that, a lot of the time, the data quality rules are buried inside git repos or YAML files and aren’t accessible to the people who understand the data best: the people using it on a day-to-day basis.
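To illustrate the kind of check John is describing, here’s a hypothetical example (not from the interview) of a quality rule living in a git repo: correct and useful, but invisible to the analysts who depend on the table it protects. The table, column, and function names are made up.

```python
# Hypothetical example: a data quality rule "buried" in an engineering
# repo. It runs fine in a pipeline, but a downstream analyst browsing
# the orders table has no way to see that this rule even exists.
import pandas as pd


def orders_are_valid(orders: pd.DataFrame) -> bool:
    """Return True if the orders table passes basic quality rules."""
    no_missing_ids = orders["order_id"].notna().all()
    positive_amounts = (orders["amount"] > 0).all()
    return bool(no_missing_ids and positive_amounts)
```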
Ravit: How are DataHub and Acryl solving these problems and looking toward the future?
John: We make data quality more accessible. We’re building one platform that unifies data discovery (finding the data, where it is, who owns it, and why it exists), data quality (the health of every data asset), and data governance (the compliance aspect of every data asset), all under one roof. Ultimately, we believe these signals reinforce each other: if you understand who the owner of a table is, you’ll have better data quality practices and better data governance practices, and it will be easier to find the data.
Ravit: 100%. That’s one of the biggest problems for a lot of enterprises out there. And you’re looking at it very sharply, which is good news for a lot of them.
Why Acryl Observe and not another observability tool? What’s the benefit of having it all in one?
John: I’m going to go back to this idea of accessibility. We believe you cannot build a sustainable data quality practice unless you bring all the personas involved in producing and consuming data into the data quality picture. Many vendors focus specifically on data observability for data platform engineers. Those tools are great, but we think they fall short in bringing everyone into the story around data quality. We’re trying to unify that picture with one tool that can involve everyone in your organization, whether it’s a marketing associate, a business intelligence engineer, or an upstream application engineer.
Ravit: Is there a use case top of mind that you would like to share with our audience?
John: There’s a cool use case one of our customers is trying out right now. The downstream business analysts, the domain experts for some of their key tables, are coming into DataHub and, in three clicks or less, defining data quality monitoring checks. They’re then running those checks inside their CI/CD pipeline for data. You can define the checks through an accessible user interface and then run them wherever you want.
Engineers can still run them as part of their Airflow jobs or CI/CD, which is a cool balance.
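As a rough sketch of the balance John mentions, the pattern might look something like the following: checks are authored in the DataHub UI, and an Airflow task evaluates them so a failure blocks downstream consumers. The `fetch_assertion_results` helper and the dataset URN are hypothetical stand-ins rather than DataHub’s actual SDK, and the example assumes Airflow 2.4+.

```python
# Sketch: gate an Airflow pipeline on quality checks defined in a UI.
# `fetch_assertion_results` is a hypothetical placeholder: the real
# integration would call the DataHub API or SDK here.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

ORDERS_URN = "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.orders,PROD)"


def fetch_assertion_results(dataset_urn: str) -> list[bool]:
    # Placeholder: ask the platform to evaluate the UI-defined checks
    # for this dataset and return a pass/fail result for each one.
    return [True]


def quality_gate(**_):
    results = fetch_assertion_results(ORDERS_URN)
    if not all(results):
        # Failing this task keeps bad data from reaching downstream tasks.
        raise ValueError("data quality checks failed for analytics.orders")


with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 6, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="quality_gate", python_callable=quality_gate)
```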