DDL Ep 03: Fast-Tracking Business Impact from Data and Embracing Modern Governance
Acryl Data
What universal elements make for an effective data strategy, regardless of industry? How can data leaders demonstrate ROI from data quality initiatives? What, exactly, does “modern data governance” entail in 2024?
In the third episode of DataHub’s “Decoding Data Leadership” series, Swaroop Jagadish (CEO/Co-Founder of Acryl Data) sat down with Kiran Padavala, Director and Head of Data and MarTech Engineering at Viator. They discuss the intricacies of modern data governance, the role of metadata, and key strategies for driving business impact.
Read on for the conversation and a summary of the main takeaways, and check out the full recording on YouTube.
This conversation has been edited for brevity and clarity.
Swaroop: Hi, Kiran. Welcome to DDL. We’d love for you to share a bit about yourself and your background.
Kiran: Sure, Swaroop. I’ve been with Viator for about fifteen months, doing a variety of things, but I am currently focused on looking at data and platforms and supporting all of the digital marketing efforts. In addition, we’re also in the process of creating a new function focused on optimizing revenue for Viator and suppliers.
Before this role, I was with Barclays for over a decade working across a variety of roles. Anything that you can think of from an engineering standpoint — I probably would have worked on it at Barclays. So I almost consider Barclays as my alma mater. I also serve as an angel investor in some companies and as an adviser for others.
Swaroop: One thing all our listeners care about is best practices across data strategy.
Can you share how your thinking around data strategy has evolved throughout your diverse career experiences from a giant traditional bank to a consumer marketplace?
Kiran: In terms of data strategy, I think the things that hold it together are always consistent — your business drivers, your operational environment, and the specific context of your industry.
Reflecting on my time in banking, the things that matter most are risk management, operational controls, business intelligence, and the rigor applied to these. In banking, the cost of wrong decisions can be extremely high.
Take the recent?Citibank case, for instance, where they were fined $400 million for poor risk management and lack of data governance and controls.
Moving away from banking to something like Viator brings a sense of being unencumbered.
At Viator, with over 300,000 experiences and 20 million customers, our focus is on reducing the friction for customers planning their trips. Speed and visibility are key because they significantly impact the company’s top line and overall operations.
There’s also the difference in tools and technologies. For example, while I can apply the most complex personalized algorithms to match the customer’s best interest at Viator, such complexity isn’t always feasible in, say, lending decisions in the banking context — because you lose some level of explainability as you go into more complex neural networks. So, there’s a lot more scope for innovation in marketplaces like Viator.
The nature of data also varies significantly. At Viator, we deal with multi-modal data, including image processing, video processing, and unstructured data. This is not always the case in banking.
The importance of historical data also differs. At Barclays, I worked on revamping capital models using data spanning 13–14 years, going back to the last recession. In a fast-moving marketplace like Viator, we don’t often need to look beyond two to three years of historical data. The implications are different based on the context of the industry.
Swaroop: Are there things that you’ve found to be universal or common across your experiences at Barclays and Viator?
Kiran: The good part is that several things that we learn are transferable.
When it comes to data strategy, every organization is ultimately aiming for the same outcomes — maximizing the value of their data assets and leveraging them effectively to drive business success.
This means having solid foundations in place, starting with high-quality data that you can trust and access quickly. It’s also about implementing strong data discovery and observability practices to ensure you’re making the most of your data resources. And then there’s the importance of driving self-service analytics, empowering teams to extract insights and make informed decisions on their own. The foundation is common; it’s just that the emphasis and focus can vary between organizations.
You often hear data being compared to oil, but the real challenge lies in refining that “oil” into something valuable and usable. It’s about thinking about how we reduce the turnaround time in terms of leveraging our datasets effectively and efficiently.
Swaroop: 2023 was a tough year for the data community. There was hesitation in making new investments, and there was a lot of emphasis on making sure existing investments paid off. In 2024, we’re seeing slightly more cautious optimism.
However, the pressure remains on data leaders to deliver rapid ROI.
So, what strategies can they employ to minimize the time to ROI in their projects?
Kiran: That’s a million-dollar question. But given the size of the industry, it’s probably a billion-dollar one.
All data leaders have to think about two things:
First, don’t wait for the big foundational project that’s going to change the world but whose early benefits will only arrive after a year or two.
Second, the data layer has probably borne an unfair share of the industry’s recent belt-tightening. Part of the reason is that high-level management may not fully grasp the contributions of data teams to business success.
For instance, when a machine learning model boosts conversion by 2%, earning millions, credit often goes solely to the model.
What often gets overlooked is that a strong data foundation was essential for building that model. There has to be revenue attribution for the work that data platform teams and data teams did to make such an efficient model possible.
If you don’t track these wins as part of your scorecard and attribute that success to data, people might just think, “Somebody built a model, and that’s what made it happen,” forgetting that data was at the core of the journey.
Swaroop: Can you share an example from your past experiences or current role where you’ve attributed success back to data initiatives?
Kiran: One of the things that we are working on at the moment is improving data discovery and improving the overall quality of data. As soon as I stepped into this role, I made it a priority to sit down with stakeholders and really understand what changes could make them more effective and contribute to our overall success.
The feedback highlighted that if we could reduce the time it takes to discover the right data elements and provide metadata on data quality, it would significantly enhance efficiency.
For example, if an analyst spends a week and a half identifying the right data attributes, improving this process would allow them to focus on the next steps, like exploring this data better and developing features. That is a value multiplier. Data scientists spending time just locating and assessing data quality is not the best use of their time.
Again, not all datasets are equal. In most organizations, transaction data tends to be the richest form of data. For example, in financial services, you can base lending decisions on transaction data. Similarly, financial reporting and recommendation engines heavily rely on transaction data. So, prioritize your top-tier datasets. Make them available and accessible, and ensure they form a rich body of data that users can trust. That’s how you can start to move faster.
Build the framework, but remember, you don’t have to do everything on day one. You might have seven measures of data quality but choose to implement two or three initially. The problem statement will help guide these steps.
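Kiran’s “start with two or three measures” advice can be made concrete. Below is a minimal Python sketch, using hypothetical record data and two common checks (completeness and freshness); a real deployment would lean on a data quality framework rather than hand-rolled functions like these:

```python
from datetime import datetime, timedelta

def completeness(rows, column):
    """Fraction of rows with a non-null value in `column`."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)

def freshness_ok(last_updated, max_age_hours=24):
    """True if the dataset was refreshed within the allowed window."""
    return datetime.utcnow() - last_updated <= timedelta(hours=max_age_hours)

# Hypothetical batch of supplier records with one missing rating.
rows = [
    {"supplier_id": 1, "rating": 4.7},
    {"supplier_id": 2, "rating": None},
    {"supplier_id": 3, "rating": 4.1},
]
print(completeness(rows, "rating"))  # 2 of 3 rows filled
print(freshness_ok(datetime.utcnow() - timedelta(hours=2)))  # refreshed recently
```

Starting with just these two signals, surfaced next to each dataset, already answers the analyst’s first questions: is this data complete enough, and is it current?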
Swaroop: On that note, there’s talk in both the data community and broader business circles about tying multiple business-facing use cases to the same initiative to create leverage: essentially, demonstrating how a single initiative addresses several business needs.
Have you come across any concrete examples or do you have any recommendations in that area?
Kiran: Transactional data was one example. But let me go back to something I was involved in at Barclays. I worked remotely with the CRO teams, which had various mandates like internal reporting, regulatory reporting, development, and business intelligence reports. We had a project-driven delivery model where each team created projects based on their specific requirements. This often led to multiple copies of the same data, causing inefficiencies.
We had to rethink our approach. We decided to create a common dataset, crystallizing it into an analytics dataset that could be used across different use cases. Teams could build custom attributes on top of this common dataset, but the idea was to avoid starting from scratch with each new project. This way, when starting a new project, teams would find 60–70% of the data they needed already within our existing rich model.
That itself was value multiplying because we improved the efficiency of our reporting and modeling tasks, which previously took years. It was a hard sell, but the leadership recognized the value in moving away from isolated projects to a more integrated data strategy.
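The shared-dataset pattern Kiran describes can be sketched in a few lines. This is an illustrative Python toy with invented account records, not Barclays’ actual model: each team derives its own attributes on top of one common base instead of copying the raw data into a new project:

```python
# One shared, curated analytics dataset (hypothetical rows).
BASE = [
    {"account": "A1", "balance": 1200.0, "region": "UK"},
    {"account": "A2", "balance": -300.0, "region": "EU"},
]

def with_attrs(base, **derivations):
    """Extend each shared row with team-specific derived attributes."""
    return [{**row, **{name: fn(row) for name, fn in derivations.items()}}
            for row in base]

# A reporting team adds its own flag without duplicating the base data;
# another team could call with_attrs() with entirely different derivations.
reporting_view = with_attrs(BASE, overdrawn=lambda r: r["balance"] < 0)
print(reporting_view[1]["overdrawn"])  # True
```

The design point is that 60–70% of what a new project needs (the base columns) is already there; only the last mile is project-specific.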
Swaroop: There’s so much more to talk about on that topic, but let’s switch gears quickly. A term that is gaining a lot of traction is “modern governance,” especially as organizations strive to operationalize AI and other data initiatives.
How do you define this concept within the context of your current data and AI initiatives?
Kiran: I guess all of us have bought into this traditional view of governance, where you need endless email approvals from data privacy and data quality teams, etc. This means a lot of overhead for teams doing the actual work.
Modern governance, however, is almost in line with how governments have evolved over the last few decades. A few decades back, governments were heavily involved in industries like defense, steel, and car manufacturing. Today, they have shifted to providing enabling infrastructure and oversight rather than doing the work themselves.
For me, modern governance should be owned by independent teams with ownership of the datasets. These teams should follow centrally set principles but be responsible for enforcing them, detecting issues, and fixing them. It’s not about creating new tools or code for every governance check.
So having a computational governance model that teams feel responsible and accountable for means your governance can be forward-looking, rather than simply reacting to whatever is happening on a day-to-day basis.
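One hedged illustration of what “computational governance” might look like in practice: centrally set principles encoded as checks that each dataset-owning team runs against its own metadata. The principle names and dataset record below are hypothetical, and real metadata platforms express such policies through their own APIs rather than ad hoc Python:

```python
# Centrally defined principles; each maps a dataset's metadata to pass/fail.
PRINCIPLES = {
    "has_owner": lambda d: bool(d.get("owner")),
    "is_classified": lambda d: d.get("classification") in {"public", "internal", "restricted"},
    "has_description": lambda d: bool(d.get("description")),
}

def governance_report(dataset):
    """Return the names of principles the dataset violates."""
    return [name for name, check in PRINCIPLES.items() if not check(dataset)]

# A hypothetical dataset owned by a team; the empty description gets flagged.
dataset = {
    "name": "bookings_daily",
    "owner": "martech-engineering",
    "classification": "internal",
    "description": "",
}
print(governance_report(dataset))  # ['has_description']
```

The center owns `PRINCIPLES`; the owning team owns running the report, fixing violations, and staying accountable, which is exactly the division of labor described above.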
And with respect to AI, I think it is much more challenging because, in many ways, the future of data and AI are intertwined. Nobody can build a successful model without having quality data that you can trust and access easily. There are initial considerations for AI model governance, such as managing high-risk use cases and assessing potential customer detriment.
I think, at its core, modern governance should be an enabling and forward-thinking function rather than just a checkbox exercise. Instead of simply listing fifty compliance items on an Excel sheet, the focus should be creating a framework supporting innovation and efficiency.
Swaroop: Love the government analogy. I think that’s exactly how it’s shaping up now with the whole decentralized data development. Just a confession — I didn’t even know about the term governance while I was at Airbnb.
What’s your take on observability and governance? Do you see them as interconnected or as separate issues?
Kiran: I think both of them are more dynamic than we realized. It’s like a three-vertex triangle, with a business driver as one vertex and observability and governance as the others. Each one feeds the other two.
A simple example: new observability flags certain gaps, meaning your governance practices are missing something. That may indicate that my data quality rules are broken and I need to add something on top of them. It’s almost like a flywheel where each part feeds the other two, and when they work in a conjoined fashion, it makes a huge difference for the overall organization. Then you are focused on how quickly you can solve customer problems, rather than saying, “Hey, I’ve got a data quality issue. It’s going to take me the next six weeks to fix it before I can use the data.” Or, “I’ve got a broken pipeline, but I didn’t know it was broken. Somebody in the business has come back and now tells me that my model is not working,” and then you’re trying to figure out what’s wrong.
So for me, having that loop going on with active participation across all these three elements is key.
Swaroop: Got it. What are the trends or technologies in data and AI that you’re most excited about as you look to the next 12–18 months?
Kiran: It’s not one specific technology trend, but rather the overall shift toward data democratization.
I’m excited about how data and AI are becoming more mainstream and accessible.
Now, everyone in the organization can access and learn from data, instead of relying on a centralized data or steering team that holds all the power. Now we have the cumulative power of everyone in the organization to unleash upon data, and that’s what I’m most excited about at this point.
Swaroop: Given that ~70% of the effort is spent on data prep and data quality, how do you recommend generating ROI from those initiatives?
Kiran: When it comes to calculating the ROI for these initiatives, we’ve got to consider wasted effort. Think about it: if a project takes a certain number of days, how much of that time is spent on data cleaning and repair? For instance, suppose our projects were worth $12 million, but we spent about $6 million just cleaning up data. Investing in a tool that costs a fraction of that, say half a million, can eventually save or generate that $6 million. That’s where we really see the value of data quality and accessibility tools.
Then there’s cycle time, which matters most. There’s also an opportunity cost to consider in terms of the business value of a project. Say you projected a project would deliver six months of incremental value in a financial year; if it’s four months late, you’re only getting one-third of the committed value. So reducing cycle time is a key aspect to include in the benefits case.
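The two benefit calculations Kiran walks through (recovered cleanup spend and value lost to delay) are simple enough to sketch directly. The helper names below are invented for illustration, and the dollar figures just restate his example:

```python
def net_saving(cleanup_cost, tool_cost):
    """Net benefit of a tool that eliminates data-cleanup spend."""
    return cleanup_cost - tool_cost

def value_captured(committed_months, months_late):
    """Fraction of a year's committed incremental value actually delivered."""
    return max(committed_months - months_late, 0) / committed_months

# $6M spent cleaning data vs. a $0.5M tool that removes that work.
print(net_saving(6_000_000, 500_000))  # 5500000 net saving
# Six months of committed value, delivered four months late:
print(value_captured(6, 4))  # one-third of the committed value captured
```

Simplistic as they are, numbers like these are the “language of business value” Kiran argues data practitioners need to speak.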
And you know, data practitioners need to speak the language of business value. It’s a skill as important as the work itself. You’ve got to be an evangelist for your data team if you want them to succeed. That means effectively communicating the value of their work. Otherwise, people might just see it as a cost center.
Swaroop: We have a question from our listeners on the usage of synthetic datasets for models. Where do you stand on their usage?
Kiran: It is particularly helpful in scenarios where you’ve got data gaps in the system and no way of filling them. Synthetic data can recreate those patterns and fill the gaps. It also helps address skewed data: if your data is too skewed and you’re trying to build a model on top of it, synthetic data can be very effective at rounding out the distribution so you can build a model with better accuracy.
The same goes for bias. If you have data that is already biased by the reality of what is happening in the world, merging it with synthetic data can reflect what the distribution should be. Using a hybrid or ensemble of real and synthetic data is a good way not just to avoid carrying these biases forward but to rectify them. Synthetic data also helps where there are restrictions on access to customer data. Say you want to build a model, but the data is protected by privacy rules and you can’t dump it all into your development space. Generating synthetic data with the same overall characteristics lets you do a lot of the analysis and processing anyway.
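The skew case can be illustrated with a toy example. The sketch below, with invented class counts and summary statistics, draws Gaussian synthetic rows to rebalance a minority class; production work would use a purpose-built synthesis or oversampling technique (e.g., SMOTE-style methods) rather than raw Gaussian draws:

```python
import random

random.seed(7)  # deterministic for the example

# Hypothetical skewed training set: 95 majority rows, only 5 minority rows.
majority = [{"amount": random.gauss(50, 10), "label": 0} for _ in range(95)]
minority = [{"amount": random.gauss(200, 30), "label": 1} for _ in range(5)]

def synthesize(mean, std, n, label):
    """Draw synthetic rows matching a class's shareable summary statistics."""
    return [{"amount": random.gauss(mean, std), "label": label}
            for _ in range(n)]

# Rebalance by generating 90 synthetic minority rows from its statistics.
synthetic = synthesize(200, 30, 90, label=1)
balanced = majority + minority + synthetic
print(len(balanced))                      # 190 rows in total
print(sum(r["label"] for r in balanced))  # 95 minority rows after rebalancing
```

The privacy case works the same way: only the summary statistics (mean, spread) cross the boundary into the development space, never the customer rows themselves.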
Swaroop: What’s the one key insight or piece of advice you’d like the audience to take away regarding their 2024 data strategy?
Kiran: Apart from what I’ve said, I think the success of your data strategy really depends on two things: buy-in from the organization, and sponsorship. Getting buy-in means aligning the organization with what you’re trying to do. In doing so, you’ll find that some stakeholders are more data savvy than others, so try to work with partners who can eventually become your sponsors. When you go to one of your executive meetings, you want somebody to stand up and say, “Yeah, that sounds about right” or “I completely support what we’re putting forward as a strategy.” Without that, the onus of convincing an executive team is much heavier. So it’s not enough to come up with a good strategy; you have to get the organization lined up behind it and find sponsorship for it.
Swaroop: Fantastic. Thanks so much, Kiran, for doing this and sharing these nuggets of wisdom. It’s been great.
Summary of Takeaways
1. Universal Foundational Elements of Data Strategy
Despite industry differences, a strong data strategy consistently revolves around aligning with business drivers, the operational environment, and industry context to unlock the full potential of data assets and drive business success.
This comes down to:
- High-quality, trusted data that teams can access quickly
- Strong data discovery and observability practices
- Self-service analytics that empower teams to extract insights on their own
2. Understanding Modern Data Governance
Modern governance should be owned by independent teams with a computational governance model. Governance and observability are interconnected, forming a dynamic loop that enhances efficiency and problem-solving.
Effective governance in AI involves maintaining high-quality, accessible data and ensuring governance practices drive innovation rather than merely ensuring compliance.
3. Tips for Demonstrating ROI from Data Prep and Quality Initiatives
The key is to focus on:
- Quantifying the wasted effort (such as spend on data cleaning and repair) that better tooling can recover
- Reducing cycle time, since late delivery erodes a project’s committed business value
- Speaking the language of business value and evangelizing the data team’s work
4. Emerging Trends and Technologies
The biggest shift is data democratization: data and AI becoming mainstream and accessible to everyone in the organization, rather than held by a centralized team.
5. Advice for Data Leaders
Secure organizational buy-in and cultivate data-savvy stakeholders as sponsors; a good strategy alone isn’t enough without the organization lined up behind it.
Connect with DataHub