The benefits of Openness and Modularity in an AI Lakehouse
Rik Van Bruggen
Helping the world Operationalise Machine Learning and AI in a meaningful, efficient, managed and effective way
I have written before about some of the important paradoxical tendencies in the Machine Learning (ML) and Artificial Intelligence (AI) industries. Let’s look at it from two perspectives:
It should be clear that these are two important, decisive, even, perspectives that can truly determine the success or failure of AI/ML projects. Therefore, we’d like to zoom into these two perspectives in a little more detail, and then explore what the impact of openness and modularity could be on the context of these two personas.
The Developer's Context
For developers working on these types of AI/ML software projects, I think we can state some universal truths:
At the same time: we know that data scientists and machine learning engineers spend a lot of time on setting up and managing the tools in their technical work environment. See the 2022 Kaggle survey: data scientists spend 50-80% of their time on their environment, data engineers spend 30-40% of their time on maintaining their pipelines, ML Engineers spend 20-30% of their time on this. All of this time is being spent on? If we can cut the time spent on non-core ML and AI tasks by 10%, we will make a massive win for the organization and the individual contributor.
The CTO's Context
For CTO’s that are working on AI/ML software projects, we can also consider some universal considerations:
We know that all this work will have to be done in an economically sound and feasible way. See the SpringerLink Study and the McKinsey State of AI report - both stressing the need for cost efficiency in AI and ML systems.?
So: in assessing both of these perspectives, we are clearly presented with an important question: how can we figure out a way to incrementally innovate using AI/ML, provide value to our organization in an economically sound manner, while respecting the productivity requirements and current tool choices of our engineers as much as possible??
The importance of Openness and Modularity
As I have tried to explain how both the individual engineer and their CTO management have at least slightly conflicting interests in the implementation of AI and ML projects. How can we make this work then? Here’s where I think we can take a look at the software industry in general, and look at some of the technological and methodological approaches that have made software engineering so much more productive in recent years.?
To do so, let’s think back to where we came from: as little as two or three decades ago, waterfall-based software engineering methodologies were the standard for how you would tackle large software projects. It took books like the Mythical Man-month and the Agile Manifesto to emerge for people to start changing their approach, and to start developing and then using iterative software engineering methodologies, DevOps practices, Infrastructure-as-Code and cloud computing as critical enablers of a more modern approach. This is what we need to apply in the world of the very specific domain of AI and ML now: we need to learn how to apply these techniques in the world of AI and ML, and that will require a specific architecture to enable this.?
This architecture should therefore?
On the contrary - our vision here should be for an open and modular architecture that would allow for us to leverage the power of AI and ML in an incremental, balanced way. Like in general software development best practices, this architecture will enable a combination of tools and processes that will propel AI and ML projects forward, and resolve the paradox that we have seen emerge.
Let’s discuss this vision in a bit more detail. We’ll call this the Incremental AI Architecture (AIAI), enabled by the AI Lakehouse.
The Incremental AI Architecture (IAIA)
Today, many AI and ML projects are characterized by structures that are very different from what we want in an innovative and incremental AI Architecture. Systems are developed with bespoke tools and processes that are specific to the project in which they are implemented, and do not allow for reuse and automation in a way that an IAIA enabled by an AI Lakehouse would. Therefore, we propose a combination of best practices and tools that will allow you to move in this direction.
领英推荐
Let’s walk through the different steps in a few succinct paragraphs.
Start with modularization of pipelines:
At Hopsworks, we have argued many times before that one of the first steps that one should take when embarking on an AI/ML project, is to conceptually break up the AI/ML system into three parts. We refer to these three parts as the “F-T-I” pipelines, where we would distinguish three distinct phases and processes in the development and productionalization of these systems. At a high level, we would consider
Once we have split the AI/ML system into these three functional and modular parts, we can proceed to take the next step to make the system into a true production-ready AI system. This will require some education of our team, and that would be the important second step in this journey.
Educate the team on the importance of MLOps for production AI systems
In order to maximize the benefits of AI systems in production, we will need to automate as many of the FTI pipeline steps as possible. This is what we call MLOps, short for “Machine Learning Operations” and analogous to DevOps.?
These automations will offer?
Once we have the team on board, we can start looking at an implementation of a professional MLOps platform - and to choose one that allows for a gradual, incremental implementation to accommodate the concerns of the Developer and the CTO.?
Do a gradual implementation of an MLOps platform
Over the years, Hopsworks has gained a lot of experience with the implementation of AI/ML systems on top of an MLOps platform, and as a consequence we have developed and released the 4.0 release of the Hopsworks Enterprise platform. We have called this the “AI Lakehouse” release, as we believe that it allows for a comprehensive but gradual implementation of everything an AI/ML project needs to become successful.
Based on this experience, and using the AI Lakehouse infrastructure, we recommend that you consider the following steps in sequence:
Last but not least, and in order to ensure solid future implementation of additional AI and ML systems in your organization, you should leverage the above architecture to measure the business results of the AI system implementation. If you do so, chances are that this will be fueling additional investments into the technology, bringing better prediction services to our organizations and a more competitive market position as a result.
Wrapping up
With that, we hope to have made the case for open, modular AI and ML infrastructure more clearly. At Hopsworks, we live and breathe this philosophy day in and day out, and we would love to help you see its power for yourself and your business. Please reach out to us if you have any comments or questions - we will be happy to help out.
??Multi-Award Winner CEO, AI Consultant, Podcasts??WalesTech changemaker. I help struggling CEOs & brands to launch, grow & scale 10X with smart solutions, AI integeration & content. Make More, Work Less, Deal Smarter??
3 个月Sounds like you've put some thought into that. Balancing developer and CTO needs is crucial. How's the feedback been so far? Rik Van Bruggen