Lakehouse AI: A Data-Centric Approach to Building Generative AI Applications
XenonStack
Data and AI Foundry for Autonomous Operations #agenticworkflow #aiagents #decisionintelligence #causalai
Introduction
Generative AI will transform every industry. Databricks has been pioneering AI innovations for a decade, actively collaborating with thousands of customers to deliver AI solutions and working with the open-source community on projects like MLflow. With Lakehouse AI and its unique data-centric approach, we empower customers to create and deploy AI models in a timely, robust, and fully managed manner. Today at the Data and AI Summit, several new capabilities establish Lakehouse AI as the premier platform to accelerate your generative AI production journey.
What are the challenges of developing generative AI solutions?
Optimizing Model Quality
Data is the heart of AI. Poor data can lead to biases, hallucinations, and toxic output. It is difficult to effectively evaluate Large Language Models (LLMs) as these models rarely have an objective ground truth label. Due to this, organizations often struggle to understand when the model can be trusted in critical use cases without supervision.
Cost and complexity of training with enterprise data
Organizations are looking to train their models using their own data and control them. Instruction-tuned models like MPT-7B and Falcon-7B have demonstrated that, with good data, smaller fine-tuned models can achieve strong performance. However, organizations struggle to know how many data examples are enough, which base model they should start with, how to manage the complexities of the infrastructure required to train and fine-tune models, and how to think about costs.
Trusting Models in Production
With the technology landscape rapidly evolving and new capabilities being introduced, it is increasingly challenging to get these models into production. Sometimes these capabilities require new services, such as a vector database; other times they require new interfaces, such as deep prompt engineering support and tracking. Trusting models in production is difficult without robust, scalable infrastructure and a stack fully instrumented for monitoring.
Data security and governance
Organizations want to control what data is sent to and stored by third parties to prevent data leakage as well as ensure responses conform to regulations. We’ve seen cases where teams have unrestricted practices today that compromise security and privacy, or have cumbersome processes for data usage that impede the speed of innovation.
Use existing models or train your own model using your data
Vector Search for indexing
With vector embeddings, organizations can put generative AI and Large Language Models (LLMs) to work across a variety of applications, such as customer support bots that draw on your organization’s entire corpus of knowledge to search and recommend experiences that understand customer intent. Our vector database helps teams quickly index their organizations’ data as embedding vectors and perform low-latency vector similarity searches in real-time deployments. Vector Search is tightly integrated with the Lakehouse, including Unity Catalog for governance and Model Serving to automatically manage the process of converting data and queries into vectors.
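To make the underlying idea concrete, here is a minimal, generic sketch of indexing documents as embedding vectors and retrieving the closest matches by cosine similarity. This is not the Databricks Vector Search API; the embedding model and the sample documents are illustrative assumptions.

```python
# Generic illustration of vector indexing and similarity search.
# NOT the Databricks Vector Search API; it only shows the underlying idea.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

# Placeholder corpus standing in for an organization's knowledge base.
documents = [
    "How do I reset my account password?",
    "Our refund policy allows returns within 30 days.",
    "Contact support via chat for billing questions.",
]

# Encode documents into embedding vectors (model choice is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

def search(query: str, k: int = 2):
    """Return the k documents most similar to the query (cosine similarity)."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity on normalized vectors
    top = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in top]

print(search("I want my money back"))
```

In a managed setting, keeping such an index fresh, governed, and served at low latency is exactly what Vector Search automates.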
Curated models, backed by optimized Model Serving for high performance
Rather than spending time researching the best open-source generative AI models for your use case, you can rely on models curated by Databricks experts for common use cases. Our team continually monitors the landscape of models, testing new models that come out for many factors like quality and speed. We make best-of-breed foundational models available in the Databricks Marketplace and task-specific LLMs available in the default Unity Catalog. Once the models are in your Unity Catalog, you can directly use or fine-tune them with your data. For each of these models, we further optimize Lakehouse AI’s components - for example, decreasing model serving latency by up to 10X.
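As a hedged sketch of what consuming such a model could look like once it is registered in Unity Catalog: the catalog, schema, model name, and input format below are hypothetical placeholders, and the registry URI setting assumes a Databricks workspace.

```python
# Hedged sketch: loading a curated model from Unity Catalog with MLflow.
# The catalog/schema/model names are hypothetical placeholders.
import mlflow

mlflow.set_registry_uri("databricks-uc")  # point the MLflow registry at Unity Catalog

# "models:/<catalog>.<schema>.<model>/<version>" is the Unity Catalog model URI form.
model = mlflow.pyfunc.load_model("models:/main.default.mpt_7b_instruct/1")

# The expected input format depends on the model's signature; a prompt dict is an assumption.
print(model.predict({"prompt": "Summarize our Q2 support tickets in one sentence."}))
```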
AutoML support for LLMs
We’ve expanded the AutoML offering to support fine-tuning generative AI models for text classification as well as fine-tuning base embedding models with your data. AutoML enables non-technical users to fine-tune models with point-and-click ease on your organization’s data and increases the efficiency of technical users doing the same.
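AutoML handles this work through the UI, but as a rough sketch of the kind of text-classification fine-tuning it automates, here is a minimal Hugging Face example; the dataset, base model, and hyperparameters are illustrative assumptions, not what AutoML uses internally.

```python
# Rough sketch of text-classification fine-tuning, the kind of work AutoML automates.
# Dataset, base model, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # placeholder labeled text dataset
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf_out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```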
Securely serve models, features, and functions in real time
Model Serving, GPU-powered and optimized for LLMs
Not only are we providing GPU model serving, but we are also optimizing our GPU serving for the top open-source LLMs. Our optimizations provide best-in-class performance, enabling LLMs to run an order of magnitude faster when deployed on Databricks. These performance improvements allow teams to save costs at inference time and enable endpoints to scale up and down quickly to handle traffic.
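As a hedged sketch of calling a served model, the snippet below POSTs a prompt to a serving endpoint’s REST invocations URL; the workspace host, endpoint name, token variables, and payload shape are assumptions that depend on your deployment.

```python
# Hedged sketch: querying a model serving endpoint over REST.
# Workspace URL, endpoint name, token, and payload shape are assumptions.
import os
import requests

WORKSPACE_URL = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]          # access token (assumption)
ENDPOINT = "llm-chat-endpoint"                  # hypothetical endpoint name

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"inputs": ["Write a one-line summary of the Lakehouse AI announcement."]},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```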
Unified Data & AI Governance
We are enhancing the Unity Catalog to provide comprehensive governance and lineage tracking of both data and AI assets in a single unified experience. This means the Model Registry and Feature Store have been merged into the Unity Catalog, allowing teams to share assets across workspaces and manage their data alongside their AI.
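For instance, registering a trained model into Unity Catalog, rather than a workspace-local registry, is what places it under this unified governance. Below is a minimal sketch; the toy model and the catalog, schema, and model names are hypothetical placeholders.

```python
# Hedged sketch: logging and registering a model under Unity Catalog governance.
# The toy model and the <catalog>.<schema>.<model> name are illustrative placeholders.
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)
signature = infer_signature(X, clf.predict(X))  # Unity Catalog models need a signature

with mlflow.start_run():
    mlflow.sklearn.log_model(
        clf,
        artifact_path="model",
        signature=signature,
        registered_model_name="main.default.iris_classifier",  # <catalog>.<schema>.<model>
    )
```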
MLflow AI Gateway
As organizations are empowering their employees to leverage OpenAI and other LLM providers, they are running into issues managing rate limits and credentials, burgeoning costs, and tracking what data is sent externally. The MLflow AI Gateway, part of MLflow 2.5, is a workspace-level API gateway that allows organizations to create and share routes, which can then be configured with rate limits, caching, cost attribution, and more to manage costs and usage.
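As a hedged sketch of how an application might call a shared route through the gateway client: the gateway URI and route name are assumptions, and the route itself would be configured separately with the provider’s credentials and limits (check the MLflow documentation for the exact schema in your version).

```python
# Hedged sketch: querying a route through the MLflow AI Gateway client (MLflow 2.5+).
# The gateway URI and route name are assumptions; the route is configured separately
# with provider credentials, rate limits, and caching.
from mlflow import gateway

gateway.set_gateway_uri("http://localhost:5000")  # assumed gateway server address

response = gateway.query(
    route="my-completions-route",  # hypothetical route name shared by the platform team
    data={"prompt": "Draft a short status update for the data team."},
)
print(response)
```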
Databricks CLI for MLOps
This evolution of the Databricks CLI allows data teams to set up projects with infrastructure-as-code and get to production faster with integrated CI/CD tooling. Organizations can create “bundles” to automate AI lifecycle components with Databricks Workflows.
Stay focused and develop a data-driven mindset to build successful generative AI applications. Unlock the potential of Lakehouse AI for smooth development.
Conclusion
The data-centric approach marks a pivotal shift in the landscape of generative AI applications. By seamlessly integrating data intelligence with creative algorithms, Lakehouse AI not only pioneers innovation but also establishes a foundation for intelligent, meaningful insights. This holistic approach propels the development of generative AI applications into a new era.