Spotlight on Data Intelligence Platform


AI is a whole new world, and there’s a whole new dictionary to go with it. To read my future articles, join my network by clicking 'Follow'.

----------------------------------------------------------------------------------

In the previous blog, I wrote about MosaicML. Last week, our founders announced something new: the Data Intelligence Platform.

Question: What happens when you combine Databricks with MosaicML?

Answer: You get a new product category, called the Data Intelligence Platform.

And by the way, this writeup is dedicated to Batman!

First, a history lesson…

The way we manage data has changed a lot over time.

I will use the analogies of a file cabinet and the Batcave to explain how we stored and managed data in the past. And then, I will use Batman’s butler Alfred to explain the concept of a Data Intelligence Platform.

Alfred is my favorite character in the Batman series. If I had a guy like Alfred on my side, I’d be a superhero too.

File Cabinets (Data Warehouses):

First, there were data warehouses. These are like big digital filing cabinets. They're great for storing lots of folders and documents in a very organized way. Each folder is neatly labeled and placed in a specific order.


But file cabinets are limited to paper documents. And a file cabinet can only hold a small amount of data.

In the same way, data warehouses are limited to a certain type of data (structured data). They are also limited in the volume of data they can handle efficiently.

The Batcave (Data Lakehouse):

Over time, people started creating and collecting different kinds of data, such as videos, audio files, and social media posts. This is called unstructured data, and it doesn't fit neatly into a file cabinet or a data warehouse.

At the same time, we were also collecting more data, in much bigger volumes. That's when data lakes came in. A data lake is like a giant storage area. Don't know where to put a piece of data? No worries. Just throw it into the lake. Unfortunately, the data in data lakes kept getting messy. Too many things were thrown in with no versioning or labeling, so the lake got polluted.

Then, Databricks invented the lakehouse. The lakehouse is a combination of a data warehouse and a data lake. It can handle structured and unstructured data at scale. It can organize both types of data as well as any data warehouse. And it can even do the same kinds of transactions as a data warehouse.
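For readers who want a peek under the hood, here is a toy Python sketch of the versioning idea that plain data lakes were missing: every change is recorded as a new numbered version, so you can always read the latest data or go back to an earlier snapshot. The `VersionedTable` class is hypothetical and keeps full in-memory copies for simplicity. Real lakehouse table formats such as Delta Lake track changes in a transaction log over files instead, so treat this only as an illustration of the concept.

```python
import copy


class VersionedTable:
    """Toy table that records every change as a numbered version (hypothetical, for illustration)."""

    def __init__(self):
        self._versions = [[]]  # version 0 is an empty table

    def append(self, new_rows):
        """Commit new rows as a new version; earlier versions are left untouched."""
        snapshot = copy.deepcopy(self._versions[-1])
        snapshot.extend(new_rows)
        self._versions.append(snapshot)
        return len(self._versions) - 1  # the new version number

    def read(self, version=None):
        """Read the latest version, or 'time travel' back to an older one."""
        if version is None:
            version = len(self._versions) - 1
        return copy.deepcopy(self._versions[version])


table = VersionedTable()
v1 = table.append([{"item": "batarang"}])
v2 = table.append([{"item": "grappling hook"}])
print(table.read())    # latest version: both rows
print(table.read(v1))  # older version: only the batarang
```

Because nothing is overwritten in place, a bad write can be spotted and rolled back by reading an earlier version, which is exactly what the polluted data lake couldn't offer.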

A lakehouse is like the Batcave. It organizes both structured and unstructured data.

A lakehouse is like the Batcave. It's more than a data warehouse. And it's more than a data lake. It's a Batcave. It's Batman's entire headquarters. It organizes not only files and documents, but everything Batman needs for his missions including videos and weapons.

  • Vast and Varied Inventory (Diverse Data Storage). The Batcave is known for its vast and varied inventory. It houses everything from high-tech gadgets to vehicles, and even historical artifacts. In a similar way, a data lake is a vast storage repository that holds huge amounts of raw data in its native format. Just as the Batcave's inventory isn't limited to one type of item, data lakes can store all types of data. This includes structured data like database tables and unstructured data like images and videos.
  • Centralized Base of Operations (Unified Data Management). The Batcave is Batman's main base. This is where he plans missions and analyzes information. The lakehouse concept builds on the idea of a data lake. It not only stores vast amounts of data; it also lets you organize and analyze that data in one place. It combines the expansive storage of a data lake with the organizational and analytical functions of a data warehouse. This is like how the Batcave combines storage for Batman's gear with advanced technology for mission planning and data analysis.
  • Advanced Technology Integration (Sophisticated Data Processing). The Batcave has advanced technology that helps Batman analyze clues and make strategies. Similarly, the lakehouse integrates advanced data processing technologies. In a lakehouse, data can be processed and analyzed to get useful insights, like Batman using his tech to make action plans.
  • Secured and Exclusive Access (Data Security and Governance). The Batcave is secured and hidden from the public, ensuring only authorized access. Data lakes and lakehouses prioritize data security and governance. They put measures in place to protect sensitive data. And they manage who has access to what data. This is like the way Batman safeguards the secrets within his Batcave.

Alfred the Butler (DI Platform):

Lakehouses are great. But you need more if you want to do AI.

So now, the latest product category is the Data Intelligence Platform (DI Platform).

It’s like having a super-smart system. It not only stores all your digital stuff, but it also understands it. It can find exactly what you need, when you need it. All you need to do is ask, in your own words. No need for code.

This is like having a butler, and not just any butler. It’s like having Batman’s butler, Alfred!

A DI Platform is just like Batman's butler, Alfred.

Alfred is a super smart and sophisticated assistant. You can speak to Alfred with your natural voice. You can even use slang to ask for information or tasks. Alfred will understand your request. And Alfred will also anticipate your needs based on his knowledge. (He can read your mind.) And Alfred will keep your secrets. He will protect your sensitive information with enhanced governance and privacy features.

In short, Alfred is not just storing and retrieving data. Alfred understands your data and manages it intelligently. And he's using the data in a way that's tailored to your specific needs and context.

Alfred gets you.

The Concept of a DI Platform

Alfred is much more than a regular butler. He understands everything about Bruce Wayne (Batman) and his needs. In the same way, a DI Platform goes beyond traditional data management. Its AI understands all the data in a company, just like Alfred understands Wayne Manor and Batman's world.


  1. Natural Language Access (Alfred's Communication Skills). You can talk to Alfred, and he will understand you. DI Platforms do the same with data. People can ask questions and give commands in their everyday language. The platform understands slang and industry jargon, such as medical or insurance terminology.
  2. Semantic Cataloging and Discovery (Alfred's Deep Understanding of Data). Alfred, like a DI Platform, isn't just storing Batman's gadgets and information. He knows a lot about the gadgets: what they do, when Batman uses them, and how they help on missions. Similarly, DI Platforms use AI to deeply understand the data in a business. They understand not just what the data is, but also what it means and how it's used. DI Platforms catalog and organize data. They understand how different pieces of data relate to each other, making it easier to find and use them.
  3. Automated Management and Optimization (Alfred’s Efficiency). Imagine Alfred organizing the Batcave. He organizes it, so that the most needed gadgets are the easiest to access. DI Platforms organize data automatically based on its usage.
  4. Enhanced Governance and Privacy (Alfred Guarding Your Secrets). Alfred ensures the secrets of Batman's identity and missions are safe. Similarly, DI Platforms protect sensitive data, automatically identifying and controlling access to it.
  5. Supporting Batman's Tools (First-Class Support for AI Workloads). Alfred maintains and enhances Batman's gadgets and tools. Similarly, DI Platforms improve AI applications by connecting them to relevant and meaningful data.

In summary, a DI Platform is like having an 'Alfred' for your company's data. It understands and manages all the data efficiently. It responds in the language of the company. It keeps everything secure. And it enhances the overall functioning of data-related tasks – all with the sophistication and intelligence of Batman's trusted butler.

The DI Platform is like having your own butler for your company's data. And not just any butler. You get Alfred, Batman's butler.


Databricks as a DI Platform

Databricks has been building a DI Platform.

Databricks' DI Platform contains the Lakehouse, Unity Catalog, a Data Intelligence Engine (aka DatabricksIQ), Databricks AI (aka Mosaic AI), and much more.

And Databricks is well positioned for this new category in data management. Why?

Because Databricks invented the lakehouse concept, and a lakehouse is the foundation for a DI Platform.

The Databricks Lakehouse has two main features:

  • A unified governance layer manages both data and AI together. It keeps everything organized and under control, whether it's regular data or AI projects.
  • A single unified query engine can handle a bunch of different queries. It can organize data and work with databases. It can also learn from data and analyze business information. So, it's like a one-stop-shop for all sorts of data needs.

And Databricks is also adding new AI features to make the DI Platform even better. Databricks acquired a company named MosaicML, and used MosaicML's know-how to develop the AI models in a Data Intelligence Engine. The name of this engine is DatabricksIQ.

DatabricksIQ

DatabricksIQ is a Data Intelligence Engine. It's like the brain of the Databricks platform. Think of it as a super-intelligent helper (like Alfred) for dealing with lots of information. It uses AI to improve the way data is managed, processed, understood, and used across the Databricks platform.

DatabricksIQ manages the company's data, just like Alfred helps Batman in every mission.

  • Setting the Knobs (Tuning the Batcave's Systems). Imagine Alfred fine-tuning the Batcave's various systems and gadgets. He adjusts everything to make sure it runs in a smooth and efficient way. In the data world, "setting the knobs" means adjusting the data platform to make it work better. This involves organizing data, like indexing columns and dividing it into partitions. These strengthen the entire system (the lakehouse). Alfred’s adjustments improve the Batcave's functionality. And these data adjustments lead to lower costs and better performance for users.
  • Improving Governance (Organizing the Batcave's Inventory). Think of Alfred labeling and categorizing all the gadgets and equipment in the Batcave. In the data platform, this is like adding descriptions and tags to all the data (in Unity Catalog). It helps users understand and find the right data quickly, just like Batman can easily find what he needs thanks to Alfred’s organization. This organization helps the platform search and use data well. It also makes sure data is managed properly.
  • Enhancing AI Assistant (Upgrading Batcave's AI Tools). Alfred improves the Batcave's computer systems, making them more responsive to Batman's needs. The Batcomputer can now understand and respond better to Batman's specific commands. In a data platform, this means improving the AI assistant so it can understand and generate code in programming languages such as Python and SQL.
  • Speeding Up Queries (Quick Response from Batcave's Systems). Alfred anticipates what Batman might need. Then he ensures the Batcave's systems can provide it quickly. In a data platform, this is akin to making data queries faster. DatabricksIQ will predict what data will be needed and prepare it in advance, so users get the information they need quickly.
  • Optimizing Resources (Efficient Use of Batcave's Capabilities). Finally, Alfred makes sure the Batcave uses its resources wisely, not wasting energy or space. In the data platform, this is similar to managing resources in Delta Live Tables and Serverless Jobs. The system changes its resources depending on the workload. It's similar to how Alfred adjusts the Batcave's resources for Batman's missions. This makes sure everything runs efficiently and saves money.
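Here is a toy Python sketch of why "setting the knobs" like partitioning pays off. It assumes a hypothetical sales table partitioned by a `date` column: a query that filters on that column only has to read the matching partition instead of scanning every row. This illustrates the general idea of partition pruning, not how DatabricksIQ actually chooses or applies these settings.

```python
from collections import defaultdict

# Hypothetical sales rows; "date" is the column we choose to partition on.
rows = [
    {"date": "2024-01-01", "store": "Gotham", "amount": 120},
    {"date": "2024-01-01", "store": "Metropolis", "amount": 80},
    {"date": "2024-01-02", "store": "Gotham", "amount": 200},
    {"date": "2024-01-03", "store": "Gotham", "amount": 50},
]


def full_scan(rows, date):
    """No partitions: every row must be read to answer the query."""
    scanned, total = 0, 0
    for row in rows:
        scanned += 1
        if row["date"] == date:
            total += row["amount"]
    return total, scanned


def build_partitions(rows, key):
    """Group rows into partitions keyed by the partition column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


def pruned_scan(partitions, date):
    """With partitions: go straight to the only partition that can match."""
    matching = partitions.get(date, [])
    return sum(r["amount"] for r in matching), len(matching)


partitions = build_partitions(rows, "date")
print(full_scan(rows, "2024-01-01"))          # (200, 4): same answer, 4 rows read
print(pruned_scan(partitions, "2024-01-01"))  # (200, 2): same answer, 2 rows read
```

Both queries return the same total, but the partitioned one reads half the rows here. On a table with years of data, that difference is what turns into the lower costs and better performance mentioned above.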

Integration with Mosaic (Databricks AI)

Databricks believes that DI platforms should make it easier for businesses to create AI applications. Therefore, Databricks is combining DatabricksIQ with their AI platform, Mosaic AI. This combination will help businesses build AI applications that can understand their data.

Mosaic AI provides various features that let businesses integrate their data directly into AI systems. This means that businesses can use their own data to make AI tools. And these AI tools are more tailored and effective for their specific needs.

  • You can create chatbots that provide high-quality, relevant responses. (Remember RAG, retrieval-augmented generation, from Chapter 1?) RAG lets you search a large set of data for the right information. Then you can use that information to answer a question accurately.
  • You can train a custom AI model from scratch. Or you can start with a pre-trained model and fine-tune it with your data. Mosaic Training offers choices for how to train your AI models.
  • You can run AI models on your data without managing dedicated servers. MosaicML Inference (or Databricks Foundation Model APIs) provides secure serverless inference, and Unity Catalog organizes and governs data access. Together, this keeps data use well-organized, controlled, and high-quality.
  • You can use a complete and reliable system for MLOps (see Chapter 3). MLflow is a fantastic open source MLOps tool for managing AI and ML projects. It makes sure that the data used in these projects can be used, tracked, and checked for any issues.
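To make RAG a little more concrete, here is a toy Python sketch of its two steps: retrieve the most relevant passage from a small, made-up set of documents, then put that passage in front of the question as context for a language model. It uses simple word-count cosine similarity in place of the embedding models and vector search a real system would use, and it stops at building the prompt. The actual LLM call is left out, and none of these functions are Mosaic AI's API.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in a real system these would be chunks of company documents.
documents = [
    "The Batmobile is stored in the lower level of the Batcave.",
    "Alfred restocks the utility belt every Sunday.",
    "The Batcomputer analyzes clues and crime reports.",
]


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=1):
    """Step 1 (retrieval): return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Step 2 (augmentation): put the retrieved text in front of the question for the LLM."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("When does Alfred restock the utility belt?", documents))
```

Here the question shares the most words with the utility-belt sentence, so that sentence becomes the context the model is asked to answer from.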

Basically, this integration helps businesses use AI to meet their specific needs.

DatabricksIQ and Mosaic having a drink. "Everyone needs a DI Platform on top of Lakehouse."

Summary

Databricks sees a future where AI will change all types of software, especially software that deals with data. In the past, using and managing these data programs has been difficult. However, DI Platforms are set to make a big difference. They make it easier to work with data and can simplify tasks like searching for and managing data.

These platforms also have a deep understanding of data, which is key for building advanced AI applications for businesses. Businesses can use these platforms to lead and innovate in their industries. DI Platforms are becoming essential for organizations, enabling them to develop new and advanced data and AI applications.


About the author: Maria Pere-Perez

The opinions expressed in this article are my own. This includes the use of analogies, humor and occasional swear words. I currently work as the Director of ISV Technology Partnerships at Databricks. However, this newsletter is my own. Databricks did not ask me to write this. And they do not edit any of my personal work. My role at Databricks is to manage partnerships with AI companies, such as Dataiku, Pinecone, LangChain, Posit, MathWorks, Plotly, etc... In this job, I'm exposed to a lot of new words and concepts. I started writing down new words in my diary. And then I thought I’d share it with people. Click "Subscribe" at the top of this blog to learn new words with me each week.
