Databricks: Intelligence 2.0, Delivered!

Revenue is scaling big

Databricks told investors on Wednesday that annualized revenue will reach $2.4B by the midpoint of this year. Annualized sales through July, covering the first six months of fiscal 2025, will increase 60% from a year earlier. For comparison, Snowflake did $828.7M last quarter and has forecast $3.3B for FY25, growing 34%.

https://www.cnbc.com/2024/06/12/databricks-says-annualized-revenue-to-reach-2point4-billion-in-first-half.html


Credit: @tomasz tunguz


Databricks Intelligence Platform going fully serverless


100% Serverless


Unity Catalog is open sourced with immediate effect

Databricks has open sourced Unity Catalog, following Snowflake's Polaris Catalog move: https://www.prnewswire.com/news-releases/databricks-open-sources-unity-catalog-creating-the-industrys-only-universal-catalog-for-data-and-ai-302170787.html

The catalog supports any data format and compute engine, according to Databricks. It can read tables with Delta Lake, Apache Iceberg and more, while supporting the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. It also promises to ensure unified governance across tabular data, non-tabular data, and AI assets including ML models and generative AI tools. Unity Catalog OSS interoperates with Microsoft Azure, Amazon Web Services, Google Cloud Platform, Salesforce, Apache Spark, Trino, DuckDB, Daft, PuppyGraph, dbt Labs, Confluent, Fivetran, Granica, Immuta, Informatica, LangChain, Tecton and more. More than 10,000 enterprises leverage Unity Catalog.


@Matei Zaharia - Unity Catalog Open source
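
Since Unity Catalog OSS exposes a REST interface, here is a minimal sketch of browsing a locally running server from Python. The base URL, default port and endpoint paths below are assumptions drawn from the open source quickstart, not confirmed in this post; check them against the project repository.

```python
# Minimal sketch: browse a locally running Unity Catalog OSS server.
# Assumption: the server listens on port 8080 and mirrors the
# /api/2.1/unity-catalog REST paths -- verify against the project README.
import requests

BASE_URL = "http://localhost:8080/api/2.1/unity-catalog"  # assumed default

def list_catalogs():
    """Return the names of catalogs registered in the server."""
    resp = requests.get(f"{BASE_URL}/catalogs")
    resp.raise_for_status()
    return [c["name"] for c in resp.json().get("catalogs", [])]

def list_tables(catalog: str, schema: str):
    """Return the table names in a given catalog.schema."""
    resp = requests.get(
        f"{BASE_URL}/tables",
        params={"catalog_name": catalog, "schema_name": schema},
    )
    resp.raise_for_status()
    return [t["name"] for t in resp.json().get("tables", [])]

if __name__ == "__main__":
    print(list_catalogs())                    # e.g. ['unity']
    print(list_tables("unity", "default"))    # tables visible to you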


LakeFlow Connect for Data Ingestion launched

Databricks launched LakeFlow, its own data engineering solution that handles data ingestion, transformation and orchestration, eliminating the need for a third-party tool (https://www.databricks.com/product/data-ingestion). Databricks is rolling out the LakeFlow service in phases. First up is LakeFlow Connect, which will become available as a preview soon; the company has a sign-up page for the waitlist. At its core, the LakeFlow system consists of three parts.

  • The first is LakeFlow Connect, which provides the connectors between the different data sources and the Databricks service. It is fully integrated with Databricks' Unity Catalog data governance solution and relies in part on technology from Arcion. Databricks also did a lot of work to enable this system to scale out quickly to very large workloads if needed. Right now, the system supports SQL Server, Salesforce, Workday, ServiceNow and Google Analytics, with MySQL and Postgres following very soon.
  • The second part is LakeFlow Pipelines, essentially a version of Databricks' existing Delta Live Tables framework for implementing data transformation and ETL in either SQL or Python. CEO Ali Ghodsi stressed that LakeFlow Pipelines offers a low-latency mode for data delivery and can also offer incremental processing, so that for most use cases only changes to the original data have to be synced with Databricks (a Delta Live Tables-style sketch follows this list).
  • The third part is LakeFlow Jobs, the engine that provides automated orchestration and ensures data health and delivery. "So far, we've talked about getting the data in, that's Connectors. And then we said: let's transform the data. That's Pipelines. But what if I want to do other things? What if I want to update a dashboard? What if I want to train a machine learning model on this data? What are other actions in Databricks that I need to take? For that, Jobs is the orchestrator," Ghodsi said.
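
Because LakeFlow Pipelines builds on Delta Live Tables, a short DLT-style pipeline gives a feel for the incremental model described above. This is a sketch under that assumption, not the final LakeFlow API; the source path and table names are hypothetical.

```python
# Sketch of an incremental pipeline in the Delta Live Tables style that
# LakeFlow Pipelines is said to build on. Runs inside a Databricks DLT
# pipeline, where `spark` is provided; the source path is hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally from cloud storage")
def orders_raw():
    # Auto Loader picks up only new files, matching the incremental model.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/ingest/orders/")  # hypothetical landing zone
    )

@dlt.table(comment="Cleaned orders with a basic quality expectation")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop bad rows
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("ingested_at", F.current_timestamp())
    )
```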

LakeFlow Connect


New Mosaic AI tools

Databricks is launching five new Mosaic AI tools at its conference: Mosaic AI Agent Framework, Mosaic AI Agent Evaluation, Mosaic AI Tools Catalog, Mosaic AI Model Training and Mosaic AI Gateway. See https://www.databricks.com/product/machine-learning and https://techcrunch.com/2024/06/12/databricks-expands-mosaic-ai-to-help-enterprises-build-with-llms/


  • Mosaic AI Agent Framework aims to provide an easy way of building retrieval augmented generation (RAG) applications with foundation models and enterprise data (an illustrative RAG flow is sketched after this list).
  • Mosaic AI Agent Evaluation is an AI-assisted evaluation tool that promises to automatically determine if outputs are high quality. It also provides a user interface (UI) for getting stakeholder feedback.
  • Mosaic AI Model Training is used for fine-tuning open source foundation models with an organization’s private data. The models are fully owned and controlled by the customer and should produce better results for specific use cases. Smaller models fine-tuned this way should also be faster and less expensive, with fewer parameters and less computing power needed.
  • Mosaic AI Gateway is a way for Databricks users to have a unified interface for querying, managing and deploying models – open source or proprietary – so that users can switch the large language models (LLMs) that power applications without complicated changes to the app code.
  • Mosaic AI Tools Catalog, unlike the other tools, is launching in private preview rather than public preview. The catalog will allow users to govern, share and register tools with Databricks Unity Catalog to make them more secure and discoverable, according to Databricks. This should make tool-enabled models usable in secure, governed ways while giving the tools discoverability across an organization.
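
To make the Agent Framework bullet concrete, here is an illustrative RAG flow. This is not the Mosaic AI Agent Framework API; `vector_search` and `call_llm` are hypothetical stand-ins for a vector index lookup and a model-serving call, to be backed by your own infrastructure.

```python
# Illustrative RAG flow only -- not the actual Mosaic AI Agent Framework API.
from typing import List

def vector_search(question: str, k: int = 3) -> List[str]:
    """Hypothetical retriever: return the k most relevant document chunks."""
    raise NotImplementedError("back this with your vector index")

def call_llm(prompt: str) -> str:
    """Hypothetical model call: send the prompt to a served foundation model."""
    raise NotImplementedError("back this with your model-serving endpoint")

def answer(question: str) -> str:
    # 1. Retrieve enterprise context relevant to the question.
    chunks = vector_search(question)
    # 2. Ground the foundation model on that context before answering.
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(chunks) +
        f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```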


Nvidia partnership

  • Not to be outdone by Snowflake announcing a new collaboration with Nvidia around integrating NeMo Retriever microservices into Snowflake Cortex AI, Databricks has revealed an expanded collaboration of its own with Nvidia.
  • As part of the collaboration, Nvidia’s compute unified device architecture (Cuda) will come to Databricks’ Data Intelligence Platform, with the aim of boosting efficiency, accuracy and performance of AI development pipelines, according to Databricks.
  • This alliance means native support for Nvidia GPU acceleration on the platform, and native support for Nvidia-accelerated computing in Photon, the Databricks vectorized query engine.

Unified Monitoring is coming

(Details awaited)


Databricks Unified Monitoring


Delta Sharing is growing

Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use. It is a simple REST protocol that securely shares access to part of a cloud dataset and leverages modern cloud storage systems, such as S3, ADLS, or GCS, to reliably transfer data. With Delta Sharing, a user accessing shared data can directly connect to it through pandas, Tableau, Apache Spark, Rust, or other systems that support the open protocol, without having to deploy a specific compute platform first. Data providers can share a dataset once to reach a broad range of consumers, while consumers can begin using the data in minutes.
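
As a recipient-side example, the open source delta-sharing Python connector can load a shared table directly into pandas, as the paragraph above describes. The profile file name and table coordinates below are placeholders.

```python
# Reading a shared table as a data recipient, using the open source
# delta-sharing connector (pip install delta-sharing).
# "config.share" and the share/schema/table names are placeholders.
import delta_sharing

profile = "config.share"  # credentials file issued by the data provider
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())  # discover what has been shared with you

# Load one shared table straight into pandas:
# format is <profile-file>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(f"{profile}#sales_share.retail.orders")
print(df.head())
```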

  • More than 16,000 data recipients have used Delta Sharing to receive data and AI assets
  • Quadruple year-over-year growth in active Delta Shares between data providers and data recipients
  • More than 2,000 listings of datasets, AI models and solution accelerators are on the Databricks Marketplace
  • More than quadruple increase year over year in listings on Databricks Marketplace
  • 40 percent of Delta Sharing connections are through open connectors to Apache Spark, Microsoft Excel, Salesforce’s Tableau and other non-Databricks platforms


Delta Sharing Protocol


Delta Sharing Architecture

AI/BI

Databricks AI/BI is a new type of business intelligence product built to democratize analytics and insights for anyone in your organization. Powered by data intelligence, AI/BI understands your unique data and business concepts by capturing signals from across your Databricks estate, continuously learning and improving to accurately answer your questions.

AI/BI features two complementary capabilities: Dashboards and Genie. Dashboards provide a low-code experience to help analysts quickly build highly interactive data visualizations for their business teams using natural language, and Genie allows business users to converse with their data to ask questions and self-serve their own analytics.

Databricks AI/BI is native to the Databricks Data Intelligence Platform, providing instant insights at massive scale while ensuring unified governance and fine-grained security are maintained across the entire organization.


AI/BI for Sales
AI/BI Genie



Shutterstock ImageAI: a new text-to-image diffusion model, powered by Databricks, that generates high-fidelity, trusted images.

There is more - I will update this blog as it happens.

Note: This post is assembled from various sources; all due credit goes to the original authors. Feel free to tag appropriately.
