What is Databricks?
Problem
Databricks, Inc. is a data, analytics, and artificial intelligence (AI) company founded by the original creators of Apache Spark. Its Platform as a Service (PaaS) offering has evolved over the years and is supported on the Microsoft Azure and Amazon Web Services (AWS) cloud platforms. Databricks has purchased six companies to add features to its offerings, such as MLOps, data governance, and generative AI. As a new data engineer, how can I quickly get up to speed on the Databricks platform?
Solution
At the core of Databricks' offering is the Apache Spark engine. Initially, this engine was written in Scala, an object-oriented and functional language that runs on the Java Virtual Machine (JVM). However, the demands of big data have increased, requiring additional speed, so Databricks added Photon to the runtime engine. Photon is a newer vectorized engine written in C++. The image below shows the traditional offerings from the Spark ecosystem. These four areas will be covered at a summary level today.
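To give a rough feel for what "vectorized" means, the toy sketch below contrasts row-at-a-time processing with column-at-a-time processing in plain Python. It is an illustration only and not how Photon is implemented: the sample data and the function names (`total_row_at_a_time`, `to_columns`, `total_columnar`) are hypothetical, and a real vectorized engine works on batches of columnar data in native code.

```python
# Hypothetical sample data: three sales rows (not from the article).
rows = [
    {"qty": 2, "price": 10.0},
    {"qty": 1, "price": 4.5},
    {"qty": 3, "price": 2.0},
]


def total_row_at_a_time(rows):
    """Row-oriented style: visit each row and look up its fields one by one."""
    total = 0.0
    for row in rows:
        total += row["qty"] * row["price"]
    return total


def to_columns(rows):
    """Pivot the rows into one list per column (a simple columnar layout)."""
    return {
        "qty": [r["qty"] for r in rows],
        "price": [r["price"] for r in rows],
    }


def total_columnar(columns):
    """Column-oriented style: apply the same operation across whole columns."""
    return sum(q * p for q, p in zip(columns["qty"], columns["price"]))


print(total_row_at_a_time(rows))         # row-oriented result
print(total_columnar(to_columns(rows)))  # column-oriented result, same value
```

Both functions return the same total. The difference is only in how the data is laid out and traversed, which is the general idea that vectorized engines exploit for speed.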
Business Problem
Our manager has asked us to provide a high-level overview of the Databricks ecosystem to our senior management. The company has a large amount of data in an aging on-premises data center. The goal is to understand what Databricks is and how its associated ecosystem supports business intelligence, data analytics, and decision making.
The above table outlines the topics that will be discussed in this MSSQLTips article today. As a bonus, the first three topics will include some simple examples.