Azure Databricks: An Intro
For those of you who are familiar with the cloud and the machine learning field, Azure and Databricks are two terms you have probably heard quite often. For those who have not heard of them before, Azure is Microsoft’s cloud platform offering IaaS, PaaS and many other services, while Databricks is a Unified Analytics Platform from Matei Zaharia and the rest of the team behind Apache Spark. Databricks is also available on AWS, but for the purposes of this article I will primarily be touching on the Azure variant in this brief intro.
The value proposition outlined by Databricks is that it helps “accelerate innovation by unifying data science, engineering and business”. The key benefit I see in Databricks is the value it brings to both data engineers and data scientists: it allows complex ETL pipelines, as well as ecosystem integration across a variety of services such as Hadoop, Kafka, Parquet and TensorFlow, to be carried out seamlessly, reducing the time taken to move the latest AI algorithms from development and QA environments into production. (Yep, I am a fan, as you may have gathered from the above sentence :) )
With the cloud offerings of Databricks on both Azure and AWS, infrastructure complexity is minimized to a great degree. The Pay-As-You-Go model for spinning up clusters and the auto-scaling feature allow the data analytics/business intelligence and IT teams of an organization to “reap the benefits of a fully managed service” and “focus more on innovation”, to quote verbatim from the Databricks website.
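To make the auto-scaling point a little more concrete, here is a minimal sketch of defining such a cluster through the Databricks REST API (clusters/create). The workspace URL, token, runtime version and VM size below are placeholders I have assumed for illustration, not values from any particular workspace:

```python
import requests

# Placeholders - replace with your own workspace URL and personal access token.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Cluster spec with auto-scaling: Databricks adds or removes workers between
# min_workers and max_workers based on load, and the cluster terminates itself
# after 30 idle minutes, so you only pay for what you actually use.
cluster_spec = {
    "cluster_name": "autoscaling-demo",
    "spark_version": "5.5.x-scala2.11",   # example runtime version
    "node_type_id": "Standard_DS3_v2",    # example Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
}

response = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
print(response.json())  # returns the new cluster_id on success
```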
In addition, the 99.99% SLA offered by most cloud vendors enables organizations to confidently run their business-critical applications. Azure Databricks offers integration with Azure Active Directory (AD) and also integrates with Azure databases and data stores such as SQL Data Warehouse, Cosmos DB, Data Lake Store and Blob Storage.
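As a small example of what that integration looks like in practice, the sketch below reads a Parquet dataset from Blob Storage inside a Databricks notebook (where `spark` and `dbutils` are already provided). The storage account, container, secret scope and path are placeholders I have assumed:

```python
# Placeholders - substitute your own storage account and container names.
storage_account = "<storage-account>"
container = "<container>"

# Authenticate to Blob Storage with an account key kept in a Databricks secret scope,
# rather than hard-coding the key in the notebook.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

# Read a Parquet dataset straight out of Blob Storage into a Spark DataFrame.
df = spark.read.parquet(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/events/"
)
df.show(5)
```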
In later articles I plan to cover a bit more on how Spark works on Databricks, as well as a project being outlined with the Analytics/BI team at John Keells IT to build a Change-Data-Capture (CDC) mechanism for database logs using Apache Kafka with Databricks. That project is still in its infancy, and we are debating the pros and cons of the approach.
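Since that project is still being scoped, the following is only a rough sketch of the general pattern rather than our actual design: Spark Structured Streaming consuming change records from a Kafka topic and landing them as a Delta table. The broker addresses, topic name and storage paths are assumed placeholders, and it runs in a Databricks notebook where `spark` is available:

```python
# Consume change records from a (hypothetical) CDC topic on Kafka.
changes = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "db-change-log")   # placeholder topic name
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary; cast to strings before downstream parsing.
parsed = changes.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

# Continuously append the parsed records to a Delta table, with a checkpoint
# location so the stream can recover from failures.
(
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/cdc-demo")
    .start("/mnt/delta/cdc-demo")
)
```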