Databricks Vs Azure Machine Learning - a comparative study
Debi Prasad Rath
@AmazeDataAI- Technical Architect | Machine Learning | Deep Learning | NLP | Gen AI | Azure | AWS | Databricks
Azure Machine Learning vs Databricks:--
===============================
CREDIT- Microsoft Documentation
note:- "ml" stands for machine learning
Azure ML is a fully managed cloud service for creating machine learning systems, covering end-to-end lifecycle tasks: from creating models, through experimentation to select the best model, all the way to deployment. No wonder Azure ML can be trusted end to end at scale; as you build models, rest assured that experimentation will help you select your best one.
Conversely, Databricks provides a unified data platform on which one can build machine learning systems and perform lifecycle tasks in one place. It is a collaborative analytics platform that makes it easy and fast to build production-ready systems.
As data scales up across different systems and workloads, performing machine learning lifecycle tasks becomes tedious. Using Databricks not only makes this faster with distributed serverless computing, but also lets us augment systems with advanced security. Both platforms share a simple purpose: delivering production-ready machine learning systems. The remaining concern is, once model building is done, how do we make the model available to others and monitor its predictions, especially when the system is very complex?
So, how are we going to manage the system?
- have an independent serving infrastructure
- have an independent inference pipeline
- but both will take a significant amount of time
- Databricks with an Azure ML workspace comes to the rescue
Referring to the notes above, Azure Databricks and Azure ML have a many-to-many relationship. The reasoning behind this analogy is that we can use Databricks for compute and bring the reference data back into the Azure ML workspace to perform lifecycle activities. Conversely, we can use a Databricks cluster as compute inside the Azure ML workspace and run inference on that same instance as well. It completely depends on the context of the problem statement.
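As a sketch of the second direction, the azureml-sdk (v1) can attach an existing Databricks workspace as a compute target inside an Azure ML workspace. The function below is a minimal, hedged sketch: the parameter values you pass (resource group, workspace names, access token) are placeholders, and imports are kept local so the sketch can be read without the SDK installed.

```python
def attach_databricks_compute(ml_workspace, compute_name,
                              resource_group, databricks_workspace, access_token):
    """Attach an existing Azure Databricks workspace as a compute target
    in an Azure ML workspace (azureml-sdk v1 sketch)."""
    # Local import: only needed when the function is actually called.
    from azureml.core.compute import ComputeTarget, DatabricksCompute

    attach_config = DatabricksCompute.attach_configuration(
        resource_group=resource_group,        # resource group of the Databricks workspace
        workspace_name=databricks_workspace,  # Databricks workspace name
        access_token=access_token,            # Databricks personal access token
    )
    target = ComputeTarget.attach(ml_workspace, compute_name, attach_config)
    target.wait_for_completion(show_output=True)
    return target
```

Once attached, the Databricks cluster shows up as a compute target in the Azure ML workspace and can be referenced by experiments like any other compute.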
consider one scenario:--
----------------------------------
let us assume that you have a very complex data pipeline where data is collected incrementally from many source systems at different timestamps. Big data tooling is used, with a strong dependency on the data engineering team to create the intermediate data required for model training and beyond. Meanwhile, many developers on the data science team are working toward an end-to-end machine learning process, which calls for effective collaboration. Given these two requirements, which run hand in hand, we would definitely need a Databricks compute cluster to run a training experiment in parallel (say, 100 runs), and an Azure ML workspace that runs the model experiment across the many trained models. Put that way, the whole process looks oversimplified. Let me make it simpler for you.
First things first: we could create a dedicated ML compute in the Azure ML workspace and deploy the model object through Docker containers. But this can be time consuming, since the compute must be created and brought up for every use case, every time. To set the right context: Databricks can be used as the compute, and Azure ML can be used to run machine learning lifecycle activities via its automated machine learning capabilities. Often, this looks like the more ideal scenario.
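To make that setup concrete, here is a hedged azureml-sdk (v1) sketch of submitting an automated ML experiment that targets an attached Databricks compute. The task type, metric, and experiment name are assumptions for illustration, and imports are local so the sketch loads without the SDK.

```python
def submit_automl_on_databricks(ml_workspace, compute_target, train_data, label):
    """Submit an automated ML experiment from Azure ML, using an attached
    Databricks cluster as the training compute (azureml-sdk v1 sketch)."""
    from azureml.core import Experiment
    from azureml.train.automl import AutoMLConfig

    automl_config = AutoMLConfig(
        task="classification",          # assumption: a classification problem
        training_data=train_data,       # tabular dataset registered in the workspace
        label_column_name=label,
        compute_target=compute_target,  # the attached Databricks compute
        primary_metric="AUC_weighted",
        iterations=100,                 # the "100 runs" from the scenario above
        max_concurrent_iterations=8,    # parallelism across worker nodes
    )
    experiment = Experiment(ml_workspace, "automl-on-databricks")
    run = experiment.submit(automl_config, show_output=True)
    best_run, fitted_model = run.get_output()  # best model across all iterations
    return best_run, fitted_model
```

Azure ML tracks every iteration as a child run, so selecting the best model is a single `get_output()` call rather than manual bookkeeping.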
ML Lifecycle with Azure Databricks and Azure ML workspace:--
================================================
During any machine learning lifecycle, one of the most crucial steps is collecting and preprocessing data from different sources.
- Databricks has connectors for virtually every data source
- data pipelines make ingestion easy
- preprocessing and aggregations
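A minimal sketch of such an ingestion-plus-aggregation step in Databricks, using the standard Spark JDBC reader. The column names (`event_date`, `amount`) and connection details are placeholders, and the `spark` session is passed in so the sketch stays self-contained.

```python
def ingest_and_aggregate(spark, jdbc_url, table, user, password):
    """Read a source table over JDBC and compute a daily aggregate --
    a minimal sketch of a Databricks ingestion + preprocessing step."""
    from pyspark.sql import functions as F

    raw = (spark.read.format("jdbc")
           .option("url", jdbc_url)
           .option("dbtable", table)
           .option("user", user)
           .option("password", password)
           .load())
    # Assumed columns: event_date, amount -- placeholders for this sketch.
    return (raw.groupBy("event_date")
               .agg(F.sum("amount").alias("daily_amount"),
                    F.count("*").alias("events")))
```

The same pattern extends to Databricks' other connectors (ADLS, Kafka, Delta, etc.); only the `format` and options change.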
When it comes to building models, Databricks offers almost all state-of-the-art toolkits/libraries/frameworks.
- PyTorch, TensorFlow, MXNet
Once we build the best model (iteratively), the next task is to serve it and generate predictions.
- create a designated model serving layer and monitor it
- use automated ML
- use a Databricks notebook as is
- decide whether to run models in parallel
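When serving through a designated layer in Azure ML, the entry script follows a simple `init()`/`run()` convention: `init()` loads the model once per container, `run()` handles each request. Below is a runnable sketch where a trivial threshold function stands in for the real registered model artifact.

```python
import json

model = None  # populated once per container by init()

def init():
    """Called once when the serving container starts.
    A real score.py would load the registered model artifact here;
    a trivial threshold 'model' stands in for this sketch."""
    global model
    model = lambda x: int(x > 0.5)

def run(raw_data):
    """Called per request with a JSON payload; returns JSON predictions."""
    inputs = json.loads(raw_data)["data"]
    preds = [model(x) for x in inputs]
    return json.dumps({"predictions": preds})

init()
print(run('{"data": [0.2, 0.9]}'))  # prints {"predictions": [0, 1]}
```

Because the contract is just these two functions, the same script can be exercised locally (as above) before being wired into a deployment, which makes the serving layer much easier to monitor and debug.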
- workflow:-
- create databricks workspace
- define the number of clusters / worker nodes
- install PyPI dependencies (azureml-sdk etc., as per model requirements)
- run an automated experiment
- use reference data as specified
- submit an experiment
- create an Azure ML workspace from Databricks
- run parallel training on worker nodes (via databricks)
- get the best automated model
- deployment:-
- inference pipeline with all reports (pre-requisite)
- deploy it using the same Databricks instance, or via Docker containers / Kubernetes
- track metrics and keep improving with different runs
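To make the "run parallel training, then keep the best run" part of the workflow concrete, here is a small self-contained simulation using only the standard library. It stands in for what Databricks workers plus Azure ML experiment tracking do for you: fan out many runs, record a metric per run, and keep the best.

```python
import concurrent.futures
import random

def train_once(run_id, seed):
    """Stand-in for one training run: returns (run_id, metric)."""
    rng = random.Random(seed)
    metric = rng.uniform(0.5, 1.0)  # pretend validation AUC
    return run_id, metric

def run_experiment(n_runs=100, n_workers=8):
    """Launch n_runs 'training runs' in parallel and keep the best metric,
    the way an automated experiment keeps its best iteration."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda i: train_once(i, seed=i), range(n_runs)))
    best_run, best_metric = max(results, key=lambda r: r[1])
    return best_run, best_metric, results

best_run, best_metric, results = run_experiment()
print(f"best run: {best_run}, metric: {best_metric:.3f}")
```

In the real workflow, `train_once` is a Databricks job per worker node and the metric bookkeeping lives in the Azure ML run history instead of a local list.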
a typical experiment process: --
========================
cluster (define worker nodes) ---> install libraries ---> automated experiment ---> run the experiment in Databricks + Azure ML ---> get metrics ---> feedback and response ---> revisit and start again ---> automated experiment
Briefly, Azure ML:-
-----------------------
1- a fully managed cloud service to build and manage machine learning systems
2- less suited to handling multiple source systems across different workloads and dependencies
3- end-to-end data pipeline operations are relatively slow and time consuming
4- can be used for model training and deployment
5- a common toolkit/enabler for data science developers
6- can be used to perform machine learning lifecycle activities
Briefly, Databricks:--
--------------------------
1- creates a unified data platform on which one can build machine learning systems
2- able to handle multiple source systems across different workloads and dependencies
3- end-to-end data pipeline operations are fast, easy and robust
4- can be used to perform analytics, model training and deployment
5- a preferred toolkit/enabler across data/app team members
6- can be used as machine learning training compute