MLOps vs. DevOps
Image credit: Nvidia

If you enjoy programming, data science and WFH topics, you can subscribe to Datascience Learning Center here. I cannot continue to write without tips, patronage and community support.

https://datasciencelearningcenter.substack.com/subscribe

Join 29 other paying subscribers. (the price of a cheap coffee)

How to build a better bridge?

Also, Snowflake vs. Databricks

MON AUGUST 15TH, 2022 11:40 AM MONTREAL, CANADA

Hey Guys,

Just as there is Databricks vs. Snowflake, there is DevOps vs. MLOps. While I’m not a technical person, I often find myself thinking about this.

For software developers this is already rather intuitive:

DevOps methodology helps improve communication between your developers and ops working on projects. It best serves the following purposes:

  • launching new features faster
  • increasing the satisfaction of customers and developers alike
  • improving communication through feedback loops

Key principles of DevOps:


  • Automation
  • Iteration
  • Self-service
  • Continuous improvement
  • Continuous testing
  • Collaboration

Machine Learning Operations (MLOps)

If you think of how all this plays out in the real world, there appears to be no good bridge between DevOps and MLOps. Correct me if I am wrong.

AI has been heralded as the new “brains” for software applications, a role long held by databases. Think about it: ML models depend on specific combinations of hardware and software infrastructure. Without the right infrastructure, the models either cannot perform well enough to be viable or, in some cases, become prohibitively costly.

According to Databricks, MLOps stands for Machine Learning Operations. MLOps is a core function of Machine Learning engineering, focused on streamlining the process of taking machine learning models to production, and then maintaining and monitoring them. MLOps is a collaborative function, often comprising data scientists, DevOps engineers, and IT.

How DevOps and MLOps operate together seems to be a bit lacking. There’s a lot of inefficiency.

Today there is no efficient bridge between the creation of ML models and the process of getting them into production. To illustrate this: the average time to production for ML models is 12 weeks. That’s roughly three months, which is not ideal.

The MLOps loop can be complicated with some bottlenecks along the way: data collection, data processing, feature engineering, data labeling, model building, training, optimizing, deploying, risk monitoring, and retraining. And in each organization, different people and teams may own one or more steps.
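To make the ownership point concrete, here is a minimal Python sketch that counts how often work changes hands as a model moves through the loop. The stage names are the ones listed above; the team assignments are purely hypothetical and will differ in every organization.

```python
# Stage names follow the loop described above. The owning teams are
# invented for illustration only; they are an assumption, not data.
STAGE_OWNERS = [
    ("data collection", "data engineering"),
    ("data processing", "data engineering"),
    ("feature engineering", "data science"),
    ("data labeling", "annotation team"),
    ("model building", "data science"),
    ("training", "data science"),
    ("optimizing", "ml engineering"),
    ("deploying", "devops"),
    ("risk monitoring", "devops"),
    ("retraining", "data science"),
]


def find_handoffs(stage_owners):
    """Return the (stage, next_stage) pairs where the owning team changes."""
    handoffs = []
    for (stage, owner), (next_stage, next_owner) in zip(stage_owners, stage_owners[1:]):
        if owner != next_owner:
            handoffs.append((stage, next_stage))
    return handoffs


handoffs = find_handoffs(STAGE_OWNERS)
print(f"{len(handoffs)} handoffs across {len(STAGE_OWNERS)} stages")
for stage, next_stage in handoffs:
    print(f"  {stage} -> {next_stage}")
```

Each handoff is a place where work can wait on another team, which is one plausible reading of where the bottlenecks come from.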

Why AI Falls Flat

What’s worse, nearly half of the models are shelved for performance or cost reasons, which makes AI less transformational than many hoped. Organizations have to think harder about how to integrate DevOps and MLOps, and about which tools can help.

I sometimes read SeattleDataGuy, maybe one of the best Substacks on data science right now in 2022:

SeattleDataGuy’s Newsletter

Learn About End-To-End Data Flows (Data Engineering, MLOps, and Data Science)

This is more his realm of expertise.

Clearly, in the real world, the reasons why A.I. isn’t so transformative have to be dealt with head on. If AI is to be the “brains” of applications, a world where ML models are heavily specialized, requiring unique and customized workflows and tools, is problematic.

Companies like Snowflake and Databricks are looking to create easier access to applications, machine learning models, and dashboards through their data marketplaces. They want to be your data platform, not your data warehouse or lakehouse. - Seattle Data Guy

One of the reasons I like Seattle Data Guy is that he’s also often a guest on YouTube podcasts, which I find supplements his Substack and LinkedIn posts well. In case you are wondering who this guy really is, it’s Benjamin Rogojan.

Ben on what is Data Science

Ben Rogojan is a data engineering solutions architect with expertise in data architecture and statistics. He focuses on developing end-to-end data solutions that help take data from raw format into data products and analytics.

Ben has nearly 50k followers on Medium. I believe he does consulting as well. I view him as definitely a pioneer of Substack’s data science community. On his LinkedIn, he says he talks about #bigdata, #datainfra, #datascience, #dataengineering, and #datawarehousing. LinkedIn has an incredible data science community (check out my list). I recommend you super-follow (tap on the notification bell) all of the people on this list.

MLOps Cycle

For developing machine learning solutions the standard lifecycle goes like this:

  • Requirement gathering
  • Exploratory data analysis
  • Feature engineering
  • Feature selection
  • Model creation
  • Model hyperparameter tuning
  • Model deployment
  • Retraining, if needed

The fact is once an ML model is trained and ready, we should be able to work with it as we do with any other software module because it is just code and data.
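As a rough illustration of "just code and data", the Python sketch below fits a tiny linear model, saves its parameters as a JSON file (the data), and then loads and uses that file through an ordinary function (the code). The function names, the file name and the toy dataset are all hypothetical; real models use richer formats, but the split is the same.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(params, x):
    """The 'code' half of the model: apply stored parameters to an input."""
    return params["slope"] * x + params["intercept"]


# Training side: produce the parameters from a toy dataset.
params = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# The 'data' half of the model: a plain file that can be versioned and shipped.
artifact_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(artifact_path, "w") as f:
    json.dump(params, f)

# Serving side: load the artifact and call it like any other module.
with open(artifact_path) as f:
    loaded = json.load(f)
print(predict(loaded, 10))
```

Once the model is a file plus a function, the usual software practices (version control, tests, releases) apply to it as they would to any other module.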

The theory goes that since DevOps came first, MLOps has to integrate better with it and its loop cycle. It still seems to lack a good bridge. What do you think?

As you know, MLOps originated as a term to refer to a set of best practices to design, build, deploy and maintain machine-learning models in production. As it evolves, however, the scope has expanded to the whole of ML lifecycle management.

It’s no surprise the Blog of Databricks often mentions MLOps.

So the current reality is sub-optimal at most organizations. Siloed teams of data engineers, data scientists, IT ops professionals, auditors, business domain experts, and ML engineering teams operate in a patchwork arrangement that bogs down the process. It’s not good. This means A.I. isn’t being implemented properly.

According to some ML Engineers, when model creation and model deployment are forced together into one mega-process, however, it usually limits flexibility and choice in a way that creates obstacles. Organizations clearly need to revamp how they integrate DevOps and MLOps vis-à-vis model creation as distinct from model deployment. I don’t know what the answer is, but these problems are unique to each organization and to the field as a whole.
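One common way to keep model creation distinct from model deployment is to have the two sides agree on a narrow contract and nothing more. The Python sketch below is one possible shape for that, not a prescribed design. The `Predictor` interface, both model classes and the `serve` function are hypothetical names made up for this example.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """The only thing deployment needs to know about a model."""

    def predict(self, features: Sequence[float]) -> float: ...


class AverageModel:
    """A model built one way: predicts the mean of its inputs."""

    def predict(self, features):
        return sum(features) / len(features)


class WeightedModel:
    """A model built another way: predicts a weighted sum of its inputs."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))


def serve(model: Predictor, batch):
    """Deployment side: works with any object that satisfies Predictor."""
    return [model.predict(row) for row in batch]


batch = [[1.0, 3.0], [2.0, 4.0]]
print(serve(AverageModel(), batch))
print(serve(WeightedModel([0.5, 1.0]), batch))
```

Here the serving code never imports a specific model class, so the people building models can change how they work without the deployment side having to change with them.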

Databricks vs. Snowflake

I really want to do a deep dive on the topic again sometime soon.

In some sense I view the Databricks vs. Snowflake debate also as symbolic. Snowflake is a relational database management system and analytics data warehouse for structured and semi-structured data.

Again, I’m not an engineer. Both are incredible companies. With enterprises large and small racing to build out their data infrastructure, one foundational piece these enterprise companies all need is an easy place to store their data.

Databricks has auto-scaling of clusters but is supposedly not so user friendly. The UI is more complex as it is aimed at a technical audience. It requires more manual input when it comes to things like resizing clusters, updating configurations, or switching options. There is a steeper learning curve to overcome.

Databricks builds on what is called a data lake, a place where you can dump all of your data, no matter the format. This is super convenient.

Some Terms


  • A data warehouse is the database of choice for general-purpose analytics, including reporting, dashboards, ad hoc, and any other high-performance analytics.
  • A data lake is a data store (only) for any raw structured, semi-structured, and unstructured data that makes data easily accessible to anyone. You can use it as a batch source for a data warehouse or any other workload.
  • A data lakehouse is often described as a new, open data management architecture that combines the best of a data lake with a data warehouse. The goal is to implement the best of a data lake and a data warehouse, and to reduce complexity by moving more analytics directly against the data lake, thereby eliminating the need for multiple query engines.
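The difference between the first two terms can be sketched in a few lines of Python. In this toy example a temporary directory stands in for the data lake and an in-memory SQLite database stands in for the data warehouse. Both stand-ins, along with the file names, the table and the records, are assumptions for illustration and say nothing about how Databricks or Snowflake work internally.

```python
import json
import os
import sqlite3
import tempfile

# "Data lake" stand-in: a directory that accepts raw files in any format.
lake_dir = tempfile.mkdtemp()
with open(os.path.join(lake_dir, "events.jsonl"), "w") as f:
    f.write(json.dumps({"customer": "a", "amount": 10.0}) + "\n")
    f.write(json.dumps({"customer": "b", "amount": 5.5, "note": "extra field"}) + "\n")
    f.write(json.dumps({"customer": "a", "amount": 2.5}) + "\n")
with open(os.path.join(lake_dir, "notes.txt"), "w") as f:
    f.write("unstructured text also lands in the lake\n")

# "Data warehouse" stand-in: a structured table with a fixed schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")

# Batch load: read raw records from the lake, keep only the schema's columns.
with open(os.path.join(lake_dir, "events.jsonl")) as f:
    rows = [(rec["customer"], rec["amount"]) for rec in map(json.loads, f)]
warehouse.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# Analytics run against the structured copy.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

A lakehouse, as described above, aims to skip the copy step and run that last query directly against the files in the lake.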

In reality, in 2022, I think many companies use Databricks and Snowflake together, so they aren’t really direct competitors per se. That being said, they are rising giants that are overlapping. Functionally, Databricks and Snowflake have been steadily moving into each other’s core markets (ETL and data processing, and data warehousing/lakehousing) for some time as they both try to become a data platform of choice for multiple workloads.

I think over time Databricks and Snowflake will create a better bridge between DevOps and MLOps, among others. This will reduce friction between A.I. model creation and model deployment, thereby reducing cost and improving efficiency, making A.I. easier to implement in the real world.

On the business side, I cannot wait for Databricks to go public with an IPO. Snowflake (SNOW) has a lot of great momentum. Incredibly, it already has a market cap of $54.3 billion, with gross margins of 64%. By the time Databricks goes public, it could be worth approximately what Snowflake is worth or maybe a little less. Databricks is worth around $38 billion following its latest fundraise of $1.6 billion in August 2021, led by Counterpoint Global.

How do you see DevOps and MLOps evolving together, and the data science community forming on Substack or active on LinkedIn? I see some really good posts on LinkedIn and of course articles on Medium.

Thanks for reading! If you want to support the channel and allow me to continue to write Newsletters feel free to get access to more content.

Tolulope Zechariah

Experienced and Versatile Professional: Ghostwriter | Copywriter | Historian | Researcher | Event Manager | Web Content Specialist | Social Media Manager | S. Chauffeur

2 years ago

Thanks for sharing

Dana Mayer

Get Hired or Promoted 2x Faster Doing Meaningful Work | Leadership Career Coach | Dog Lover | Let's Take Your Career to the Next Level!

2 years ago

Anna Wall

Takahide Maruoka

Credly Top Legacy Badge Earner | ISO/IEC FDIS 42001 | ISO/IEC 27001:2022 | NVIDIA | Google | IBM | Cisco Systems | Generative AI

2 years ago

I believe that business efficiency will improve. On the other hand, however, the question is how it can be used for business. High value-added issues such as machine learning remain a challenge.

More articles by Michael Spencer

  • The Genius of China's Open-Source Models
  • First Citizen of the AI State: Elon Musk
  • The Future of Search Upended - ChatGPT Search
  • Can India become a Leader in AI?
  • NotebookLM gets a Meta Llama Clone
  • Top Semiconductor Infographics and Newsletters
  • Anthropic Unveils Computer Use but where will it lead?
  • Why Tesla is not an AI Company
  • The State of Robotics 2024
  • The Datacenter Big Bang is about to start