MLOps and Burst Coding

Charles Givre

Experienced cyber security data scientist and data engineer. CISSP | Ex CIA, JP Morgan. GenAI | NLP | Python | SQL | Java | Speaker | Blackhat Instructor and O'Reilly Author | Classic car enthusiast.

Published: March 23, 2022

I've written about this before, but as a technical CEO and Co-Founder, my days are usually filled with meetings of various types. My day starts with a daily standup about sales and growth and can take any number of directions from there. Mondays usually have sprint planning meetings, Tuesdays have exec meetings, Thursdays are meetings with investors, and so on. The unfortunate result is that I don't have large amounts of uninterrupted time for tech work and other work that requires intense concentration.

Burst Coding: Coding for Those With No Time

Given my insane schedule, if I'm going to do any kind of technical work, I have to do it in VERY short increments of time. This flies in the face of the commonly accepted view of software development, which is that developers need long stretches of uninterrupted time to be productive. Given that I don't have long stretches of uninterrupted time, I had to develop a way to be productive and still sleep and spend time with my family. I call it Burst Coding, and here's how it works.

Firstly, let me say that I've always believed that the way to write really bad code and spend a lot of time doing it is to simply dive right in and start coding. When I teach classes, I always encourage students to think about what they are trying to do before they actually start writing code. My newfound position has forced me to do exactly that, but to an extreme. What I've realized is that if I know I'll only have 30 minutes for development work but I have a large project to work on, I'll set very small incremental goals for myself. Then, in that limited time, I try to achieve one of those goals. When I'm not actively coding, I'm mentally sorting out exactly how to achieve the next goal. This way, I may only spend a few minutes actually coding, but I can still tackle complex problems.

As an aside, if I have to work on something non-coding related, I've learned that I have to make sure my IDE is closed, lest I get sucked into development work. Anyway, I thought I'd share something that I've been working on as I think it is pretty cool.

I've Got a Model, Now What?

One of the major challenges in the data world is now known as MLOps, which Wikipedia defines as: "MLOps or ML Ops is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently." From my perspective, one of the major holes in the industry is model deployment. In other words, once you've built and trained an ML model, what do you do with it? How do you get it out of a Jupyter Notebook and put it into production?

For the last few years, I've been looking for better ways to deploy models once they've been trained. This topic actually is a bit of a sore one for me because a few years ago, I built a really effective model for detecting malicious administration activity on production servers. However, the company I was working for at the time had absolutely no way of deploying the model once it was built, so all my work effectively went in the bin, but that's a story for another day.

Anyway, getting back to the model thing, I've always thought it would be an interesting idea for a user to be able to serialize a machine learning model and include the predictions in a SQL query. I even included a really poor example of this in Learning Apache Drill. The challenges in doing this are manifold:

Most people who create machine learning models do so in Python, R, Spark or a few other tools. For this to be practical, you have to let people do the modeling using whatever tools they currently use.
You have to be able to save not just the model, but the whole data pipeline going from raw data to features.

How do you save a model?

This is something I've been trying to figure out for some time. When I wrote Learning Apache Drill, I had been following h2o's machine learning libraries, which had a way of saving models created in h2o and then reusing them in different languages. That is, you could build a model using their libraries in Python and then use the h2o Java SDK to make predictions. I experimented with this, but the code was clunky and didn't seem to lend itself well to what I was trying to do. I also found the MLeap project (https://github.com/combust/mleap), which seems to be defunct.

In the Python ecosystem, the commonly taught approach is to pickle objects and then create Docker-based microservices around them. There's even a module called scikit-deploy for this very purpose. From my perspective, it doesn't seem like there is a widely accepted solution for this problem.
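To illustrate the pickling approach, here is a minimal sketch using only the standard library. The Scaler, ThresholdModel and Pipeline classes are hypothetical stand-ins for a real feature step and a trained model. The point is that the feature transformation and the model are saved together as one object, so raw data can be scored after reloading.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Scaler:
    """Hypothetical feature step: rescales a raw value to the 0-1 range."""
    low: float
    high: float

    def transform(self, x: float) -> float:
        return (x - self.low) / (self.high - self.low)


@dataclass
class ThresholdModel:
    """Hypothetical stand-in for a trained classifier."""
    cutoff: float

    def predict(self, feature: float) -> int:
        return int(feature >= self.cutoff)


@dataclass
class Pipeline:
    """Bundles the feature step and the model so they are saved together."""
    scaler: Scaler
    model: ThresholdModel

    def predict(self, raw: float) -> int:
        return self.model.predict(self.scaler.transform(raw))


pipeline = Pipeline(Scaler(low=0.0, high=200.0), ThresholdModel(cutoff=0.5))

# Serialize the whole pipeline to bytes (this could equally be written to a file).
blob = pickle.dumps(pipeline)

# Later, restore it and score raw data directly.
restored = pickle.loads(blob)
print(restored.predict(150.0))  # 150 scales to 0.75, at or above the cutoff
print(restored.predict(40.0))   # 40 scales to 0.2, below the cutoff
```

Note that a pickle can only be loaded by Python code that can import the same class definitions, and it should never be loaded from an untrusted source. That limitation is part of why a language-neutral format is attractive.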

With all that said, a week ago I stumbled on Predictive Model Markup Language (PMML), which is an XML-based language for serializing ML models and pipelines. PMML has been around for a long time, but more importantly, there are modules for preserving models and pipelines in PMML. This solves the first part of the problem: saving the models.
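To give a feel for what a PMML file contains, here is a rough sketch of a hand-written, simplified PMML document for a tiny linear regression, along with a few lines of Python that read it back and score a row. The field names and coefficients are made up, the document has not been validated against the PMML schema, and the score function is only a toy reader for this one model type. In practice you would export from your modeling library and use a real evaluator.

```python
import xml.etree.ElementTree as ET

# A hand-written, simplified PMML document for y = 1.5 + 2.0*x1 - 0.5*x2.
# Field names and coefficients are invented for illustration.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace prefix mapping used for element lookups.
NS = {"p": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text: str, row: dict) -> float:
    """Evaluate the first RegressionTable found against one row of features."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for term in table.findall("p:NumericPredictor", NS):
        total += float(term.get("coefficient")) * row[term.get("name")]
    return total


print(score(PMML_TEXT, {"x1": 3.0, "x2": 4.0}))  # 1.5 + 6.0 - 2.0
```

Because the model is just XML, anything that can parse XML can in principle consume it, regardless of which language trained it.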

How do you include a model in a query?

Ok, so let us stipulate that you can save a model. How do you do the next piece, which is to actually pipe data through it and produce a result? I did some experiments with Drill and wrote some custom functions which allow a user to do just that. Basically, you can write a query like this:

SELECT ... predict('model.xml', feature1, feature2, feature3...) 
FROM <data>

The output for this function is a map with the predictions and probabilities.
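For readers who want to try the general idea without Drill, here is a minimal sketch using Python's built-in sqlite3 module, which lets you register a Python function as a SQL function. This is not the Drill implementation described above. The table, column names, weights and the in-memory MODELS dictionary are all invented for illustration, and the map is returned as JSON text because SQLite has no map type.

```python
import json
import math
import sqlite3

# Hypothetical "model store": in a real setup this would be parsed from model.xml.
MODELS = {"model.xml": {"intercept": -4.0, "weights": [0.05, 1.0]}}


def predict(model_name, *features):
    """Score one row with a toy logistic model and return the result as JSON text."""
    model = MODELS[model_name]
    z = model["intercept"] + sum(w * x for w, x in zip(model["weights"], features))
    probability = 1.0 / (1.0 + math.exp(-z))
    return json.dumps({"prediction": int(probability >= 0.5),
                       "probability": round(probability, 4)})


conn = sqlite3.connect(":memory:")
# An argument count of -1 lets the SQL function accept any number of arguments.
conn.create_function("predict", -1, predict)

conn.execute("CREATE TABLE logins (username TEXT, failed_attempts REAL, is_admin REAL)")
conn.executemany("INSERT INTO logins VALUES (?, ?, ?)",
                 [("alice", 2, 0), ("bob", 70, 1)])

rows = conn.execute(
    "SELECT username, predict('model.xml', failed_attempts, is_admin) "
    "FROM logins ORDER BY username"
).fetchall()

for username, result in rows:
    print(username, json.loads(result))
```

The person writing the query only needs to know the function name and which columns to pass in. Everything about the model itself stays behind the function.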

Why would you want to do this?

Well... good question. The main reason I had in mind is that if I, as a data scientist, want to allow others to use a model that I built, I can do this. What's more, on the DataDistillr platform, I could wrap all that in a tidy view and the non-technical user would be able to work with the model's output really easily. You could also publish the model's output via API so you could use it in Tableau or other tools. All of this without coding. What do you think? Good idea? Waste of time? Somewhere in between? Please let me know.

Tim Lortz

AI/DS/ML @ Databricks

2 years ago

Great advice Charles Givre! I haven't launched my own startup lol, but I do find context switching from hours of customer calls into actually writing code to be pretty daunting. This was helpful. Regarding MLOps, have you tried out MLflow? I think it could solve your pain points. Package up your Python, R, Spark models in an open source API that can be deployed in any number of ways (from Spark SQL UDFs to Docker containers). The model can include custom transformation logic, even non-ML frameworks. Also has a native model registry and experiment tracking server. Just FYI

Ken Besser

LifeCycle Lawyer and Motivational Speaker - I help people to be Great! All the Time!

2 years ago

Congratulations on the new startup! Just in case you work yourself to death, do you have a good estate plan in place? If not, then let me know. I'll be happy to help you develop one. All best.
