Pitfalls In Enterprise ML Strategy
Every BI strategy presentation talks about machine learning and actionable insights. It looks magical and exciting on slides, but the ground reality is very different. For many organizations, ML is sliding from the peak of inflated expectations into disillusionment. In this article I will cover four main pitfalls that enterprises should avoid on the way to the target state: a data-driven organization.
Not Using ETL Tools
Roughly 60% of the effort in ML goes into data preparation. Python is an excellent language for data mining and exploration, and it should be used during the model-building phase. But should that same script be deployed to production? If your answer is "yes", be aware that this leads to a complex mesh of data pipelines that soon becomes unmanageable.
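To make this concrete, here is a minimal sketch of the kind of ad-hoc pandas preparation script that works well in a notebook but turns into unmanaged pipeline code once it is copied into production. The file names, column names, and cleanup rules below are purely hypothetical.

```python
# A minimal sketch of an ad-hoc exploration-phase prep script.
# File names, columns, and the business rules are hypothetical.
import pandas as pd

def prepare_training_data(orders_path: str, customers_path: str) -> pd.DataFrame:
    orders = pd.read_csv(orders_path, parse_dates=["order_date"])
    customers = pd.read_csv(customers_path)

    # Hard-coded cleanup rules: easy to write once, hard to govern at scale.
    orders = orders.dropna(subset=["customer_id", "amount"])
    orders["amount"] = orders["amount"].clip(lower=0)

    merged = orders.merge(customers, on="customer_id", how="left")
    merged["is_repeat"] = merged.groupby("customer_id")["order_id"].transform("count") > 1
    return merged

# In a notebook this is enough; in production the same logic also needs
# scheduling, lineage, data-quality checks, and monitoring -- exactly what
# governed ETL tooling provides.
```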
Without the right governance and ETL tooling, this creates technical debt. Organizations lose agility because they spend more and more time fixing data quality issues, which derails the focus on AI and ML.
Multiple Tools
I have seen many teams debating which tool to use for which use case, or which vendor product to buy. Vendor influence and the preferences of tech communities within the organization make things worse. There are countless libraries, languages, and vendor tools, all serving one goal: build models that help with predictions. Many times even experts fail to realize this. What is needed is the selection of one appropriate tool, or a framework that brings everything together, as sketched below.
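As an illustration of the second option, here is a minimal sketch, with purely hypothetical class names, of a thin framework layer that puts one interface in front of whatever library a team prefers:

```python
# A minimal sketch of a "bring it all together" framework: one thin interface
# so teams can swap libraries without rewriting downstream pipelines.
# Class and method names are illustrative, not a specific product.
from abc import ABC, abstractmethod
import numpy as np

class Model(ABC):
    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray) -> "Model": ...
    @abstractmethod
    def predict(self, X: np.ndarray) -> np.ndarray: ...

class SklearnModel(Model):
    """Wraps any scikit-learn style estimator behind the common interface."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)

# Any other library-specific model (XGBoost, a vendor SDK, etc.) gets the same
# kind of wrapper, so deployment and monitoring code sees one interface.
```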
Model Lifecycle Management
Building and training a model is the easy part; often it takes no more than 20 lines of code. The complex part is deploying the model, version-controlling it, and tracking its performance. Integrating models with REST APIs or deploying them as a scoring engine is not solved cleanly even by tech-savvy organizations. Too often, each new model goes through the same learning process, leading to a longer time to market. The previous point (multiple tools) makes this problem even more complex. The straightforward solution is to build a model deployment framework, which I will cover in my next article; the sketch below illustrates the gap.
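Here is a minimal sketch (the dataset, feature names, and endpoint path are hypothetical) showing how little code training takes compared with even a bare-bones REST scoring endpoint:

```python
# A minimal sketch of both halves of the point: training really can be a few
# lines, while serving the model over REST is where the real work starts.
# Dataset, column names, and the endpoint path are hypothetical.
import joblib
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# --- the "easy" part: train and persist a model ---
df = pd.read_csv("training_data.csv")            # hypothetical prepared dataset
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
joblib.dump(model, "model_v1.joblib")            # naive "versioning" via file name

# --- the hard part begins here: a bare-bones scoring endpoint ---
app = Flask(__name__)
model = joblib.load("model_v1.joblib")

@app.route("/score", methods=["POST"])
def score():
    features = pd.DataFrame([request.get_json()])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": str(prediction), "model_version": "v1"})

# Missing from this sketch: authentication, input validation, rollback,
# A/B routing, and drift monitoring -- the lifecycle concerns discussed above.
```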
Over Engineering
I have been part of many discussions where people debated how to train a model in distributed fashion, or whether to use an in-memory database to support the training process. More than 95% of ML use cases at most companies do not need this. If an organization is working with structured data, the final dataset for model training rarely grows beyond a few gigabytes. High-end problems like computer vision, image recognition, or audio tagging may need more resources, but otherwise the focus is diverted from solving the business requirement to an arbitrary technical one.
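As a rough illustration of why distributed training is usually unnecessary here, a hedged sketch (file and column names are hypothetical) of training on a few gigabytes of structured data on a single machine, with no cluster or in-memory database involved:

```python
# A minimal sketch: a few gigabytes of structured data fits in memory on one
# machine, and a standard library handles it without any distributed setup.
# File name and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier  # scikit-learn >= 1.0
from sklearn.model_selection import cross_val_score

df = pd.read_parquet("features.parquet")        # e.g. a ~2-3 GB prepared table
X, y = df.drop(columns=["target"]), df["target"]  # assumes numeric features

clf = HistGradientBoostingClassifier()
scores = cross_val_score(clf, X, y, cv=5)       # trains comfortably in memory
print(f"CV accuracy: {scores.mean():.3f}")
```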