Pitfalls In Enterprise ML Strategy

Every BI strategy presentation talks about machine learning and actionable insights. It looks magical and exciting on slides, but the ground reality is very different. For many organizations, ML is sliding from the peak of inflated expectations into disillusionment. In this article I will talk about four main pitfalls that corporations should avoid to reach the target state: a “data-driven organization”.

Not Using ETL Tools

About 60% of the effort in ML goes into data preparation. Python is arguably the best language for data mining and exploration. It should be used during the model-building phase, but should that same script be deployed to production? If your answer is “yes”, be aware that this leads to a complex mesh of data pipelines that soon becomes unmanageable.

If the right governance and ETL tooling are not in place, tech debt builds up. Organizations lose agility because they must spend more time fixing data quality issues, which pulls focus away from AI and ML.
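As an illustration of what "governance" can mean at the smallest scale, here is a minimal Python sketch of a data quality gate that runs as an explicit, named pipeline stage instead of being buried in an exploration script. The field names (`customer_id`, `amount`), the `Check` class and the `validate` function are hypothetical and only serve the example; a real ETL tool would provide its own equivalent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named data quality rule applied to one record (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def validate(rows: list[dict], checks: list[Check]):
    """Split rows into clean rows and rejected rows with the names of failed checks."""
    clean, rejected = [], []
    for row in rows:
        failed = [c.name for c in checks if not c.predicate(row)]
        if failed:
            rejected.append((row, failed))
        else:
            clean.append(row)
    return clean, rejected


# Illustrative rules; the column names are assumptions, not from the article.
CHECKS = [
    Check("customer_id_present", lambda r: r.get("customer_id") is not None),
    Check(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

rows = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 3, "amount": -2.0},
]

clean, rejected = validate(rows, CHECKS)
for row, failed in rejected:
    print("rejected", row, "failed:", failed)
```

Keeping the rules in one declared list means data quality problems surface at a known point in the pipeline, with a reason attached, rather than inside model code.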

Multiple Tools

I have seen many teams debate which tool to use for which use case, or which vendor product to buy. Vendor influence and the preferences of tech communities within the organization make things worse. There are so many libraries, languages and vendor tools to achieve one single goal: “build models to help with predictions”. Many times, even experts fail to realize this. All it takes is selecting one appropriate toolset. The other option is to build a framework that brings the tools together.
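One way to picture the "framework" option is a thin common interface that every model must satisfy, whichever tool produced it. The Python sketch below is only an assumption about what such a layer could look like: `Predictor`, `LinearRuleModel`, `ThresholdModel` and `score_all` are hypothetical names, and the two model classes are stand-ins for models built with different libraries or vendor products.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """The single contract the rest of the organization codes against."""

    def predict(self, features: Sequence[float]) -> float: ...


class LinearRuleModel:
    """Stand-in for a model produced by one tool (hypothetical)."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


class ThresholdModel:
    """Stand-in for a model produced by a different tool (hypothetical)."""

    def __init__(self, index: int, threshold: float) -> None:
        self.index = index
        self.threshold = threshold

    def predict(self, features: Sequence[float]) -> float:
        return 1.0 if features[self.index] > self.threshold else 0.0


def score_all(models: dict[str, Predictor], features: Sequence[float]) -> dict[str, float]:
    """Score one feature vector with every registered model, regardless of its origin."""
    return {name: model.predict(features) for name, model in models.items()}


scores = score_all(
    {
        "linear": LinearRuleModel(weights=[0.5, 2.0], bias=1.0),
        "threshold": ThresholdModel(index=0, threshold=3.0),
    },
    [2.0, 1.0],
)
print(scores)
```

Downstream code depends only on `predict`, so swapping the underlying tool does not ripple through every consumer.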

Model Lifecycle Management

Building and training a model is the easy part. Many times, it doesn’t take more than 20 lines of code. The complex part is deploying the model, version-controlling it and tracking its performance. Integrating models with REST APIs or deploying them as a scoring engine is not solved cleanly even by tech-savvy organizations. Often, each new model goes through the same learning process, leading to a longer time to market. The previous pitfall (Multiple Tools) makes this problem even more complex. A simple solution is to build a model deployment framework; I will write about that in my next article.
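To make "version controlling and tracking performance" concrete, here is a minimal Python sketch of a file-based model registry. It is not the deployment framework mentioned above, just an illustration under simple assumptions: models are picklable objects, versions are sequential integers, and metrics are stored as JSON next to the model. The class name `ModelRegistry`, the directory layout and the model name `churn` are all hypothetical.

```python
import json
import pickle
import tempfile
import time
from pathlib import Path


class ModelRegistry:
    """Stores each model version with its metrics under <root>/<name>/v<N>/."""

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _versions(self, name: str) -> list[int]:
        model_dir = self.root / name
        if not model_dir.exists():
            return []
        return sorted(
            int(p.name[1:])
            for p in model_dir.iterdir()
            if p.is_dir() and p.name.startswith("v")
        )

    def register(self, name: str, model, metrics: dict) -> int:
        """Save a new version and return its version number."""
        versions = self._versions(name)
        version = (versions[-1] if versions else 0) + 1
        target = self.root / name / f"v{version}"
        target.mkdir(parents=True)
        (target / "model.pkl").write_bytes(pickle.dumps(model))
        meta = {"version": version, "metrics": metrics, "registered_at": time.time()}
        (target / "meta.json").write_text(json.dumps(meta))
        return version

    def load(self, name: str, version=None):
        """Load a specific version, or the latest one when version is None."""
        versions = self._versions(name)
        if not versions:
            raise KeyError(f"no versions registered for {name!r}")
        chosen = versions[-1] if version is None else version
        target = self.root / name / f"v{chosen}"
        model = pickle.loads((target / "model.pkl").read_bytes())
        meta = json.loads((target / "meta.json").read_text())
        return model, meta


# Usage with a throwaway directory; the "models" are plain dicts for illustration.
with tempfile.TemporaryDirectory() as tmp:
    registry = ModelRegistry(tmp)
    first = registry.register("churn", {"weights": [0.4, 0.6]}, {"auc": 0.81})
    second = registry.register("churn", {"weights": [0.5, 0.5]}, {"auc": 0.84})
    latest_model, latest_meta = registry.load("churn")
    old_model, old_meta = registry.load("churn", version=1)
```

Because every version carries its own metrics, rolling back or comparing a new model against the one in production becomes a lookup rather than an investigation.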

Over Engineering

I have been part of many discussions where people debated “how to train a model in a distributed fashion” or “using an in-memory database to support the training process”. For more than 95% of ML use cases, and for many companies, none of this is needed. If organizations are working with structured text data, the final dataset for model training doesn’t go beyond a few gigabytes. High-end problems such as computer vision, image recognition or audio tagging might need more resources. In these discussions, the focus is diverted from solving the business requirement to an arbitrary technical requirement.
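A quick back-of-the-envelope calculation usually settles the "do we need distributed training" question. The Python sketch below estimates the in-memory size of a dense numeric training table; the row count, column count, 64 GB machine and 50% headroom are made-up illustrative figures, and the function names are hypothetical.

```python
def estimated_size_gb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Rough size of a dense numeric table, assuming 8-byte (float64) values."""
    return rows * columns * bytes_per_value / 1e9


def fits_in_memory(size_gb: float, ram_gb: float, headroom: float = 0.5) -> bool:
    """True if the dataset uses at most `headroom` of the machine's RAM."""
    return size_gb <= ram_gb * headroom


# Illustrative numbers only: 10 million rows, 50 numeric features.
size_gb = estimated_size_gb(rows=10_000_000, columns=50)
single_machine_ok = fits_in_memory(size_gb, ram_gb=64)

print(f"estimated size: {size_gb:.1f} GB, fits on one 64 GB machine: {single_machine_ok}")
```

With these assumed figures the table comes to about 4 GB, which a single ordinary server handles without any distributed infrastructure.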

Rama Nimmagadda

Helping people make better decisions

3 years ago

Excellent points Sudhir Jangam. This is definite learning for me. Looking forward to your article on model deployment frameworks
