
FeatureStore is the new standard (and the golden egg)

Jiri Steuer

Architect | Data/App, MLOps+/AI/ML

Published: January 9, 2022

Why I think the FeatureStore is the new standard (and the golden egg) for data science.

There are several strong indications that support this statement. It is also important to note that the successful use of a new ML/AI model in data science is not only about building the model, but also about deploying and using it in production. I see the following common and important problems in many industries (telco operators, banks, financial services, energy, etc.), and their importance grows with the size of the company (small/garage companies can avoid them):

Missing or limited capabilities

Limited sharing of know-how, missing model storage (with versioned predictors, scores and descriptions) and missing training data storage. A view of history is also very important for model comparison, rebuilding and monitoring.
Missing automation of the delivery life cycle (data wrangling, model rebuilding/testing and deployment of the model to production). This means that many repetitive tasks are manual or semi-manual; this is not only a matter of standard DevOps/DevSecOps, but of MLOps.
Missing real-time storage in production that can calculate intraday counters/predictors, keep a dynamic/behavioral view of the client/party, etc., and make these available during real-time ML/AI.
Missing model playground for data scientists, where they can build/tune their models with the expected performance and the relevant access rights, without needing extensive IT support.
Missing high-performance, real-time production environment for executing models (typically in Python, maintained with the appropriate SLA and HA based on your BCM).
Limited possibility of model monitoring. In many cases monitoring is based on common tools without a link to the model repository, where it would be possible to see and use the relevant predictors efficiently, etc.
and others
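To make the "intraday counters/predictors" point above more concrete, here is a minimal sketch of what such a counter could look like. It is only an in-memory toy under stated assumptions: the class IntradayCounterStore, its methods, and the feature name card_payments_today are hypothetical names invented for this illustration, not the API of any particular FeatureStore product.

```python
from collections import defaultdict
from datetime import date, datetime


class IntradayCounterStore:
    """Toy in-memory stand-in for the real-time part of a feature store.

    Hypothetical illustration only; a production store would be a
    low-latency key-value database, not a Python dict.
    """

    def __init__(self):
        # Key: (entity_id, feature_name, calendar day) -> running count
        self._counters = defaultdict(int)

    def record_event(self, entity_id: str, feature_name: str, ts: datetime) -> None:
        # Increment the counter for the day on which the event happened.
        self._counters[(entity_id, feature_name, ts.date())] += 1

    def get_feature(self, entity_id: str, feature_name: str, day: date) -> int:
        # .get() does not create missing keys, so unknown entities return 0.
        return self._counters.get((entity_id, feature_name, day), 0)


store = IntradayCounterStore()
store.record_event("client-42", "card_payments_today", datetime(2022, 1, 9, 9, 15))
store.record_event("client-42", "card_payments_today", datetime(2022, 1, 9, 13, 40))
store.record_event("client-42", "card_payments_today", datetime(2022, 1, 10, 8, 5))

# Two events were recorded for client-42 on January 9.
print(store.get_feature("client-42", "card_payments_today", date(2022, 1, 9)))
```

A model serving a request during the day would read the counter by key at prediction time, which is why this kind of storage has to be fast and always available in production.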

The lack of the above capabilities can frequently generate other negative impacts

Long time to market/delivery and thus the loss of new opportunities.
Companies focus only on low-hanging fruit. Typically pricing has the main priority, followed partly by support of sales/offers/retention/collections, and in many cases they avoid areas such as security, anti-fraud and others.
It is a problem to use the model in production when high throughput is needed (typically performance issues at >10k calls per second, in relation to global e-commerce, synergy with event streaming, client/party behavior, etc.).
It is difficult to maintain the quality of the model and to calibrate it, because this is time-consuming.
Limited ability to do data mining/discovery due to time constraints and delivery pressure.
You can often see the blind variable pattern applied, where tracking content over time is problematic.
and others

Conclusion

You can eliminate these and other issues by using a FeatureStore with MLOps (and this is the golden egg), but keep in mind that it is not just about the right tools, but about focusing on these four main topics:

people (keep the team and knowledge)
technology (choose a relevant solution for FeatureStore and MLOps)
processes (set up the whole life cycle, including governance and management)
related areas (think about the linkage to Data Management, Security Management, Infrastructure, Operations, BCM and Legal & Compliance)

Do you have the same or similar experience? Please post your comment.

And finally, something for fun

I have heard a few times from business people that the FeatureStore will be a universal hammer for everything, but that is really not true. Please do not mix up these terms:

RT-ODS and real-time storage in a FeatureStore: you can see really different capabilities and functions
DevOps/DevSecOps and MLOps: also different scope and expectations
Feature Store (offline part) and DataLake: you can see different data volumes, a different view of data quality processes, different types of data models (relational vs key-value model, etc.)
...
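The "relational vs key-value model" distinction mentioned above can be sketched in a few lines. This is only an illustration under assumptions: the column names (client_id, ts, avg_balance) and the helper functions are hypothetical, and plain Python lists and dicts stand in for real storage engines.

```python
from datetime import datetime

# History kept as rows (relational style): one row per client and timestamp.
offline_rows = [
    {"client_id": "c1", "ts": datetime(2022, 1, 1), "avg_balance": 100.0},
    {"client_id": "c1", "ts": datetime(2022, 1, 5), "avg_balance": 140.0},
    {"client_id": "c2", "ts": datetime(2022, 1, 3), "avg_balance": 900.0},
]


def point_in_time_value(rows, client_id, as_of):
    """Return the latest avg_balance known at `as_of`, or None if there is none."""
    candidates = [
        r for r in rows if r["client_id"] == client_id and r["ts"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["ts"])["avg_balance"]


def build_online_view(rows):
    """Collapse the history into a key-value view: client_id -> latest value."""
    latest = {}
    for r in sorted(rows, key=lambda r: r["ts"]):
        latest[r["client_id"]] = r["avg_balance"]  # later rows overwrite earlier ones
    return latest


online_view = build_online_view(offline_rows)

# Historical question (e.g. for model training): what was known on January 3?
print(point_in_time_value(offline_rows, "c1", datetime(2022, 1, 3)))
# Real-time question (e.g. for scoring): what is the current value for c1?
print(online_view["c1"])
```

The row-based history answers "what was known at a given time", while the key-value view answers only "what is the value now", but with a single lookup. Different questions lead to different storage designs, which is one reason these terms should not be mixed.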

It would be nice to discover the holy grail and simplify everything, but please stick to reality and not dreams. “There is really a big difference between knee-boots and watches, even though they both stretch” ;-)

Venkata Pingali

Scribble Data | AI for Finance | Knowledge Agents | Co-Founder

3 years ago

Thoughtful note. I concur. I would only add that the many dimensions and contexts of the problem means that there will be multiple design points in the space each of which is fit for purpose under certain conditions.
