FeatureStore is the new standard (and the golden egg)
#featurestore #ml #ai #mlops

Why I think the FeatureStore is the new standard (and the golden egg) for data science.

There are several strong indications supporting this statement. It is also important to note that successfully using a new ML/AI model is not only about building the model, but also about deploying and operating it in production. I see the following common and important problems in many industries (telco operators, banks, financial services, energy, etc.), and their importance grows with the size of the company (small/garage companies can avoid them):

Missing or limited capabilities

  1. Limited sharing of know-how, missing model storage (with versioned predictors, scores and descriptions) and missing training data storage. Access to history is also very important for model comparison, rebuilding and monitoring.
  2. Missing automation of the delivery life cycle (data wrangling, model rebuilding/testing and deployment of the model to production). Many repetitive tasks are therefore manual or semi-manual; this is not only a matter of standard DevOps/DevSecOps but of MLOps.
  3. Missing real-time storage in production for calculating intraday counters/predictors, keeping a dynamic/behavioral view of the client/party, etc., and using them during real-time ML/AI.
  4. Missing model playground for data scientists, where they can build/tune their models with the expected performance and relevant access rights, without needing extensive IT support.
  5. Missing high-performance, real-time production environment for executing models (typically in Python, maintained with the appropriate SLA and HA based on your BCM).
  6. Limited possibility of model monitoring. In many cases monitoring is based on common tools without a link to the model repository, where relevant predictors could be seen and used efficiently, etc.
  7. and others
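To make the third capability more concrete, here is a minimal sketch of intraday counters kept per client/party. It is only an illustration: the class `IntradayCounterStore`, the counter names and the client identifier are hypothetical, and a plain in-memory dictionary stands in for the real-time storage a production FeatureStore would provide.

```python
from collections import defaultdict
from datetime import date, datetime


class IntradayCounterStore:
    """Hypothetical in-memory stand-in for the real-time part of a feature store."""

    def __init__(self) -> None:
        # Key: (party_id, counter_name, calendar day) -> running value for that day.
        self._counters = defaultdict(float)

    def record(self, party_id: str, counter_name: str, amount: float, at: datetime) -> None:
        """Add an event to the counter of the day on which it happened."""
        self._counters[(party_id, counter_name, at.date())] += amount

    def features(self, party_id: str, day: date) -> dict:
        """Return all intraday counters of one party for one day."""
        return {
            name: value
            for (pid, name, d), value in self._counters.items()
            if pid == party_id and d == day
        }


store = IntradayCounterStore()

# Two card payments on the same day, one on the following day (made-up data).
store.record("client-42", "card_payment_count", 1, datetime(2024, 1, 15, 9, 30))
store.record("client-42", "card_payment_amount", 120.0, datetime(2024, 1, 15, 9, 30))
store.record("client-42", "card_payment_count", 1, datetime(2024, 1, 15, 14, 5))
store.record("client-42", "card_payment_amount", 80.0, datetime(2024, 1, 15, 14, 5))
store.record("client-42", "card_payment_count", 1, datetime(2024, 1, 16, 8, 0))

# Predictors a real-time model could read at scoring time.
today_features = store.features("client-42", date(2024, 1, 15))
next_day_features = store.features("client-42", date(2024, 1, 16))
```

In a real deployment the dictionary would be replaced by a low-latency store, and the counters would typically be fed from an event stream rather than direct calls; the sketch only shows the shape of the data that a model reads in real time.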

The lack of the above capabilities can frequently generate other negative impacts

  1. Long time to market/delivery and thus loss of new opportunities
  2. Companies focus only on low-hanging fruit. Typically the main priority is pricing, then partly support of sales/offers/retention/collections, and in many cases they avoid areas such as security, anti-fraud and others.
  3. It is problematic to use the model in production when high throughput is needed (typically performance issues above 10k calls per second, in relation to global e-commerce, synergy with event streaming, client/party behavior, etc.)
  4. It is difficult to maintain the quality of the model and to calibrate it, because doing so is time-consuming
  5. Limited ability to do data mining/discovery due to time constraints and delivery pressure
  6. You can often see the blind variable pattern applied, where tracking content over time is problematic
  7. and others

Conclusion

You can eliminate these and other issues by using FeatureStore with MLOps (and this is the golden egg), but keep in mind that it's not just about the right tools, but about focusing on these four main topics:

Simple view to AI and ML life cycle

  • people (keep the team and knowledge)
  • technology (choose a relevant solution for FeatureStore and MLOps)
  • processes (set up the whole life cycle, including governance and management)
  • related areas (think about the linkage to Data Management, Security Management, Infrastructure, Operations, BCM and Legal & Compliance)

Do you have the same or similar experience? Please post your comment.

And finally, something for fun

I have heard a few times from business people that FeatureStore will be a universal hammer for everything, but that is really not true. Please do not mix up these terms:

  • RT-ODS and real-time storage in FeatureStore: they really have different capabilities and functions
  • DevOps/DevSecOps and MLOps: also different scope and expectations
  • Feature Store (off-line part) and DataLake: different data volumes, different views of data quality processes, different types of data models (relational vs key-value model, etc.)
  • ...

It would be nice to discover the holy grail and simplify everything, but please stick to reality, not dreams. “There is really a big difference between knee-boots and watches, even though they both stretch” ;-)

Venkata Pingali

Scribble Data | AI for Finance | Knowledge Agents | Co-Founder

3 years ago

Thoughtful note. I concur. I would only add that the many dimensions and contexts of the problem mean that there will be multiple design points in the space, each of which is fit for purpose under certain conditions.
