Feature Store for ML - enabling AI adoption at scale
Sasirekha Cota
AI Strategist, Generative AI, Enterprise Architect, Transformation Consultant, Content Writer
We all know that AI models are ONLY as good as the data they are trained on. Data and application silos are among the main challenges of implementing and scaling AI solutions in the enterprise. Feature stores (the concept, and the associated products and capabilities coming into the market) aim to address this issue and enable AI adoption at scale.
A machine learning model maps a set of data inputs, known as features, to a target variable (the value being predicted). A feature is an individual measurable property or characteristic of a phenomenon. In a relational dataset, features appear as columns and are typically referred to as attributes or variables.
In machine learning, the performance and quality of a model depend on the choice of algorithm, the quantity of data available and the quality of the dataset (accuracy, reliability, completeness, consistency, granularity, etc.). While it sounds counter-intuitive, it is well established that using all the available features (especially in their raw form, even if they meet all the basic quality requirements) often does not produce the best prediction model. Feature selection and feature engineering are levers for improving and optimizing model performance.
Feature selection involves limiting the data inputs used for training (for example, by eliminating redundant, irrelevant or contradictory attributes from an available dataset), with the aim of increasing accuracy, reducing cost and producing an interpretable model (increasingly important given the focus on Explainable AI).
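To make the idea concrete, here is a minimal Python sketch of one simple feature-selection heuristic: dropping constant columns and columns that are near-duplicates (by Pearson correlation) of a feature already kept. The function names, the 0.95 threshold and the sample columns are hypothetical, and real projects typically combine such filters with model-based or statistical selection methods.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_features(columns: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Keep features that are not constant and not near-duplicates of a kept feature."""
    kept: list[str] = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant column: carries no information for the model
        if any(abs(pearson(values, columns[k])) >= threshold for k in kept):
            continue  # redundant: almost perfectly correlated with a feature already kept
        kept.append(name)
    return kept


# Hypothetical sample data
columns = {
    "age": [25, 32, 47, 51, 38],
    "age_in_months": [300, 384, 564, 612, 456],  # age * 12, so fully redundant
    "country_code": [1, 1, 1, 1, 1],             # constant
    "monthly_spend": [120, 80, 200, 150, 90],
}
print(select_features(columns))  # ['age', 'monthly_spend']
```

Correlation only catches linear redundancy; irrelevant features are usually identified by measuring their relationship with the target instead.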
Feature engineering is about transforming existing features to create new ones, using techniques such as imputation, date/time extraction, outlier handling, grouping and feature splitting (one good reference is https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114).
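The techniques listed above can be sketched on a few raw records as follows. This is an illustrative Python example only; the field names, the median imputation strategy, the 200,000 income cap and the age bands are assumptions chosen for the demonstration, not recommendations.

```python
from datetime import datetime
from statistics import median

INCOME_CAP = 200_000  # hypothetical threshold for capping outliers


def engineer_features(rows: list[dict]) -> list[dict]:
    """Derive new features from raw records using a few common techniques."""
    # Imputation: fill missing incomes with the median of the known values
    income_fill = median(r["income"] for r in rows if r["income"] is not None)

    out = []
    for r in rows:
        ts = datetime.fromisoformat(r["signup_ts"])
        income = r["income"] if r["income"] is not None else income_fill
        city, country = r["location"].split(",")  # feature split
        age = r["age"]
        out.append({
            "income": min(income, INCOME_CAP),  # outlier handling: cap extreme values
            "signup_weekday": ts.weekday(),      # date/time extraction (0 = Monday)
            "signup_hour": ts.hour,
            # Grouping: bucket a continuous value into bands
            "age_band": "young" if age < 30 else "middle" if age < 60 else "senior",
            "city": city.strip(),
            "country": country.strip(),
        })
    return out


raw = [
    {"income": 52_000, "signup_ts": "2021-06-14T09:30:00", "location": "Pune, India", "age": 27},
    {"income": None, "signup_ts": "2021-06-19T22:05:00", "location": "Austin, USA", "age": 45},
    {"income": 1_500_000, "signup_ts": "2021-06-20T13:00:00", "location": "Leeds, UK", "age": 63},
    {"income": 61_000, "signup_ts": "2021-06-21T08:15:00", "location": "Lyon, France", "age": 34},
]
for row in engineer_features(raw):
    print(row)
```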
Arriving at even a single effective new feature is a long, resource-intensive process of trial and error. What if the new features, and the transformation pipelines painstakingly built to create them, were not limited to a specific project or model but made available to every data scientist in the enterprise? A feature store aims to do exactly that, by publishing a catalog of available features. The feature store is the data management layer of ML, considered one of the missing pieces of the puzzle and a better answer to the issues arising from the microservices architectures currently in place.
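A feature catalog can be pictured as a registry where each published feature carries its metadata and the transformation that produces it, so other teams can discover and reuse it instead of rebuilding it. The classes below (FeatureDefinition, FeatureCatalog) are hypothetical and in-memory; they do not reflect the API of any particular feature store product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FeatureDefinition:
    name: str
    description: str
    owner: str
    transform: Callable[[dict], float]  # raw record -> feature value
    tags: set[str] = field(default_factory=set)


class FeatureCatalog:
    """Hypothetical in-memory catalog where teams publish and discover features."""

    def __init__(self) -> None:
        self._features: dict[str, FeatureDefinition] = {}

    def publish(self, feature: FeatureDefinition) -> None:
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} already published")
        self._features[feature.name] = feature

    def search(self, keyword: str) -> list[str]:
        """Find features whose name, description or tags mention the keyword."""
        kw = keyword.lower()
        return sorted(
            f.name for f in self._features.values()
            if kw in f.name.lower() or kw in f.description.lower() or kw in f.tags
        )

    def compute(self, name: str, record: dict) -> float:
        """Reuse the published transformation instead of re-implementing it."""
        return self._features[name].transform(record)


catalog = FeatureCatalog()
catalog.publish(FeatureDefinition(
    name="avg_order_value",
    description="Average value of a customer's orders",
    owner="fraud-team",
    transform=lambda r: r["total_spend"] / max(r["order_count"], 1),
    tags={"customer", "orders"},
))
print(catalog.search("order"))  # ['avg_order_value']
print(catalog.compute("avg_order_value", {"total_spend": 300.0, "order_count": 4}))  # 75.0
```

Publishing the transformation alongside the name is what lets a second team get identical values without copying code.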
A feature store is a data warehouse of features for machine learning (ML), with data scientists as its end users. It is typically implemented as a dual database:
1. Online feature store - a row-oriented database that returns a single row of features (called a “feature vector”) as input to an online model for prediction. It is mostly implemented as a key-value store to provide millisecond latency.
2. Offline feature store - provides large batches of features used to create training/test datasets.
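The dual-database design can be illustrated with a deliberately simplified in-memory Python sketch: every write is appended to an offline history (for building training/test sets) and also updates the latest values in an online key-value map (for low-latency lookup of a single feature vector). The class and method names are hypothetical, and the sketch assumes writes arrive in timestamp order.

```python
class DualFeatureStore:
    """Minimal sketch: an online key-value store plus an offline history store."""

    def __init__(self) -> None:
        # Online store: entity key -> latest feature values, for single-row lookups
        self.online: dict[str, dict[str, float]] = {}
        # Offline store: append-only history of (entity, timestamp, features)
        self.offline: list[tuple[str, int, dict[str, float]]] = []

    def write(self, entity: str, timestamp: int, features: dict[str, float]) -> None:
        # Assumes writes arrive in timestamp order, so the last write is the latest
        self.offline.append((entity, timestamp, dict(features)))
        self.online.setdefault(entity, {}).update(features)

    def get_online_features(self, entity: str, names: list[str]) -> list[float]:
        """Return one feature vector for one entity (online serving)."""
        row = self.online[entity]
        return [row[n] for n in names]

    def get_training_rows(self, names: list[str]) -> list[list[float]]:
        """Return a batch of historical feature rows (offline, for training/test sets)."""
        return [
            [feats[n] for n in names]
            for _, _, feats in self.offline
            if all(n in feats for n in names)
        ]


store = DualFeatureStore()
store.write("cust_42", 1, {"orders_7d": 3.0, "avg_order_value": 40.0})
store.write("cust_7", 1, {"orders_7d": 1.0, "avg_order_value": 15.0})
store.write("cust_42", 2, {"orders_7d": 4.0, "avg_order_value": 45.0})

print(store.get_online_features("cust_42", ["orders_7d", "avg_order_value"]))  # [4.0, 45.0]
print(len(store.get_training_rows(["orders_7d", "avg_order_value"])))          # 3
```

In real deployments the online side is usually a key-value database and the offline side a data warehouse or data lake; the dictionaries here only stand in for those roles.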
In other words, a feature store is a machine-learning-specific data system that stores and manages features, runs data pipelines that transform raw data into feature values, and serves features for both model training and production.
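The pipeline role described above can be sketched as a single transformation whose output is materialized into both stores, so the values a model is trained on are the same values it sees in production. This is a minimal Python illustration under assumed names (compute_customer_features, run_pipeline) and a hypothetical transactions schema; production feature stores run such pipelines on batch or streaming engines.

```python
from collections import defaultdict


def compute_customer_features(transactions: list[dict]) -> dict[str, dict[str, float]]:
    """Transformation: raw transactions -> per-customer feature values."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for t in transactions:
        counts[t["customer_id"]] += 1
        totals[t["customer_id"]] += t["amount"]
    return {
        cid: {"txn_count": float(counts[cid]), "avg_amount": totals[cid] / counts[cid]}
        for cid in counts
    }


def run_pipeline(
    transactions: list[dict],
    as_of: str,
    offline_table: list[dict],
    online_store: dict[str, dict[str, float]],
) -> None:
    """Materialize one pipeline run into both stores from the same computed values."""
    features = compute_customer_features(transactions)
    for cid, values in features.items():
        offline_table.append({"customer_id": cid, "as_of": as_of, **values})  # for training
        online_store[cid] = values  # latest values for serving


raw = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 40.0},
    {"customer_id": "c2", "amount": 10.0},
]
offline: list[dict] = []
online: dict[str, dict[str, float]] = {}
run_pipeline(raw, "2021-06-30", offline, online)
print(online["c1"])  # {'txn_count': 2.0, 'avg_amount': 30.0}
print(offline[0])    # {'customer_id': 'c1', 'as_of': '2021-06-30', 'txn_count': 2.0, 'avg_amount': 30.0}
```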
AI-powered products that are limited to the data available within its application are like jellyfish: its autonomic system makes it functional, but it lacks a brain. However, you can evolve your models with data enriched "brains" through the help of a feature store. (Source: https://www.kdnuggets.com/2021/06/ai-with-feature-store.html)
Uber's Michelangelo, a platform aimed at democratizing machine learning and making it easy to scale AI, appears to be the starting point of the feature store concept. Today, Palette is Michelangelo's feature store: it is centralized (providing a single source of truth for features), catalogued (with features grouped into perspectives) and reduces training/serving skew (as the features used for training and serving are the same). In effect, feature stores are the Uberization of AI implementation in enterprises.
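One technique commonly used to keep training data consistent with what was available at serving time is a point-in-time ("as of") lookup: each labelled event is joined with the feature value that was current at that event's timestamp, rather than with the latest value. The article does not say how Palette implements this; the Python sketch below, with hypothetical names (value_as_of, build_training_set) and data, only illustrates the idea.

```python
from bisect import bisect_right
from typing import Optional


def value_as_of(history: list[tuple[int, float]], event_time: int) -> Optional[float]:
    """Latest feature value with timestamp <= event_time (history sorted by timestamp)."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


def build_training_set(
    labels: list[tuple[str, int, int]],
    feature_history: dict[str, list[tuple[int, float]]],
) -> list[tuple[str, int, Optional[float], int]]:
    """Join each labelled event with the feature value that was current at that moment."""
    rows = []
    for entity, event_time, label in labels:
        value = value_as_of(feature_history.get(entity, []), event_time)
        rows.append((entity, event_time, value, label))
    return rows


feature_history = {"cust_42": [(100, 2.0), (200, 5.0), (300, 9.0)]}
labels = [("cust_42", 150, 0), ("cust_42", 250, 1), ("cust_42", 50, 0)]
for row in build_training_set(labels, feature_history):
    print(row)
# ('cust_42', 150, 2.0, 0)
# ('cust_42', 250, 5.0, 1)
# ('cust_42', 50, None, 0)
```

Joining every event with the latest value (9.0) instead would leak information from the future into training, so the model would learn from values it could never have seen at prediction time.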
Tecton Feature Store, Kaskada, Feast (an open-source feature store), Hopsworks, Amazon SageMaker Feature Store (Dec 2020), Splice Machine Feature Store (Jan 2021), the Databricks Feature Store co-designed with MLOps (May 2021) and Google Vertex AI Feature Store (May 2021) are some of the options companies can explore right now.
The feature store concept has clearly caught on and is expected to grow, with better products and more “features”. As enterprises move from experimentation to exploitation of AI, feature stores bring a host of advantages, including increased model accuracy, faster development, smoother deployment, better collaboration and improved compliance.