Amazon SageMaker Feature Store

Recap

Just to recap, the main reasons we need a feature store are:

  1. Consistent features across models (variations in how a feature is computed can change model results)
  2. Reuse of features across models and teams, saving time and cost
  3. Feature discovery and versioning

SageMaker

Amazon SageMaker Feature Store is a fully managed, purpose-built repository for features. Fully managed means that AWS handles the infrastructure, setup and provisioning, so there is nothing for you to operate.

The gateway to the feature store is Amazon SageMaker Studio, a fully managed JupyterLab environment. When you open Studio, you will find a widget for the feature store in the Launcher tab.

Feature groups

To bundle related features together, we use feature groups. Think of a feature group as a table and each feature as a column. Each row is a record that holds the feature values for one entity. As a simple example:

  • The customer would be a feature group
  • Recency, Frequency and Monetary categories/values would be separate features
  • Each customer would be a separate record

Each record carries a RecordIdentifier that uniquely identifies it. In our example, that could be the customerId.
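
To make the table analogy concrete, here is a minimal sketch in plain Python. It does not use the SageMaker API; the class name, field names and values are made up for illustration, and the EventTime field is an assumption added because Feature Store records carry an event time.

```python
from dataclasses import dataclass, asdict


@dataclass
class CustomerRecord:
    """One row of a hypothetical 'customer' feature group."""
    customerId: str   # plays the role of the RecordIdentifier
    recency: int      # feature: days since the last purchase
    frequency: int    # feature: number of purchases
    monetary: float   # feature: total spend
    EventTime: float  # when these feature values were observed


# Illustrative values only
records = [
    CustomerRecord("C001", 12, 5, 430.0, 1700000000.0),
    CustomerRecord("C002", 90, 1, 35.5, 1700000000.0),
]

# The feature group behaves like a table keyed by the record identifier
feature_group = {r.customerId: asdict(r) for r in records}
feature_names = list(feature_group["C001"].keys())
```

Each key of `feature_group` is one customer (a record) and each entry of `feature_names` is one column (a feature).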

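Feature groups can be made available online, offline, or both. Online feature groups are mainly used for real-time predictions and store only the latest version of the feature data. The read latency for an online store is a few milliseconds. Offline feature groups keep the history of feature values in Amazon S3 and are typically used for training and batch inference.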
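
The difference between the two stores can be illustrated with a toy in-memory model. This is only a sketch of the idea (the online store keeps the latest version per record, the offline side keeps every version); the class and method names are hypothetical and this is not how SageMaker is implemented.

```python
from collections import defaultdict


class ToyFeatureStore:
    """In-memory illustration only; not the SageMaker implementation."""

    def __init__(self):
        self.online = {}                  # record id -> latest record only
        self.offline = defaultdict(list)  # record id -> every version, in arrival order

    def put_record(self, record_id, features, event_time):
        record = {"event_time": event_time, **features}
        # The offline side is append-only: every version is kept
        self.offline[record_id].append(record)
        # The online side keeps only the record with the most recent event time
        current = self.online.get(record_id)
        if current is None or event_time >= current["event_time"]:
            self.online[record_id] = record

    def get_record(self, record_id):
        return self.online.get(record_id)


store = ToyFeatureStore()
store.put_record("C001", {"recency": 30}, event_time=100.0)
store.put_record("C001", {"recency": 2}, event_time=200.0)
store.put_record("C001", {"recency": 45}, event_time=50.0)  # late-arriving, older data

latest = store.get_record("C001")
history = store.offline["C001"]
```

A real-time prediction would read `latest`, while a training job would typically read from `history`.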

From the Studio Launcher, navigating to the feature store gives you a view that lists all features and feature groups.

In true AWS tradition, every action that you can perform from the UI is also available via APIs. The SageMaker Python SDK (the sagemaker package) provides this API access.
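
As one example of the UI-to-API parity, the list of feature groups shown in Studio can also be fetched programmatically. The sketch below assumes a boto3 "sagemaker" client and its list_feature_groups operation; the helper name is hypothetical, and the client is passed in so the function can be tried without AWS credentials.

```python
def list_feature_group_names(sagemaker_client):
    """Return the names of all feature groups visible to the given client."""
    names = []
    kwargs = {}
    while True:
        response = sagemaker_client.list_feature_groups(**kwargs)
        names.extend(
            summary["FeatureGroupName"]
            for summary in response.get("FeatureGroupSummaries", [])
        )
        # Results are paginated; keep going while a continuation token is returned
        next_token = response.get("NextToken")
        if not next_token:
            return names
        kwargs["NextToken"] = next_token


# Usage (requires AWS credentials and the boto3 package):
# import boto3
# print(list_feature_group_names(boto3.client("sagemaker")))
```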

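The sample code to create a feature group is below; it is largely self-explanatory:

import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()  # IAM role; resolves automatically inside SageMaker Studio

# Assumed example values: adjust to your own feature group and CSV columns
products_feature_group_name = "products-feature-group"
record_identifier_feature_name = "productId"

products_feature_group = FeatureGroup(
    name=products_feature_group_name, sagemaker_session=sagemaker_session
)

product_data = pd.read_csv("data/product.csv")

# Every record needs an event time; here all rows get the current timestamp
current_time_sec = int(round(time.time()))
product_data["EventTime"] = pd.Series([current_time_sec] * len(product_data), dtype="float64")
# Infer feature names and types from the DataFrame columns
products_feature_group.load_feature_definitions(data_frame=product_data)

products_feature_group.create(
    s3_uri="<s3 path>", record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime", role_arn=role, enable_online_store=True,
)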
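
Feature group creation is asynchronous, so it is usually worth waiting until the group is ready before writing records to it. The helper below is a minimal sketch: it assumes the object passed in has a describe() method returning a dict with a "FeatureGroupStatus" key (as the SDK's FeatureGroup does), and the function name, polling interval and timeout are arbitrary choices.

```python
import time


def wait_for_feature_group(feature_group, poll_seconds=5, timeout_seconds=600):
    """Block until the feature group leaves the 'Creating' state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = feature_group.describe().get("FeatureGroupStatus")
        if status == "Created":
            return status
        if status != "Creating":
            # For example "CreateFailed"
            raise RuntimeError(f"Feature group creation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for feature group creation")
        time.sleep(poll_seconds)
```

It would be called with the FeatureGroup object right after create().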

As the create() call shows, a few lines of code are enough to register a feature group in SageMaker Feature Store. The API for retrieval is equally simple and intuitive.
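
As a rough illustration of retrieval, the sketch below reads the latest record for one identifier from the online store. It assumes a boto3 "sagemaker-featurestore-runtime" client and its get_record operation; the helper name, the feature group name and the record id in the usage comment are hypothetical.

```python
def get_latest_features(runtime_client, feature_group_name, record_id):
    """Fetch the latest online record as a {feature name: value} dict."""
    response = runtime_client.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(record_id),
    )
    # "Record" may be absent when nothing is stored for this identifier
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Usage (requires AWS credentials and the boto3 package):
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# features = get_latest_features(runtime, "customers-feature-group", "C001")
```

Values come back as strings, so numeric features need to be converted before they are passed to a model.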

In a nutshell, SageMaker Feature Store makes it a breeze to store, discover and retrieve features for machine learning.

Footnotes:

  1. The SageMaker Python SDK (sagemaker) version must be greater than 2.0
