Amazon Forecast

Amit Jain

Actively looking for new job | 7.2+ YoE as a Data Scientist

Published: October 1, 2020

Amazon Forecast is a fully managed service that uses machine learning to deliver highly accurate forecasts.

Amazon Forecast uses machine learning to combine time series data with additional variables to build forecasts. Amazon Forecast requires no machine learning experience to get started. You only need to provide historical data, plus any additional data that you believe may impact your forecasts. For example, the demand for a particular color of a shirt may change with the seasons and store location. This complex relationship is hard to determine on its own, but machine learning is ideally suited to recognize it. Once you provide your data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.

To begin, create a Dataset Group. Up to three datasets in the CSV file format can be added to the Dataset Group.

Amazon Forecast employs Forecasting Domains to help automate the process of selecting, tuning, and training the best model for your purpose.

The creation of the Dataset Group requires a name and a Forecasting Domain designation. The following seven Forecasting Domains are available:

Retail – used to forecast retail demand.
Inventory planning – used to forecast demand for raw materials and determine how much inventory of any given item to stock.
EC2 capacity – used to forecast Amazon Elastic Compute Cloud capacity.
Work force – used to plan how much workforce is required.
Web traffic – used to forecast web traffic to a web property or set of web properties.
Metrics – used to forecast metrics such as revenue, sales, and cashflow.
Custom – used to generate forecasts that do not fit into any of the other predefined forecasting domains.
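As a rough sketch of how the name and Forecasting Domain come together, the snippet below builds the arguments for a dataset group request. The mapping from the labels above to the upper-case API values, the helper name dataset_group_request, and the example name "shirt_demand" are assumptions made for illustration; the commented boto3 call is one way the request might be sent, not something this article prescribes.

```python
# Label used in this article -> Domain value expected by the API (assumed mapping).
FORECASTING_DOMAINS = {
    "Retail": "RETAIL",
    "Inventory planning": "INVENTORY_PLANNING",
    "EC2 capacity": "EC2_CAPACITY",
    "Work force": "WORK_FORCE",
    "Web traffic": "WEB_TRAFFIC",
    "Metrics": "METRICS",
    "Custom": "CUSTOM",
}


def dataset_group_request(name: str, domain_label: str) -> dict:
    """Build keyword arguments for creating a Dataset Group (hypothetical helper)."""
    if domain_label not in FORECASTING_DOMAINS:
        raise ValueError(f"Unknown Forecasting Domain: {domain_label!r}")
    return {
        "DatasetGroupName": name,
        "Domain": FORECASTING_DOMAINS[domain_label],
    }


request = dataset_group_request("shirt_demand", "Inventory planning")
print(request)

# With AWS credentials configured, the request could then be sent with boto3:
#   import boto3
#   boto3.client("forecast").create_dataset_group(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect before any AWS call is made.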

The only required dataset in Amazon Forecast is the target time series dataset. The target time series requires the following fields with associated datatypes:

item_id(string)
timestamp(timestamp)
target_value(float)

The name of the required target_value field of the target time series dataset varies with the Forecasting Domain selected for the Dataset Group. For instance, if you select Inventory planning as the Forecasting Domain, the target value is demand, and the field is labeled “demand” in the CSV column header and in the schema in place of “target_value”.

The related time series dataset is the first of the two optional datasets. The related time series has two required fields, item_id and timestamp. The fields price, stockout_days, inventory_onhand, revenue, and in_stock are all recommended for the related time series. Note that any field added to the related time series that is not one of the required fields must be of the datatype integer or float.

The item metadata dataset is the second of the two optional datasets. The item metadata dataset only requires the item_id field. Category, brand, lead_time, order_cycle, and safety_stock are all examples of additional fields that could be included in the item metadata dataset. All fields in the item metadata dataset must be of the string datatype.
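The datatype rules for the two optional datasets can be sketched as simple checks. This is only an illustration of the rules as stated above; the helper names and the sample field lists are hypothetical, and Amazon Forecast performs its own validation at import time.

```python
NUMERIC_TYPES = {"integer", "float"}
RELATED_REQUIRED = {"item_id", "timestamp"}


def make_schema(fields):
    """Turn (name, type) pairs into the Attributes layout used for import."""
    return {
        "Attributes": [
            {"AttributeName": name, "AttributeType": dtype}
            for name, dtype in fields
        ]
    }


def check_related_fields(fields):
    """Related time series: item_id and timestamp required, extras numeric."""
    names = {name for name, _ in fields}
    missing = RELATED_REQUIRED - names
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    for name, dtype in fields:
        if name not in RELATED_REQUIRED and dtype not in NUMERIC_TYPES:
            raise ValueError(f"{name} must be integer or float, got {dtype}")


def check_metadata_fields(fields):
    """Item metadata: item_id required, every field of the string datatype."""
    if "item_id" not in {name for name, _ in fields}:
        raise ValueError("Missing required field: item_id")
    for name, dtype in fields:
        if dtype != "string":
            raise ValueError(f"{name} must be string, got {dtype}")


related = [("item_id", "string"), ("timestamp", "timestamp"), ("price", "float")]
metadata = [("item_id", "string"), ("category", "string"), ("brand", "string")]

check_related_fields(related)
check_metadata_fields(metadata)
print(make_schema(related))
```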

To import your datasets to the Dataset Group, Amazon Forecast requires the creation of a schema. The schemas used to import datasets to Amazon Forecast Dataset Groups must match the column headers both in name and order.

Below is an example of a target time series schema:

{
    "Attributes": [
        {
            "AttributeName": "item_id",
            "AttributeType": "string"
        },
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "demand",
            "AttributeType": "float"
        }
    ]
}

The following is an example of what a correlating target time series CSV file might look like:

ITEM_ID,TIMESTAMP,DEMAND
11111,2019-10-05,10.7
22222,2019-10-05,42.0
33333,2019-10-05,3.12
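Before uploading, it can be worth checking locally that the CSV header lines up with the schema in both name and order. The sketch below uses the schema and CSV sample from this article. The helper name header_matches_schema is hypothetical, and the comparison is case-insensitive only because the sample header is upper-case while the schema is lower-case; whether the service itself ignores case is an assumption here.

```python
import csv
import io
import json

SCHEMA_JSON = """
{
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"}
    ]
}
"""

CSV_TEXT = """ITEM_ID,TIMESTAMP,DEMAND
11111,2019-10-05,10.7
22222,2019-10-05,42.0
33333,2019-10-05,3.12
"""


def header_matches_schema(csv_text: str, schema: dict) -> bool:
    """Compare the CSV header with the schema attribute names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [attr["AttributeName"] for attr in schema["Attributes"]]
    # Case-insensitive comparison is an assumption of this sketch.
    return [h.strip().lower() for h in header] == [e.lower() for e in expected]


schema = json.loads(SCHEMA_JSON)
print(header_matches_schema(CSV_TEXT, schema))
```

A file whose columns are in a different order, for example TIMESTAMP before ITEM_ID, would fail this check even though all the names are present.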

Two of the three datasets, the target time series and the related time series, involve a third configuration step: selecting the frequency of the data. The data frequency is determined by the spacing of the values in the timestamp field.

The two datasets should use matching timestamp values and, accordingly, share the same data frequency.

Another important detail is that only a limited set of formats can be used for timestamp fields in the forecast datasets. All timestamp values must use either the yyyy-MM-dd format or the yyyy-MM-dd HH:mm:ss format. Any other timestamp format will cause the import to the Dataset Group to fail. If the data frequency of your dataset is minutes or hours, you must use the yyyy-MM-dd HH:mm:ss format.
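A local check for the two accepted timestamp formats might look like the sketch below. The function name is_valid_timestamp and the sub_daily flag are hypothetical; the flag stands in for "the data frequency is minutes or hours". A regular expression is applied first because datetime.strptime on its own also accepts values without zero padding, such as 2019-1-5.

```python
import re
from datetime import datetime

DATE_ONLY = "%Y-%m-%d"           # yyyy-MM-dd
DATE_TIME = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss

# Strict shape checks, since strptime alone tolerates missing zero padding.
_SHAPES = {
    DATE_ONLY: re.compile(r"\d{4}-\d{2}-\d{2}"),
    DATE_TIME: re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
}


def is_valid_timestamp(value: str, sub_daily: bool = False) -> bool:
    """Return True if value uses one of the two accepted formats.

    When sub_daily is True (minute or hourly data), only the
    date-and-time format is accepted.
    """
    formats = [DATE_TIME] if sub_daily else [DATE_ONLY, DATE_TIME]
    for fmt in formats:
        if not _SHAPES[fmt].fullmatch(value):
            continue
        try:
            datetime.strptime(value, fmt)  # rejects impossible dates
        except ValueError:
            continue
        return True
    return False


for sample in ["2019-10-05", "2019-10-05 13:00:00", "10/05/2019"]:
    print(sample, is_valid_timestamp(sample))
```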

Once the datasets are all successfully imported into the Dataset Group, a Predictor can be created. The Predictor is a simplified and automated means of selecting, configuring, and training your forecast model. There are a few configuration values to be mindful of to ensure a successful training:

Forecast horizon – tells Amazon Forecast how far into the future to predict your data and is set in units that should have a direct correlation to the data frequency of your target time series dataset.
Forecast frequency – the frequency at which your forecasts are generated. This value must be greater than or equal to the target time series dataset frequency.
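The relationship between the two frequencies can be sketched as a simple comparison. The frequency codes below (Y, M, W, D, H, 30min, 15min, 10min, 5min, 1min) and the reading of "greater than or equal to" as "the same or a longer interval" are assumptions of this sketch, as are the approximate minute counts used only to order the codes.

```python
# Approximate length of each frequency in minutes, used only for ordering.
FREQUENCY_MINUTES = {
    "1min": 1,
    "5min": 5,
    "10min": 10,
    "15min": 15,
    "30min": 30,
    "H": 60,
    "D": 24 * 60,
    "W": 7 * 24 * 60,
    "M": 30 * 24 * 60,   # nominal 30-day month
    "Y": 365 * 24 * 60,  # nominal 365-day year
}


def forecast_frequency_is_allowed(forecast_freq: str, data_freq: str) -> bool:
    """True if the forecast interval is at least as long as the data interval."""
    return FREQUENCY_MINUTES[forecast_freq] >= FREQUENCY_MINUTES[data_freq]


print(forecast_frequency_is_allowed("D", "H"))  # daily forecasts from hourly data
print(forecast_frequency_is_allowed("H", "D"))  # hourly forecasts from daily data
```

Under the same assumption, a forecast horizon of 14 with a daily forecast frequency would cover the next 14 days.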

Once the Predictor finishes training, it can be used to create a Forecast. Once the Forecast has been created, it can be queried via a forecast lookup to generate a forecast for a specific item.

With that, all that is left is to interpret the results. The P10, P50, and P90 values have a 10%, 50%, and 90% probability, respectively, of satisfying actual demand. In other words, actual demand is expected to be at or below the P10 value about 10% of the time and at or below the P90 value about 90% of the time, with the P50 value coming in closest to what the actual demand should be.
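One way to build intuition for the three values is to measure how often actual demand ended up at or below each of them. The sketch below does this on made-up numbers; the demand figures, the flat forecast values, and the helper name coverage are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def coverage(actuals, quantile_forecasts):
    """Fraction of periods in which actual demand was at or below the forecast."""
    hits = sum(1 for a, f in zip(actuals, quantile_forecasts) if a <= f)
    return hits / len(actuals)


# Ten periods of hypothetical actual demand for one item.
actual = [8, 12, 10, 15, 9, 11, 14, 10, 13, 7]

# Hypothetical flat forecasts at each quantile for the same ten periods.
p10 = [7] * 10
p50 = [10] * 10
p90 = [14] * 10

print("P10 coverage:", coverage(actual, p10))  # 0.1
print("P50 coverage:", coverage(actual, p50))  # 0.5
print("P90 coverage:", coverage(actual, p90))  # 0.9
```

Stocking to the P90 value would have covered demand in nine of the ten periods here, at the cost of carrying more inventory than stocking to the P50 value.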

