Amazon Forecast
Amazon Forecast is a fully managed service that uses machine learning to deliver highly accurate forecasts.
Amazon Forecast uses machine learning to combine time series data with additional variables to build forecasts. Amazon Forecast requires no machine learning experience to get started. You only need to provide historical data, plus any additional data that you believe may impact your forecasts. For example, the demand for a particular color of a shirt may change with the seasons and store location. This complex relationship is hard to determine on its own, but machine learning is ideally suited to recognize it. Once you provide your data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.
To begin, create a Dataset Group. Up to three datasets in the CSV file format can be added to the Dataset Group.
Amazon Forecast employs Forecasting Domains to help automate the process of selecting, tuning, and training the best model for your purpose.
The creation of the Dataset Group requires a name and a Forecasting Domain designation. The following seven Forecasting Domains are available:
- Retail – used to forecast retail demand.
- Inventory planning – used to forecast demand for raw materials and determine how much inventory of any given item to stock.
- EC2 capacity – used to forecast Amazon Elastic Compute Cloud capacity.
- Work force – used to plan and identify the amount of workforce that is required.
- Web traffic – used to forecast web traffic to a web property or set of web properties.
- Metrics – used to forecast metrics such as revenue, sales, and cashflow.
- Custom – used to generate forecasts that do not fit into any of the other predefined forecasting domains.
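As a rough sketch, the Dataset Group step might look like this with boto3, the AWS SDK for Python. The `create_dataset_group` call with its `DatasetGroupName` and `Domain` parameters belongs to the boto3 `forecast` client; the helper names, the example group name `shirt_demand`, and the mapping from the display names above to API enum strings are assumptions made for illustration. The network call sits in its own function so the request can be inspected without AWS credentials.

```python
# Assumed mapping from the domain names listed above to the enum strings
# passed as the Domain parameter of the Forecast API.
DOMAINS = {
    "Retail": "RETAIL",
    "Inventory planning": "INVENTORY_PLANNING",
    "EC2 capacity": "EC2_CAPACITY",
    "Work force": "WORK_FORCE",
    "Web traffic": "WEB_TRAFFIC",
    "Metrics": "METRICS",
    "Custom": "CUSTOM",
}


def dataset_group_request(name: str, domain: str) -> dict:
    """Build keyword arguments for create_dataset_group (hypothetical helper)."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown Forecasting Domain: {domain!r}")
    return {"DatasetGroupName": name, "Domain": DOMAINS[domain]}


def create_dataset_group(name: str, domain: str) -> str:
    """Create the Dataset Group and return its ARN. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("forecast")
    response = client.create_dataset_group(**dataset_group_request(name, domain))
    return response["DatasetGroupArn"]


# Example group name is made up for illustration.
request = dataset_group_request("shirt_demand", "Inventory planning")
print(request)
```

Calling `create_dataset_group("shirt_demand", "Inventory planning")` would send the same request to AWS.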
The only required dataset in Amazon Forecast is the target time series dataset. The target time series requires the following fields with associated datatypes:
- item_id(string)
- timestamp(timestamp)
- target_value(float)
The name of the required target_value field varies with the Forecasting Domain selected for the Dataset Group. For instance, if you select Inventory planning as the Forecasting Domain, the target field is demand, so the CSV column header and the schema use “demand” in place of “target_value”.
The related time series dataset is the first of the two optional datasets. The related time series has two required fields: item_id and timestamp. The fields price, stockout_days, inventory_onhand, revenue, and in_stock are all recommended for the related time series. Note that any field added to the related time series that is not one of the required fields must be of the integer or float datatype.
The item metadata dataset is the second of the two optional datasets. The item metadata dataset only requires the item_id field. The fields category, brand, lead_time, order_cycle, and safety_stock are all examples of additional fields that could be included in the item metadata dataset. All fields in the item metadata dataset must be of the string datatype.
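The field rules for the two optional datasets can be checked before a schema is written. The sketch below is a hypothetical helper, not part of any AWS SDK; the dataset labels `related` and `metadata` are names chosen here, and the rules encode only what is described above.

```python
def check_optional_dataset(kind: str, fields: dict) -> list:
    """Return a list of problems for a {field_name: datatype} mapping."""
    if kind == "related":
        required = ["item_id", "timestamp"]
        allowed_extra = {"integer", "float"}
    elif kind == "metadata":
        required = ["item_id"]
        allowed_extra = {"string"}
    else:
        raise ValueError(f"unknown dataset kind: {kind!r}")

    problems = []
    # Every required field must be present.
    for name in required:
        if name not in fields:
            problems.append(f"missing required field {name}")
    # Every other field must use one of the allowed datatypes.
    for name, datatype in fields.items():
        if name not in required and datatype not in allowed_extra:
            problems.append(f"{name}: {datatype} is not allowed in a {kind} dataset")
    return problems


related = {"item_id": "string", "timestamp": "timestamp",
           "price": "float", "in_stock": "integer"}
metadata = {"item_id": "string", "brand": "string", "lead_time": "integer"}

print(check_optional_dataset("related", related))
print(check_optional_dataset("metadata", metadata))
```

Here the related time series passes, while the item metadata example is flagged because `lead_time` is declared as an integer instead of a string.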
To import your datasets to the Dataset Group, Amazon Forecast requires the creation of a schema. The schemas used to import datasets to Amazon Forecast Dataset Groups must match the column headers both in name and order.
Below is an example of a target time series schema:
{
  "Attributes": [
    { "AttributeName": "item_id", "AttributeType": "string" },
    { "AttributeName": "timestamp", "AttributeType": "timestamp" },
    { "AttributeName": "demand", "AttributeType": "float" }
  ]
}
The following is an example of what a correlating target time series CSV file might look like:
ITEM_ID,TIMESTAMP,DEMAND
11111,2019-10-05,10.7
22222,2019-10-05,42.0
33333,2019-10-05,3.12
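Because the schema has to match the column headers in both name and order, a quick local check before import can save a failed import job. The following is a minimal sketch using only the Python standard library; `header_matches_schema` is a hypothetical helper, and the case-insensitive comparison is an assumption made here because the example header is uppercase while the schema names are lowercase.

```python
import csv
import io

schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

csv_text = (
    "ITEM_ID,TIMESTAMP,DEMAND\n"
    "11111,2019-10-05,10.7\n"
    "22222,2019-10-05,42.0\n"
    "33333,2019-10-05,3.12\n"
)


def header_matches_schema(text: str, schema: dict) -> bool:
    """Compare the first CSV row with the schema attribute names, in order."""
    header = next(csv.reader(io.StringIO(text)), [])
    expected = [attr["AttributeName"] for attr in schema["Attributes"]]
    # Assumption: names are compared without regard to case.
    return [h.strip().lower() for h in header] == [e.lower() for e in expected]


print(header_matches_schema(csv_text, schema))
```

Swapping two columns in the CSV, or renaming `demand` in only one place, makes the check return `False`.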
Two of the three datasets, the target time series and the related time series, involve a third configuration step: selecting the frequency of the data. The data frequency is the interval between consecutive values in the timestamp field.
The timestamps in the two datasets should line up with each other, so both datasets should share the same data frequency.
Also note that only two formats can be used for timestamp fields in the forecast datasets. All timestamp datatype values must use either the yyyy-MM-dd format or the yyyy-MM-dd HH:mm:ss format. Any other timestamp format will result in a failure during import to the Dataset Group. If the data frequency of your dataset is minutes or hours, you must use the yyyy-MM-dd HH:mm:ss format.
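The two accepted timestamp formats can be checked locally before import. This is a minimal sketch using Python's `datetime` module; `is_valid_timestamp` and its `sub_daily` flag are hypothetical names, and the round-trip comparison is added here so that values without zero padding, such as 2019-1-5, are rejected.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"               # yyyy-MM-dd
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss


def is_valid_timestamp(value: str, sub_daily: bool = False) -> bool:
    """Check a value against the accepted formats.

    Set sub_daily=True for minute or hour data, where only the
    yyyy-MM-dd HH:mm:ss format is accepted.
    """
    formats = [DATETIME_FORMAT] if sub_daily else [DATE_FORMAT, DATETIME_FORMAT]
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Round trip so that non-zero-padded values are not accepted.
        if parsed.strftime(fmt) == value:
            return True
    return False


for candidate in ["2019-10-05", "2019-10-05 13:00:00", "10/05/2019"]:
    print(candidate, is_valid_timestamp(candidate))
```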
Once all the datasets have been imported into the Dataset Group, a Predictor can be created. The Predictor is a simplified and automated means of selecting, configuring, and training your forecast model. There are a few configuration values to be mindful of to ensure successful training:
- Forecast horizon – tells Amazon Forecast how far into the future to predict. It is set as a number of time steps, so the period it covers follows from the frequency of the data (for example, a horizon of 10 at a daily frequency covers 10 days).
- Forecast frequency – the frequency at which your forecasts are generated. This value must be greater than or equal to the target time series dataset frequency; for example, daily data can produce daily or weekly forecasts, but not hourly ones.
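The frequency rule can be expressed as a small comparison. The sketch below assumes the frequency codes used by the Forecast API (`Y`, `M`, `W`, `D`, `H`, `30min`, `15min`, `10min`, `5min`, `1min`) and reads "greater than or equal" as "an interval at least as long"; the helper name and the approximate minute counts are illustrative and serve only to order the codes.

```python
# Approximate length of each frequency in minutes, used only for ordering.
FREQUENCY_MINUTES = {
    "1min": 1,
    "5min": 5,
    "10min": 10,
    "15min": 15,
    "30min": 30,
    "H": 60,
    "D": 1_440,
    "W": 10_080,
    "M": 43_200,   # 30 days, approximate
    "Y": 525_600,  # 365 days, approximate
}


def forecast_frequency_ok(data_frequency: str, forecast_frequency: str) -> bool:
    """True when the forecast interval is at least as long as the data interval."""
    return FREQUENCY_MINUTES[forecast_frequency] >= FREQUENCY_MINUTES[data_frequency]


print(forecast_frequency_ok("D", "W"))  # daily data, weekly forecasts
print(forecast_frequency_ok("D", "H"))  # daily data, hourly forecasts
```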
Once the Predictor finishes training, it can be used to create a Forecast. Once the Forecast has been created, it can be queried through a forecast lookup to retrieve the forecast for a specific item.
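A forecast lookup can also be done programmatically. The sketch below assumes the boto3 `forecastquery` client and its `query_forecast` call with `ForecastArn` and `Filters` parameters; the helper names are hypothetical, and `sample_response` is a hand-written stand-in that mimics the shape of a response, not real output from the service.

```python
def predictions_by_quantile(response: dict) -> dict:
    """Map each quantile name to a list of (timestamp, value) pairs."""
    predictions = response["Forecast"]["Predictions"]
    return {
        quantile: [(point["Timestamp"], point["Value"]) for point in points]
        for quantile, points in predictions.items()
    }


def query_item(forecast_arn: str, item_id: str) -> dict:
    """Look up the forecast for one item. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parser above works without it

    client = boto3.client("forecastquery")
    response = client.query_forecast(
        ForecastArn=forecast_arn,
        Filters={"item_id": item_id},
    )
    return predictions_by_quantile(response)


# Made-up values for illustration only.
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2019-10-06T00:00:00", "Value": 7.5}],
            "p50": [{"Timestamp": "2019-10-06T00:00:00", "Value": 10.5}],
            "p90": [{"Timestamp": "2019-10-06T00:00:00", "Value": 14.0}],
        }
    }
}

print(predictions_by_quantile(sample_response)["p50"])
```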
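With that, all that is left is to interpret the results. The P10, P50, and P90 values are quantile forecasts: actual demand is expected to be at or below them 10%, 50%, and 90% of the time, respectively. In other words, they have a 10%, 50%, and 90% probability of satisfying actual demand. The P50 value is the median forecast and is the one to use when a single best estimate of actual demand is needed.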
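One way to build intuition for the quantile values is to compare them with observed demand after the fact. The sketch below uses made-up numbers, not output from Amazon Forecast, and a hypothetical `coverage` helper that reports the share of observed values at or below a given forecast value.

```python
def coverage(actuals: list, forecast_value: float) -> float:
    """Fraction of observed values that were at or below the forecast value."""
    return sum(1 for actual in actuals if actual <= forecast_value) / len(actuals)


# Made-up numbers for illustration only.
actual_demand = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
quantile_forecast = {"p10": 8.0, "p50": 12.0, "p90": 16.0}

for name, value in quantile_forecast.items():
    print(name, coverage(actual_demand, value))
```

With well-calibrated forecasts, the coverage for P10, P50, and P90 should come out close to 0.1, 0.5, and 0.9, as it does for these illustrative numbers.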