The four pillars of an effective and robust forecasting solution
Accurate forecasting is a critical element of many business processes across industries. Whether you need to forecast cash flow, demand, sales, or resource allocation, the core pillars for building an effective and robust forecasting solution still apply.
As soon as you go beyond a “forecasting” (quotes intended!) strategy of plugging last year’s actuals into an Excel spreadsheet and applying an x% change, developing and managing effective forecasting solutions quickly becomes a complex and multi-disciplinary proposition.
Over the years, Neal Analytics has developed, deployed at scale, and maintained forecasting solutions for multiple use cases across industries, using a wide variety of ML (Machine Learning) techniques and with various scale and business process integration levels. This article shares the design and implementation framework Neal developed to ensure high-quality outcomes for forecasting projects.
Pillar 1: Assessing the needs
Before any work can start on the architectural and technology front, assessing and reaching an agreement on the needs of the forecasting project is critical. In most cases, forecasting models already exist, and ensuring clarity regarding “what success looks like” for any newly developed models is paramount.
Together with this goal, e.g., “20% more precise at a one-month horizon”, the organization’s data and technological maturity will dictate how far the new solution can go and the timeline for execution.
Whether the starting point is as simple as a linear regression model in a spreadsheet or as complex as an elaborate ensemble model combining multiple custom and open-source machine learning models, this assessment of goals, current situation, and existing data will be crucial to ensure that the project achieves satisfactory results.
This assessment will also serve as an objective, shared roadmap of goals for all parties involved: team, company, system integrator, and technology suppliers.
Pillar 2: Defining the data strategy
Often, the data required to build the forecasting models will come from several heterogeneous internal and external data sources such as public web services (weather forecast, traffic information, etc.), internal (often manually filled) spreadsheets, ERP systems, marketing technology (martech) software, and more.
Therefore, to ensure that the solution reaches the desired quality level and is manageable over the long run, the project team must define a solid and appropriate data strategy.
The chosen data platform will leverage existing integrations (such as those available in tools like Azure Data Factory) and custom-developed data connectors (built with services such as Azure Functions, Azure Batch, or Azure Event Hubs) to securely store the models’ training and operational data.
In most cases, those data sources will be a combination of real-time and historical data stored locally (e.g., spreadsheets), on-premises (e.g., in an ERP), and in the cloud.
The selected data platform, usually Azure Synapse or Azure Databricks in the Azure ecosystem, will ingest and normalize the relevant sources into a “source of truth” database that the ML algorithms can tap.
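As a simplified illustration, the sketch below shows what normalizing a manually maintained spreadsheet and an ERP extract into a single weekly fact table might look like before it lands in that source of truth. It assumes pandas, and the file names, column names, and output path are placeholder assumptions, not a prescribed design.

```python
# Minimal sketch: normalize two heterogeneous sources into one schema
# before loading them into the "source of truth" store.
# File names, column names, and the target path are illustrative assumptions.
import pandas as pd

# Manually maintained spreadsheet: weekly sales per store
spreadsheet = pd.read_excel("weekly_sales.xlsx")
spreadsheet = spreadsheet.rename(
    columns={"Store": "store_id", "Week": "period", "Sales": "units_sold"}
)
spreadsheet["period"] = pd.to_datetime(spreadsheet["period"])

# ERP extract: daily order lines, aggregated up to the same weekly grain
erp = pd.read_csv("erp_orders.csv", parse_dates=["order_date"])
erp["period"] = erp["order_date"].dt.to_period("W").dt.start_time
erp_weekly = (
    erp.groupby(["store_id", "period"], as_index=False)["quantity"]
       .sum()
       .rename(columns={"quantity": "units_sold"})
)

# Union the two sources into a single, consistently typed fact table
facts = pd.concat([spreadsheet, erp_weekly], ignore_index=True)
facts.to_parquet("source_of_truth/demand_facts.parquet", index=False)
```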
Pillar 3: Selecting and developing the machine learning models
The actual forecasting can start now that the goal is defined and data is aggregated in a single source of truth.
There are so many options for forecasting algorithms that it’s easy to either under- or over-engineer the solution. Under-engineering often yields subpar accuracy, while over-engineering drives up model training and maintenance costs, which can jeopardize the project’s viability over the long run.
Unfortunately, no magic tool can automatically identify the best algorithm (or ensemble of algorithms, i.e., so-called “ensemble models”) for a given forecasting need.
Some algorithms model seasonality well, while others natively incorporate external data (exogenous regressors). Each project is different, and applying a generic algorithm without selecting the right one for the problem will lead to subpar results.
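To make this concrete, here is a minimal sketch of one such algorithm, SARIMAX from the statsmodels library, that models yearly seasonality on weekly data and accepts an exogenous holiday regressor. The data file, the holiday feature column, and the model orders are illustrative assumptions rather than recommended settings.

```python
# Hedged sketch: a seasonal model with an exogenous holiday regressor.
# The input file, feature column, and (p, d, q)(P, D, Q, s) orders are
# placeholder assumptions for illustration only.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Assumed to contain weekly demand plus a pre-computed is_holiday_week flag
history = pd.read_parquet("demand_with_features.parquet")
store = history[history["store_id"] == "store_001"].set_index("period").sort_index()

y = store["units_sold"]
exog = store[["is_holiday_week"]]

model = SARIMAX(y, exog=exog, order=(1, 1, 1), seasonal_order=(1, 1, 1, 52))
result = model.fit(disp=False)

# Forecast the next 4 weeks, supplying future values of the regressor
future_exog = pd.DataFrame({"is_holiday_week": [0, 0, 1, 0]})
forecast = result.forecast(steps=4, exog=future_exog)
print(forecast)
```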
Selecting and tuning the right approach is where leveraging the expertise of data scientists with years of real-world experience developing, deploying, and maintaining forecasting models makes all the difference. If this competence is unavailable in-house, engaging experts such as Neal Analytics to help develop and train the best algorithms can fill the gap; internal data scientists can then update and maintain those algorithms to ensure they remain effective.
For instance, for one customer, after studying the particularities of the product and the various store locations, our data scientists built an ensemble model that combined regional and local store data. The resulting model accurately forecasts demand across several stores while accounting for both national (e.g., holidays) and regional (e.g., school breaks) seasonality.
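The exact model is customer-specific, but as a rough illustration of the idea, blending a region-level forecast with a store-level forecast can be as simple as a weighted average; the weights and numbers below are purely illustrative.

```python
# Hedged sketch of blending a regional forecast with a local store forecast.
# The 0.6 weight and the sample forecasts are illustrative assumptions.
import numpy as np

def ensemble_forecast(regional_forecast, local_forecast, local_weight=0.6):
    """Blend a region-level forecast with a store-level forecast.

    A higher local_weight favors the store's own history; stores with short
    or noisy histories might instead lean more on the regional signal.
    """
    regional = np.asarray(regional_forecast, dtype=float)
    local = np.asarray(local_forecast, dtype=float)
    return local_weight * local + (1.0 - local_weight) * regional

# Example: 4-week forecasts from a regional model and a local model for one store
blended = ensemble_forecast([120, 135, 180, 140], [110, 150, 200, 130])
print(blended)  # [114. 144. 192. 134.]
```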
Pillar 4: Operationalizing the forecasting models
Many projects fail not because the models are not good enough but because the project team fails to operationalize them effectively. Operationalizing forecasting models requires integrating them with the appropriate business processes and refreshing them over time, through regular retraining, to reflect new data and changing business conditions. The loss of accuracy that occurs when those changes go unaddressed is referred to as model drift.
This “island” problem, where a model lives in isolation instead of being integrated into a user-accessible dashboard, app, ERP, or other relevant system, combined with model drift, can doom a forecasting project to a slow and certain death.
From the project’s onset, it is therefore critical to identify how end users will access the forecasts, whether those users are humans (e.g., through a Power BI dashboard) or applications (e.g., via integration into an ERP), and what the best solutions are to achieve this integration into existing manual or automated business processes.
Once your forecasting models are deployed and accessible natively within the relevant business processes, the next step is to ensure they remain accurate as conditions evolve. Implementing best practices around Machine Learning Operations (MLOps) to manage models at scale is therefore a critical final step.
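What that safeguard looks like varies by platform, but conceptually it can be as simple as the sketch below: a scheduled job compares recent forecast error against the error measured at deployment time and flags the model for retraining when it degrades too far. The metric, tolerance, and sample numbers are assumptions for illustration.

```python
# Minimal sketch of a drift check that could run on a schedule inside an
# MLOps pipeline; the 1.25 tolerance and the sample data are assumptions.
import numpy as np

def mean_absolute_percentage_error(actuals, forecasts):
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return float(np.mean(np.abs((actuals - forecasts) / actuals)))

def needs_retraining(recent_actuals, recent_forecasts, baseline_mape, tolerance=1.25):
    """Return True when recent MAPE exceeds the deployment-time baseline by the tolerance factor."""
    recent_mape = mean_absolute_percentage_error(recent_actuals, recent_forecasts)
    return recent_mape > baseline_mape * tolerance

if needs_retraining([100, 120, 90, 110], [130, 95, 120, 80], baseline_mape=0.10):
    print("Model drift detected: trigger the retraining pipeline")
```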
Often forgotten or minimized, this fourth pillar, operationalization, is arguably the most important because this is where the R&D of pillars 1 to 3 translates into business impact.
Typically, the technologies supporting this last pillar span dashboarding tools (such as Power BI), MLOps capabilities (such as those in Azure Machine Learning), low/no-code application development platforms (such as Power Apps), workflow automation and RPA tools (such as Power Automate), and of course the ERP and other core business applications that will consume those forecasts.
(This article was originally published on the Neal Analytics blog.)