Integrating PyCaret time-series module into Power BI
Power BI is a great tool for Business Intelligence on its own, and it becomes even better when you add ‘super powers’ like R or Python: beyond Business Intelligence, you can perform machine learning, forecasting and advanced analytics.
In this particular case, I will use the PyCaret Time Series module for Python inside Power BI to create a forecast analysis for the price of a commodity. Now, if you are like me, just visualizing a forecast is not enough to answer business questions. So, I was wondering how to answer business questions by adding time intelligence on top of the forecast. This can be done by combining a bit of DAX with the forecast produced by the PyCaret Time Series module.
You can learn many more Machine Learning techniques in Power BI like this one from our book Machine Learning en Power BI with R and Python, which you can get in English and Spanish -and soon in Portuguese- on our website https://mlbi.io/ Note that 10% of our revenues will be donated to the @PyCaret team.
For the sake of the example, I will not go into building a highly accurate model. I will focus on the solution and leave the reader the task of researching how to make a more accurate one; nothing complex with the PyCaret Time Series module.
The dataset lives in a csv file called ‘commodity_price.csv’: a time series with the price of a given commodity from 01/06/2017 to date, with daily data points -no gaps-. It contains only 2 columns, ‘Date’ and ‘price’, and it looks like this:
Before we start, we need to install PyCaret, and specifically its Time Series module. I advise you to check here how to install the PyCaret Time Series module. PyCaret works very well with Python 3.8.
Creating the virtual environment
It is highly recommended to create a new virtual environment in which to install the PyCaret Time Series module, as right now it has some version conflicts with the original PyCaret package. The PyCaret team is working hard to integrate everything under one single package, but for now, the modules have to live in separate environments. You can check here how to create a new Anaconda virtual environment.
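As a sketch, creating and activating a dedicated Anaconda environment from the Anaconda Prompt could look like the commands below. The environment name `pycaret_ts` matches this article's example, but the exact package name of the time-series module is an assumption; check the PyCaret documentation for the current install command.

```shell
# Create an isolated environment with Python 3.8, which PyCaret works well with
conda create -n pycaret_ts python=3.8 -y

# Activate it before installing anything
conda activate pycaret_ts

# Install the time-series module (distributed as a separate package at the time of writing)
pip install pycaret-ts-alpha
```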
Once the new virtual environment is created and the PyCaret Time Series module is installed, it is necessary to point Power BI at that virtual environment so all its functionality can be used within Power BI. In my example, I created an Anaconda virtual environment called `pycaret_ts`. To sync this new virtual environment with Power BI, go to Options and Settings > Options > Python scripting, and under ‘Detected Python home directories:’ select ‘Other’; then, in the second drop-down menu, ‘Set a Python home directory:’, select the location of the new virtual environment, as shown in the picture.
Now Power BI is ready, and so are we! Let’s do some forecasting.
Creating the forecast in Power Query
Selecting ‘Transform data’ in Power BI Desktop opens the Power Query environment, where we will perform the forecasting. Before working with time series, it is very important that the data is sorted in ascending date order (oldest first). Once there, and after loading the original dataset, let’s create a copy of it, which I will rename ‘forecast’.
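Before modeling, it can also help to confirm in Python that the series is ordered chronologically and has no missing days, since the forecast dates will later be rebuilt by adding day offsets. A minimal pandas check, using a tiny hypothetical sample in place of ‘commodity_price.csv’:

```python
import pandas as pd

# Hypothetical sample standing in for 'commodity_price.csv'
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-06-03", "2017-06-01", "2017-06-02"]),
    "price": [101.5, 100.0, 100.7],
})

# Sort chronologically (oldest first) so the model sees an ordered series
df = df.sort_values("Date").reset_index(drop=True)

# Verify there are no gaps: every step should be exactly one day
gaps = df["Date"].diff().dropna().ne(pd.Timedelta(days=1)).sum()
print(df["Date"].is_monotonic_increasing)  # True
print(gaps)  # 0 -> no missing days
```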
Let’s forecast the 60 days after the last data point available in the dataset. So, with the ‘forecast’ query selected, go to Transform > Run Python script and insert the following code:
import pandas as pd
import numpy as np
from pycaret.time_series import *

# 'dataset' is the data frame Power Query passes into the Python script
df = dataset[['price']]

# fh = 60: forecast horizon of 60 days; seasonal_period = 'D' for daily data
setup(df, fh = 60, fold = 4, session_id = 100, seasonal_period = 'D', html = False, verbose = False)
model = create_model('auto_arima', verbose = False)

# alpha = 0.1 returns a 90% prediction interval ('lower' and 'upper' columns)
fcst = predict_model(model, alpha = 0.1, return_pred_int = True)
fcst = pd.DataFrame(fcst)

# Offset of each forecast row beyond the last historical date (1..60),
# used later in Power Query to rebuild the forecast dates
fcst['day_added'] = np.arange(1, len(fcst) + 1)

# Tag the rows so the forecast can be told apart from the historical data
fcst['type'] = 'forecast'
fcst
The code inserted looks like this:
Once executed, Power Query returns all the tables -data frames- generated by the code. In this case, the one containing the forecast is called ‘fcst’, so let’s expand it. In the ‘forecast’ table, let’s rename the column ‘y_pred’ to ‘price’. The ‘forecast’ table now looks like this:
There are a few things to do before ‘Close & Apply’. Let’s go back to the original ‘commodity_price’ dataset and add a new custom column named ‘type’ with the single value “historical” -you will see later why we need this. To do so, just go to Add Column > Custom Column and create the column with those settings.
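This mirrors the tagging already done for the forecast rows in the Python script. For intuition, the equivalent of that custom column in pandas (table and column names assumed) is just a constant-value column:

```python
import pandas as pd

# Hypothetical stand-in for the 'commodity_price' table
hist = pd.DataFrame({
    "Date": pd.to_datetime(["2017-06-01", "2017-06-02"]),
    "price": [100.0, 100.7],
})

# Equivalent of Power Query's Add Column > Custom Column with the fixed value "historical"
hist["type"] = "historical"

print(hist["type"].unique())  # ['historical']
```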
Now those tables are ready to be appended. Let’s go to Home > Append Queries > Append Queries as new, and select ‘commodity_price’ as the First table and ‘forecast’ as the Second table, like this:
A new table called ‘Append1’ is created. Let’s rename it ‘consolidated’, as it contains both the historical data -commodity_price- and the forecast data -forecast-.
There are a few more transformations to perform on the ‘consolidated’ table:
1. Fill down the ‘Date’ column, as the 60 newly added rows are blank there. To do this, go to Transform > Fill > Down.
2. Replace the ‘null’ values with 0 in the columns ‘lower’, ‘upper’ and ‘day_added’, and cast all those columns as ‘decimal number’. Make sure that ‘price’ is also a decimal number.
3. Duplicate the ‘Date’ column and convert the copy to ‘whole number’ (Power BI stores dates internally as day counts).
4. Create a new column that sums the duplicated ‘Date’ column and the ‘day_added’ column.
5. Convert the new column to the ‘date’ type, and rename it ‘Date’.
6. Finally, remove the unnecessary columns: the original ‘Date’ and ‘day_added’.
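Steps 3 to 5 work because a date can be treated as a day count, so adding the ‘day_added’ offset shifts each forecast row forward from the last historical date filled in by step 1. A small pandas sketch of the same idea (column names and values are hypothetical):

```python
import pandas as pd

# After Fill Down, forecast rows carry the last historical date; historical rows keep their own
consolidated = pd.DataFrame({
    "Date": pd.to_datetime(["2017-06-01", "2017-06-02", "2017-06-02", "2017-06-02"]),
    "day_added": [0, 0, 1, 2],  # 0 for historical rows, 1..n for forecast rows
})

# Shift each row forward by its offset; historical rows (offset 0) are unchanged
consolidated["Date"] = consolidated["Date"] + pd.to_timedelta(consolidated["day_added"], unit="D")

print(consolidated["Date"].dt.strftime("%Y-%m-%d").tolist())
# ['2017-06-01', '2017-06-02', '2017-06-03', '2017-06-04']
```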
After all those transformations, this is how the ‘consolidated’ table looks:
Now everything is ready, so let’s hit ‘Close & Apply’ and let everything load into memory. Here is the visual result:
Now comes the nice part: since the forecast data is part of the data model, it is possible to create DAX measures that calculate the expected growth of the price including the forecast, or even a Simple Moving Average of the price that now covers the forecast.
Let’s calculate a 60-day Simple Moving Average by adding a new measure with the following DAX:

SMA 60d =
CALCULATE(
    AVERAGEX( consolidated, consolidated[price] ),
    DATESINPERIOD( consolidated[Date], LASTDATE( consolidated[Date] ), -60, DAY )
)
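For intuition, the same trailing-window average can be sketched in pandas over the combined history-plus-forecast series. This is only an illustration of the calculation, not part of the Power BI solution; a window of 3 keeps the example small, and the values are hypothetical:

```python
import pandas as pd

# Hypothetical price series (history plus forecast appended)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Trailing rolling mean, analogous to DATESINPERIOD(..., -N, DAY) in the DAX measure
sma = prices.rolling(window=3).mean()
print(sma.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```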
Something else that is interesting is the ability to calculate the day-by-day growth rate of the price, for example compared with the same day of last year.
To do so, it is necessary to create several DAX measures:
Price Last Year = CALCULATE( AVERAGE( consolidated[price] ), SAMEPERIODLASTYEAR( consolidated[Date] ) )

Var % YoY = DIVIDE( SUM( consolidated[price] ) - [Price Last Year], [Price Last Year] )
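The same year-over-year growth idea can be sketched in pandas on daily data by shifting the series 365 positions, the rough equivalent of SAMEPERIODLASTYEAR; the data here is hypothetical and chosen only to make the arithmetic visible:

```python
import pandas as pd

# Two years of hypothetical daily prices
idx = pd.date_range("2020-01-01", periods=730, freq="D")
price = pd.Series(range(100, 830), index=idx, dtype=float)

# Price on the same calendar day one year earlier (365-day shift on a gapless daily series)
last_year = price.shift(365)

# YoY growth rate: (current - last year) / last year
yoy = (price - last_year) / last_year
print(round(yoy.iloc[-1], 4))  # ~0.7866 for this sample
```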
Let’s look at the result:
Now we can estimate the growth of the forecasted points against the price on the same day one year earlier.
This is the great advantage of having the forecast points in the same data model: it is possible to add intelligence on top of the forecast.
Thank you
Pablo Moreno