登录查看更多内容

Integrating PyCaret time-series module into Power BI

Pablo M.

发布日期: 2022年1月14日

Working with Power BI is very nice to do Business Intelligence overall. It is even much better when adding ‘super powers’ like R or Python, as beyond Business Intelligence, it is possible to perform machine learning, forecasting and advanced analytics.

In this particular case, I will be using the Python PyCaret Time Series module into Power BI to create a forecast analysis for the price of a commodity. Now, if you are like me, just visualizing a forecast is not enough to answer business questions. So, I was wondering how to ask business questions by adding time intelligence to the forecast. This can be done combining a bit of DAX with the forecast performed by PyCaret package and its Time Series module.

You can learn a lot more Machine Learning techniques in Power BI like this from our book Machine Learning en Power BI with R and Python, that you can get in English and Spanish -and soon in Portuguese- in our website https://mlbi.io/ Consider that 10% of our revenues will be donated to @PyCaret team.

For the sake of the example, I will not go into making very accurate model. I will focus more on the solution and will leave the reader the task to research how to make a very accurate model; something not complex with PyCaret Time Series Model.

The dataset to use is within a csv file called ‘commodity_price.csv’, which is a time series with the price of a given commodity from 01/06/2017 to date with daily datapoints -no gaps-. This dataset contains only 2 columns, ‘Date’ and ‘price’, and it looks like this

Before we start, we can install PyCaret and specifically the Time Series module. To do so, I advise you to check here on how to install PyCaret Time Series Module. PyCaret works very well with Python 3.8.

Creating the virtual environment

It is highly recommended that you create a new virtual environment where to install the PyCaret Time Series Module, as right now it has some version conflicts with PyCaret original package. The PyCaret team is working hard to integrate everything under one single package, but nowadays, it has to be in separate environments. You can check here how to create a new Anaconda virtual environment.

Once created a new virtual environment and installed the PyCaret Time Series Module, it is necessary to sync up Power BI with that virtual environment, so all functionalities can be used within Power BI. In my example, I created a Anaconda virtual environment called `pycaret_ts`. To sync up this new virtual environment with Power BI, just to go Options and Settings > Options > Python scripting and at the option ‘Detected Python home directories:’, select ‘Other’, and at the second drop down menu called ‘Set a Python home directory:’, select the location of the new virtual environment; as it is shown at the picture.?

Now Power BI is ready, so we are! Let’s do some forecasting.

Creating the forecast in Power Query

As we select ‘Transform data’ in Power BI desktop, we open the Power Query environment, where we will perform the forecasting. Before working with time series, it is very important that the data is sorted descending. Once there, and after loading the original dataset, let’s create a copy of it that I will rename ‘forecast’

Let’s make a forecast of the next 60 days after the last data point that it is available at the dataset. So, selecting the ‘forecast’ query, let’s go to Transform > Run Python script and insert the following code:

import pandas as pd

import numpy as np

from pycaret.time_series import *

df = dataset[['price']]

setup(df, fh = 60, fold = 4, session_id = 100, seasonal_period = 'D', html = False, verbose = False)

model = create_model('auto_arima', verbose = False)

fcst = predict_model(model, alpha = 0.1, return_pred_int = True)

fcst = pd.DataFrame(fcst)

fcst['day_added'] = np.arange(len(fcst))

fcst['type'] = pd.Series('forecast', index=df.index, dtype='category')

fcst

The code inserted looks like this:

Once Executed, Power Query returns all the tables -data frames- generated by the code. In this case, the one that contains the forecast is called ‘fcst’; so, let’s expand it. At the ‘forecast’ table, let’s rename the column ‘y_pred’ to ‘price’. The ‘forecast’ table looks like this:?

There are a few things that need to be done before ‘Close & Apply’. Let’s go back to the original dataset ‘commodity_price’ and add a new custom column named ‘type’ with the only value of “historical” -you will see later why do we need this. To do this, just go to Add Column > Custom Column and create a new column with those features.

领英推荐

Playing Referee

Helen Wall 2 年前

Analyzing Excel Sales Data with Python Pandas and…

Eduardo Miranda 8 个月前

The Evolution of My Data Analytics Journey with Python

Gerard V. Edom 5 个月前

Now, those tables are ready to be appended. So, let’s go to Home > Append Queries > Append Queries as new, and select ‘commodity_price’ as First table and ‘forecast’ as Second table, like this:

A new table called ‘Append1’ is created. So, let’s rename it to ‘consolidated’, as this new table contains both historical data -commodity_price- and forecast data -forecast-.

So, more transformation that need to be done on the ‘consolidated’ table.

1.??????Let’s fill down the ‘Date’ column, as now those new 60 rows added are blank. To do this, just Transform > Fill > Down

2.??????Replace the ‘null’ values with 0 of the columns ‘lower’, ‘upper’ and ‘day_added’, and cast them all those columns as ‘decimal number’. Make sure that ‘price’ is also decimal number.

3.??????Duplicate column ‘Date’ and convert it to ‘integer number’

4.??????Create a new column where you sum the duplicated ‘Date’ column with the column ‘day_added’

5.??????Convert the new column as ‘date’ type, and rename it as Date.

6.??????Finally, remove all the unnecessary columns, like the original ‘Date’, ‘day_added’

After all those transformations, this is how the ‘consolidated’ table looks like:

Now, everything is ready to go, so let’s hit ‘Close & Apply’ and let everything to be uploaded to memory. Here is the visual result

Now it comes the nice part: as this forecast data is part of the data model, it is possible to create DAX measures to calculate the expected growth of the price with the forecast, or even the Standard Moving Average of the price, but now including the forecast.

Let’s calculate the Standard Moving Average of 60 days, but adding a new measure using the following DAX measure:?

SMA 60d = CALCULATE( AVERAGEX( Append1, Append1[price] ),

???????????????DATESINPERIOD( Append1[fecha], LASTDATE( Append1[fecha] ), -60, DAY ) )

Something also interesting is the ability to calculate the growth rate of the price day by day, for example comparing the same day of last year.

To do so, it is necessary to create several DAX measures:

Price Last Year = CALCULATE( AVERAGE( Append1[price] ), SAMEPERIODLASTYEAR( Append1[fecha] ) )

Var % YoY = DIVIDE( ( SUM( Append1[price] ) - Append1[Price Last Year] ), SUM( Append1[price] ) )

Let’s look at the result:

Now we can estimate the growth of forecasted points against the same price value 1 year ago.

This is the great advantage of having forecast points into the same data model: it is possible to add intelligence over the forecast.

Thank you

Pablo Moreno

Owais Qureshi

Sustainability Advocate | Business Analytics | Digital Transformation

3 年

Hi Pablo, Good Day! After running python script, i am getting following error: DataSource.Error: ADO.NET: Python script error. <pi>ImportError: DLL load failed while importing _sqlite3: The specified module could not be found. </pi> Details: ??DataSourceKind=Python ??DataSourcePath=Python ??Message=Python script error. <pi>ImportError: DLL load failed while importing _sqlite3: The specified module could not be found. </pi> ??ErrorCode=-2147467259 Kindly guide me about the possible reasons for this error. Best Regards

Pablo M.

3 年

Proximamente estaremos anunciando cursos. En breve. Gracias Germán Cruz

4 次回应

Germán Cruz

Ingeniero Civil | Maestría en vias terrestres | Gestión de infraestructura vial | IA

3 年

Estimado Pablo, tiene usted sus cursos en alguna plataforma o escuela online ?

1 次回应

Luca Piattelli

BI Development Lead at Algori | Power BI | Python | Azure

3 年

Un gran salto de posibilidades!!

4 次回应

查看更多评论

要查看或添加评论，请登录

Pablo M.的更多文章

Tipos de sesgos en Análisis de Datos

2025年3月15日

Tipos de sesgos en Análisis de Datos

El análisis de datos, inteligencia de negocios, machine learning, ciencia de datos o inteligencia artificial está…

2 条评论
RoadMap de Producto de Datos (BI / ML / AI)

2025年2月4日

RoadMap de Producto de Datos (BI / ML / AI)

Recomendaciones iniciales En cada fase, trata de definir un horizonte temporal. Para todo el Roadmap u hoja de ruta…

5 条评论
#ML + #RPA = IA (Intelligence Automation)

2024年10月12日

#ML + #RPA = IA (Intelligence Automation)

El objetivo último de la transformación digital es “permitir que los humanos hagan o produzcan más con menos recursos”.…
La mejor opción de Power BI con tus clientes (si sabes lo que haces)

2024年5月16日

La mejor opción de Power BI con tus clientes (si sabes lo que haces)

Cada vez que publico o menciono algo en LinkedIn sobre evitar la descarga de datos en Ex**l desde un dashboard, media…

43 条评论
Análisis geo-espacial con Microsoft Fabric

2024年5月15日

Análisis geo-espacial con Microsoft Fabric

?Alguna vez te has preguntado cómo podrías rastrear tu propia logística o quizás la ubicación de tu equipo de trabajo…

4 条评论
eDiscovery: La joya escondida en Microsoft Purview

2024年4月2日

eDiscovery: La joya escondida en Microsoft Purview

Si trabaja con Microsoft Fabric, es posible que sepa que es responsable de todos los datos alojados en la nube, lo que…
Entendiendo la Transformación digital

2024年3月28日

Entendiendo la Transformación digital

Recientemente me encontré en una conversación conmigo mismo sobre cómo puedo explicar todo esto de la revolución de la…

10 条评论
KQL Databases para tiempo-real

2024年1月2日

KQL Databases para tiempo-real

En mi artículo anterior La promesa de Análisis en Tiempo Real con Microsoft Fabric, hago una breve introducción a qué…
Vector Database: desarrollo e integración en Microsoft Fabric

2023年12月28日

Vector Database: desarrollo e integración en Microsoft Fabric

Estoy seguro de que ahora cualquier persona conoce o ha oído hablar de Gen-AI, ChatGPT, OpenAI o LLM. También estoy…
Pero... Cuánto cuesta Microsoft Fabric?

2023年12月22日

Pero... Cuánto cuesta Microsoft Fabric?

Probablemente la pregunta más habitual que escucho y que me hacen muchos amigos y colegas. Así que en aras de…

7 条评论

See all articles

Integrating PyCaret time-series module into Power BI

Pablo M.

领英推荐

Pablo M.的更多文章

社区洞察

其他会员也浏览了

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Matplotlib

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part I

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part II

Mastering Data Visualization with Matplotlib: A Comprehensive Guide to Creating Powerful Plots and Charts

Seaborn

Accessing Data with loc: Label-Based Indexing in Pandas

Accessing Data with iloc: Position-Based Indexing in Pandas

SQL Challenge: Weekly Orders Report

Boost Your Data Analysis with These 30 Essential Pandas Tricks!

领英推荐

Pablo M.的更多文章

Tipos de sesgos en Análisis de Datos

RoadMap de Producto de Datos (BI / ML / AI)

#ML + #RPA = IA (Intelligence Automation)

La mejor opción de Power BI con tus clientes (si sabes lo que haces)

Análisis geo-espacial con Microsoft Fabric

eDiscovery: La joya escondida en Microsoft Purview

Entendiendo la Transformación digital

KQL Databases para tiempo-real

Vector Database: desarrollo e integración en Microsoft Fabric

Pero... Cuánto cuesta Microsoft Fabric?

社区洞察

其他会员也浏览了

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Matplotlib

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part I

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part II

Mastering Data Visualization with Matplotlib: A Comprehensive Guide to Creating Powerful Plots and Charts

Seaborn

Accessing Data with loc: Label-Based Indexing in Pandas

Accessing Data with iloc: Position-Based Indexing in Pandas

SQL Challenge: Weekly Orders Report

Boost Your Data Analysis with These 30 Essential Pandas Tricks!