登录查看更多内容

Creating a Machine Learning App with Power BI and PyCaret

Pablo M.

发布日期: 2021年6月4日

Can we use Power BI as application to productionalize machine learning like a web app? Of course!

Let’s create a Machine Learning App in Power BI, but no in Power Query, rather, let’s do it in Desktop, so our customers can interact with the model.

Techniques like this a much more will be available at the book that we are working on ‘Machine Learning in Power BI with R and Python’. This book is planned to be released in July 2021 in Spanish, and November 2021 in English. If you are interested to know more about this book, you can view the recorded presentation here or view the preliminary content here

We will create a model in Python using -the amazing’ PyCaret package, created by @Moez Ali. In this example, we will use the insurance dataset to create a regression project, as we want to predict ‘charges’, given the other variables: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We select ‘random forest’ as model.

After doing the whole pipeline in Jupyter, we save the model and the entire pipeline in a ‘pkl’ file named ‘rf_ins.pkl’

Once the pipeline and the trained model is completed, we need 3 things to pur the model in production:

- The saved pipeline

- The trained model

- New data

We do have already the saved pipeline and the trained model, so all we need is new data.

To use Power BI as a web app for machine learning, all we have to do is create the different columns in Power BI Desktop using DAX and use slicers to allow the end user to select values for each feature. Finally let's a Python visualization where machine learning results will be displayed. Every time slicers and filters changed, the model will update the visual with the new prediction.

In our example, let’s open Power BI and create the same columns that we used to train the model: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We don’t need to create ‘charges’ as that’s the predicted value.

Creating the new data

Continuous variables can be created using the ‘What if’ parameters, as in addition of a calculated column, it will produce a measure. The value of such measure will be the ‘SELECTEDVALUE()’ function by using the filter. We can do the same thing for the categorical features. We will use the measures as the new dataframe within the Python visual at which we will do the predictions.

It is imperative that the measures have the same name as the column names used on the training dataset, otherwise the model will not recognize them. Let’s go step by step.

Step 1: Creating the numeric features

To create the numeric features, go to Modeling > New parameter and start filling the dialogue box. We will create the ‘age feature’ first, and we must do the same with all the other features.

Select Mdeling > New parameter, to create a 'What-if' parameter

Once you click ‘OK’, you will see how a new calculated table is crated with 1 calculated column and its measure. Also, a new slicer is added to the canvas.

We need to rename the measure to ‘age’, as that’s what we will use for the new dataframe. All you must do is double click on the measure and leave it as ‘age’ (fully trimmed, no blank spaces)

Let’s do the same with the rest of the numeric features.

As ‘What-if parameter’ in Power BI only works with numerical, date or decimal features, so, for the categorical features we need to create a new table with just one column with the categories of the feature and create a measure associated using SELECTEDVALUE().

Step 2: Create categorical features

To create the categorical features, let’s use Home > Enter data, and create a new table with the categorical feature ‘sex’, which it contains only 2 values: ‘male’ and ‘female’

Once ‘Load’ is clicked, a new table with 1 column and the two values is crated. Let’s now add the measure called ‘sex’ using the following DAX expression:

sex = SELECTEDVALUE( ‘sex feature’[sex feature] )

To create a new measure, just right click with the over the table ‘sex feature’ and select ‘new measure’, and the formula bar will be activated. Once the DAX expression is added, click ‘Enter’, and the measure will be added to the table.

Let’s do the same thing with ‘smoker’ (with only two values: yes, and no) and ‘region’ (with four values: southwest, northwest, southeast, northeast)

For the three categorical features, let’s add slicers to the canvas, one for each feature. To add slicers, just click on the slicer icon of Visualization pane and select the created column (not the measure) of each table.

It is recommended at this point not to leave the slicers and sliders blank, for that reason, the categorical slicers should be set up to ‘Single select’

Creating the prediction

Now that we have some values selected, let’s create the prediction visualization. To do this, let’s click on the ‘Py’ icon at the Visualizations pane

Once the user enables the Python visual, the script appears, but it is blank and no code can be entered. To activate the script, we need to add values to the ‘Values’ field. This is where all our created measures will be placed

Now we can see how the Python script is active and we can start entering code. So, let’s go!

It is very important to understand that the output of the script in Power BI desktop must be a visualization or chart, not a table or dataframe or any other object.

This is how the entire script will look like

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

# dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

# Import pycaret, pipeline and model
from pycaret.regression import *
model = load_model(r'C:\...\rf_ins')

# Create prediction with dataset
df = predict_model(model, data  ?= dataset)

# Create table visual with matplotlib to display results
import matplotlib.pyplot as plt

fig, ax =plt.subplots(1,1)
df=pd.DataFrame(df)
ax.axis('tight')
ax.axis('off')
ax.table(cellText = df.values, colLabels = df.columns, loc='center', cellLoc = 'center')
 

plt.show()

And we can see how the output of the script is a table visual with the values selected and the ‘Label’ feature, which is the Predicted value of ‘charges’. As values of slicers and filters moves, the visualization will be refreshed with the new predicion -stored in column 'Label'-.

To display only the column that shows the prediction, you can use the following code:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:


# Import pycaret, pipeline and model
from pycaret.regression import *
model = load_model(r'C:\...\rf_ins')

# Create prediction with dataset
pd.options.display.float_format = '{:, .2f}'.format
df1 = predict_model(model, data  ?= dataset)
df = df1[['Label']]
df.rename(columns = {'Label' : 'Predicted Charges'}, inplace = True)


# Create table visual with matplotlib to display results
import matplotlib.pyplot as plt

fig = plt.figure()

table = plt.table(cellText = df.values, loc='center', colLabels = df.columns, cellLoc = 'center')
table.auto_set_font_size(False)
table.set_fontsize(24)
table.scale(5,5)


plt.show()

Feel free to use another visual or another settings that will may fit better to you or your customer, to consume your app. If the reader of this blog have another visualization suggestion or want to try it in a different way, you are more than welcome to share it!

Now the user can change the slicers and filters and the prediction (column ‘Label’) will be updated accordingly.

The Power BI Desktop file can be uploaded to Power BI Service and be shared as a web app, but a ‘Pro’ license -or higher- will be needed. Also, when uploading it to Power BI Service, a Gateway in personal mode will be required.

Pablo Moreno is co-author of the book ‘Machine Learning in Power BI with R and Python’ together with Gabriel Gomez

We want to thank Moez Ali and its great team for the creation and development of PyCaret

Isabellí Andrade

1 年

Alexandre, pensei em seguirmos assim, tenho um not pronto já.

1 次回应

Pablo M.

3 年

Hello everyone, just updated the link to the book that I mention in the article. You can view a brief content here: https://online.pubhtml5.com/nkpo/lzyj/

1 次回应

Alvaro Martinez

Ingeniero Industrial

3 年

Could you have a manual how to install pycaret With power bi? I have tried installing pycaret but when i played the algorithm in powerbi python sent an error -2147467259 Thanks Regards

Pablo M.

3 年

I haven't mentioned it, but it works also with #R

1 次回应

查看更多评论

要查看或添加评论，请登录

Pablo M.的更多文章

#ML + #RPA = IA (Intelligence Automation)

2024年10月12日

#ML + #RPA = IA (Intelligence Automation)

El objetivo último de la transformación digital es “permitir que los humanos hagan o produzcan más con menos recursos”.…
La mejor opción de Power BI con tus clientes (si sabes lo que haces)

2024年5月16日

La mejor opción de Power BI con tus clientes (si sabes lo que haces)

Cada vez que publico o menciono algo en LinkedIn sobre evitar la descarga de datos en Ex**l desde un dashboard, media…

43 条评论
Análisis geo-espacial con Microsoft Fabric

2024年5月15日

Análisis geo-espacial con Microsoft Fabric

?Alguna vez te has preguntado cómo podrías rastrear tu propia logística o quizás la ubicación de tu equipo de trabajo…

4 条评论
eDiscovery: La joya escondida en Microsoft Purview

2024年4月2日

eDiscovery: La joya escondida en Microsoft Purview

Si trabaja con Microsoft Fabric, es posible que sepa que es responsable de todos los datos alojados en la nube, lo que…
Entendiendo la Transformación digital

2024年3月28日

Entendiendo la Transformación digital

Recientemente me encontré en una conversación conmigo mismo sobre cómo puedo explicar todo esto de la revolución de la…

10 条评论
KQL Databases para tiempo-real

2024年1月2日

KQL Databases para tiempo-real

En mi artículo anterior La promesa de Análisis en Tiempo Real con Microsoft Fabric, hago una breve introducción a qué…
Vector Database: desarrollo e integración en Microsoft Fabric

2023年12月28日

Vector Database: desarrollo e integración en Microsoft Fabric

Estoy seguro de que ahora cualquier persona conoce o ha oído hablar de Gen-AI, ChatGPT, OpenAI o LLM. También estoy…
Pero... Cuánto cuesta Microsoft Fabric?

2023年12月22日

Pero... Cuánto cuesta Microsoft Fabric?

Probablemente la pregunta más habitual que escucho y que me hacen muchos amigos y colegas. Así que en aras de…

7 条评论
Data Virtualization en Azure y Fabric

2023年12月20日

Data Virtualization en Azure y Fabric

La virtualización de datos es posiblemente una técnica de datos poco conocida y, en mi opinión, una de las más útiles…

3 条评论
KQL vs SQL

2023年12月14日

KQL vs SQL

Kusto Query Language (KQL) es una poderosa herramienta para explorar sus datos y descubrir patrones, identificar…

14 条评论

See all articles

Creating a Machine Learning App with Power BI and PyCaret

Pablo M.

Pablo M.的更多文章

社区洞察

其他会员也浏览了

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part III

Move Faster your ML Pipeline

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Matplotlib

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part I

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part II

Should we learn programming to Future proof ourselves?

Text Parsing in Python with US-Patent Data

SQL Challenge: Number Of Custom Email Labels

Integrating PyCaret time-series module into Power BI

Pablo M.的更多文章

#ML + #RPA = IA (Intelligence Automation)

La mejor opción de Power BI con tus clientes (si sabes lo que haces)

Análisis geo-espacial con Microsoft Fabric

eDiscovery: La joya escondida en Microsoft Purview

Entendiendo la Transformación digital

KQL Databases para tiempo-real

Vector Database: desarrollo e integración en Microsoft Fabric

Pero... Cuánto cuesta Microsoft Fabric?

Data Virtualization en Azure y Fabric

KQL vs SQL

社区洞察

其他会员也浏览了

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part III

Move Faster your ML Pipeline

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Matplotlib

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part I

Analyzing Excel Sales Data with Python Pandas and Seaborn - Part II

Should we learn programming to Future proof ourselves?

Text Parsing in Python with US-Patent Data

SQL Challenge: Number Of Custom Email Labels

Integrating PyCaret time-series module into Power BI