Creating a Machine Learning App with Power BI and PyCaret

Creating a Machine Learning App with Power BI and PyCaret

Can we use Power BI as application to productionalize machine learning like a web app? Of course!

Let’s create a Machine Learning App in Power BI, but no in Power Query, rather, let’s do it in Desktop, so our customers can interact with the model.

Techniques like this a much more will be available at the book that we are working on ‘Machine Learning in Power BI with R and Python’. This book is planned to be released in July 2021 in Spanish, and November 2021 in English. If you are interested to know more about this book, you can view the recorded presentation here or view the preliminary content here

We will create a model in Python using -the amazing’ PyCaret package, created by @Moez Ali. In this example, we will use the insurance dataset to create a regression project, as we want to predict ‘charges’, given the other variables: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We select ‘random forest’ as model.

After doing the whole pipeline in Jupyter, we save the model and the entire pipeline in a ‘pkl’ file named ‘rf_ins.pkl’

Once the pipeline and the trained model is completed, we need 3 things to pur the model in production:

-      The saved pipeline

-      The trained model

-      New data

We do have already the saved pipeline and the trained model, so all we need is new data.

To use Power BI as a web app for machine learning, all we have to do is create the different columns in Power BI Desktop using DAX and use slicers to allow the end user to select values for each feature. Finally let's a Python visualization where machine learning results will be displayed. Every time slicers and filters changed, the model will update the visual with the new prediction.

In our example, let’s open Power BI and create the same columns that we used to train the model: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We don’t need to create ‘charges’ as that’s the predicted value. 

Creating the new data

Continuous variables can be created using the ‘What if’ parameters, as in addition of a calculated column, it will produce a measure. The value of such measure will be the ‘SELECTEDVALUE()’ function by using the filter. We can do the same thing for the categorical features. We will use the measures as the new dataframe within the Python visual at which we will do the predictions.

It is imperative that the measures have the same name as the column names used on the training dataset, otherwise the model will not recognize them. Let’s go step by step.

Step 1: Creating the numeric features

To create the numeric features, go to Modeling > New parameter and start filling the dialogue box. We will create the ‘age feature’ first, and we must do the same with all the other features.

Select Mdeling > New parameter, to create a 'What-if' parameter

Once you click ‘OK’, you will see how a new calculated table is crated with 1 calculated column and its measure. Also, a new slicer is added to the canvas.

We need to rename the measure to ‘age’, as that’s what we will use for the new dataframe. All you must do is double click on the measure and leave it as ‘age’ (fully trimmed, no blank spaces)

No alt text provided for this image

Let’s do the same with the rest of the numeric features. 

No alt text provided for this image

As ‘What-if parameter’ in Power BI only works with numerical, date or decimal features, so, for the categorical features we need to create a new table with just one column with the categories of the feature and create a measure associated using SELECTEDVALUE().

Step 2: Create categorical features

To create the categorical features, let’s use Home > Enter data, and create a new table with the categorical feature ‘sex’, which it contains only 2 values: ‘male’ and ‘female’

No alt text provided for this image

Once ‘Load’ is clicked, a new table with 1 column and the two values is crated. Let’s now add the measure called ‘sex’ using the following DAX expression:

sex = SELECTEDVALUE( ‘sex feature’[sex feature] )

To create a new measure, just right click with the over the table ‘sex feature’ and select ‘new measure’, and the formula bar will be activated. Once the DAX expression is added, click ‘Enter’, and the measure will be added to the table.

No alt text provided for this image

Let’s do the same thing with ‘smoker’ (with only two values: yes, and no) and ‘region’ (with four values: southwest, northwest, southeast, northeast)

No alt text provided for this image
No alt text provided for this image
No alt text provided for this image

For the three categorical features, let’s add slicers to the canvas, one for each feature. To add slicers, just click on the slicer icon of Visualization pane and select the created column (not the measure) of each table. 

No alt text provided for this image

It is recommended at this point not to leave the slicers and sliders blank, for that reason, the categorical slicers should be set up to ‘Single select’

No alt text provided for this image

Creating the prediction

Now that we have some values selected, let’s create the prediction visualization. To do this, let’s click on the ‘Py’ icon at the Visualizations pane

No alt text provided for this image

Once the user enables the Python visual, the script appears, but it is blank and no code can be entered. To activate the script, we need to add values to the ‘Values’ field. This is where all our created measures will be placed

No alt text provided for this image

Now we can see how the Python script is active and we can start entering code. So, let’s go!

It is very important to understand that the output of the script in Power BI desktop must be a visualization or chart, not a table or dataframe or any other object. 

No alt text provided for this image

This is how the entire script will look like

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

# dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

# Import pycaret, pipeline and model
from pycaret.regression import *
model = load_model(r'C:\...\rf_ins')

# Create prediction with dataset
df = predict_model(model, data  ?= dataset)

# Create table visual with matplotlib to display results
import matplotlib.pyplot as plt

fig, ax =plt.subplots(1,1)
df=pd.DataFrame(df)
ax.axis('tight')
ax.axis('off')
ax.table(cellText = df.values, colLabels = df.columns, loc='center', cellLoc = 'center')
 

plt.show()

And we can see how the output of the script is a table visual with the values selected and the ‘Label’ feature, which is the Predicted value of ‘charges’. As values of slicers and filters moves, the visualization will be refreshed with the new predicion -stored in column 'Label'-.

No alt text provided for this image

To display only the column that shows the prediction, you can use the following code: 

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:


# Import pycaret, pipeline and model
from pycaret.regression import *
model = load_model(r'C:\...\rf_ins')

# Create prediction with dataset
pd.options.display.float_format = '{:, .2f}'.format
df1 = predict_model(model, data  ?= dataset)
df = df1[['Label']]
df.rename(columns = {'Label' : 'Predicted Charges'}, inplace = True)


# Create table visual with matplotlib to display results
import matplotlib.pyplot as plt

fig = plt.figure()

table = plt.table(cellText = df.values, loc='center', colLabels = df.columns, cellLoc = 'center')
table.auto_set_font_size(False)
table.set_fontsize(24)
table.scale(5,5)


plt.show()
No alt text provided for this image

Feel free to use another visual or another settings that will may fit better to you or your customer, to consume your app. If the reader of this blog have another visualization suggestion or want to try it in a different way, you are more than welcome to share it!

Now the user can change the slicers and filters and the prediction (column ‘Label’) will be updated accordingly.

The Power BI Desktop file can be uploaded to Power BI Service and be shared as a web app, but a ‘Pro’ license -or higher- will be needed. Also, when uploading it to Power BI Service, a Gateway in personal mode will be required. 

Pablo Moreno is co-author of the book ‘Machine Learning in Power BI with R and Python’ together with Gabriel Gomez

We want to thank Moez Ali and its great team for the creation and development of PyCaret

Isabellí Andrade

Data Scientist at @ Santander Brasil | PySpark | AI | SQL | Machine Learning | Databricks |

1 年

Alexandre, pensei em seguirmos assim, tenho um not pronto já.

Hello everyone, just updated the link to the book that I mention in the article. You can view a brief content here: https://online.pubhtml5.com/nkpo/lzyj/

Alvaro Martinez

Ingeniero Industrial

3 年

Could you have a manual how to install pycaret With power bi? I have tried installing pycaret but when i played the algorithm in powerbi python sent an error -2147467259 Thanks Regards

回复

I haven't mentioned it, but it works also with #R

要查看或添加评论,请登录

Pablo M.的更多文章

  • #ML + #RPA = IA (Intelligence Automation)

    #ML + #RPA = IA (Intelligence Automation)

    El objetivo último de la transformación digital es “permitir que los humanos hagan o produzcan más con menos recursos”.…

  • La mejor opción de Power BI con tus clientes (si sabes lo que haces)

    La mejor opción de Power BI con tus clientes (si sabes lo que haces)

    Cada vez que publico o menciono algo en LinkedIn sobre evitar la descarga de datos en Ex**l desde un dashboard, media…

    43 条评论
  • Análisis geo-espacial con Microsoft Fabric

    Análisis geo-espacial con Microsoft Fabric

    ?Alguna vez te has preguntado cómo podrías rastrear tu propia logística o quizás la ubicación de tu equipo de trabajo…

    4 条评论
  • eDiscovery: La joya escondida en Microsoft Purview

    eDiscovery: La joya escondida en Microsoft Purview

    Si trabaja con Microsoft Fabric, es posible que sepa que es responsable de todos los datos alojados en la nube, lo que…

  • Entendiendo la Transformación digital

    Entendiendo la Transformación digital

    Recientemente me encontré en una conversación conmigo mismo sobre cómo puedo explicar todo esto de la revolución de la…

    10 条评论
  • KQL Databases para tiempo-real

    KQL Databases para tiempo-real

    En mi artículo anterior La promesa de Análisis en Tiempo Real con Microsoft Fabric, hago una breve introducción a qué…

  • Vector Database: desarrollo e integración en Microsoft Fabric

    Vector Database: desarrollo e integración en Microsoft Fabric

    Estoy seguro de que ahora cualquier persona conoce o ha oído hablar de Gen-AI, ChatGPT, OpenAI o LLM. También estoy…

  • Pero... Cuánto cuesta Microsoft Fabric?

    Pero... Cuánto cuesta Microsoft Fabric?

    Probablemente la pregunta más habitual que escucho y que me hacen muchos amigos y colegas. Así que en aras de…

    7 条评论
  • Data Virtualization en Azure y Fabric

    Data Virtualization en Azure y Fabric

    La virtualización de datos es posiblemente una técnica de datos poco conocida y, en mi opinión, una de las más útiles…

    3 条评论
  • KQL vs SQL

    KQL vs SQL

    Kusto Query Language (KQL) es una poderosa herramienta para explorar sus datos y descubrir patrones, identificar…

    14 条评论

社区洞察

其他会员也浏览了