Creating a Machine Learning App with Power BI and PyCaret
Can we use Power BI as application to productionalize machine learning like a web app? Of course!
Let’s create a Machine Learning App in Power BI, but no in Power Query, rather, let’s do it in Desktop, so our customers can interact with the model.
Techniques like this a much more will be available at the book that we are working on ‘Machine Learning in Power BI with R and Python’. This book is planned to be released in July 2021 in Spanish, and November 2021 in English. If you are interested to know more about this book, you can view the recorded presentation here or view the preliminary content here
We will create a model in Python using -the amazing’ PyCaret package, created by @Moez Ali. In this example, we will use the insurance dataset to create a regression project, as we want to predict ‘charges’, given the other variables: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We select ‘random forest’ as model.
After doing the whole pipeline in Jupyter, we save the model and the entire pipeline in a ‘pkl’ file named ‘rf_ins.pkl’
Once the pipeline and the trained model is completed, we need 3 things to pur the model in production:
- The saved pipeline
- The trained model
- New data
We do have already the saved pipeline and the trained model, so all we need is new data.
To use Power BI as a web app for machine learning, all we have to do is create the different columns in Power BI Desktop using DAX and use slicers to allow the end user to select values for each feature. Finally let's a Python visualization where machine learning results will be displayed. Every time slicers and filters changed, the model will update the visual with the new prediction.
In our example, let’s open Power BI and create the same columns that we used to train the model: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’ and ‘region’. We don’t need to create ‘charges’ as that’s the predicted value.
Creating the new data
Continuous variables can be created using the ‘What if’ parameters, as in addition of a calculated column, it will produce a measure. The value of such measure will be the ‘SELECTEDVALUE()’ function by using the filter. We can do the same thing for the categorical features. We will use the measures as the new dataframe within the Python visual at which we will do the predictions.
It is imperative that the measures have the same name as the column names used on the training dataset, otherwise the model will not recognize them. Let’s go step by step.
Step 1: Creating the numeric features
To create the numeric features, go to Modeling > New parameter and start filling the dialogue box. We will create the ‘age feature’ first, and we must do the same with all the other features.
Once you click ‘OK’, you will see how a new calculated table is crated with 1 calculated column and its measure. Also, a new slicer is added to the canvas.
We need to rename the measure to ‘age’, as that’s what we will use for the new dataframe. All you must do is double click on the measure and leave it as ‘age’ (fully trimmed, no blank spaces)
Let’s do the same with the rest of the numeric features.
As ‘What-if parameter’ in Power BI only works with numerical, date or decimal features, so, for the categorical features we need to create a new table with just one column with the categories of the feature and create a measure associated using SELECTEDVALUE().
Step 2: Create categorical features
To create the categorical features, let’s use Home > Enter data, and create a new table with the categorical feature ‘sex’, which it contains only 2 values: ‘male’ and ‘female’
Once ‘Load’ is clicked, a new table with 1 column and the two values is crated. Let’s now add the measure called ‘sex’ using the following DAX expression:
sex = SELECTEDVALUE( ‘sex feature’[sex feature] )
To create a new measure, just right click with the over the table ‘sex feature’ and select ‘new measure’, and the formula bar will be activated. Once the DAX expression is added, click ‘Enter’, and the measure will be added to the table.
Let’s do the same thing with ‘smoker’ (with only two values: yes, and no) and ‘region’ (with four values: southwest, northwest, southeast, northeast)
For the three categorical features, let’s add slicers to the canvas, one for each feature. To add slicers, just click on the slicer icon of Visualization pane and select the created column (not the measure) of each table.
It is recommended at this point not to leave the slicers and sliders blank, for that reason, the categorical slicers should be set up to ‘Single select’
Creating the prediction
Now that we have some values selected, let’s create the prediction visualization. To do this, let’s click on the ‘Py’ icon at the Visualizations pane
Once the user enables the Python visual, the script appears, but it is blank and no code can be entered. To activate the script, we need to add values to the ‘Values’ field. This is where all our created measures will be placed
Now we can see how the Python script is active and we can start entering code. So, let’s go!
It is very important to understand that the output of the script in Power BI desktop must be a visualization or chart, not a table or dataframe or any other object.
This is how the entire script will look like
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: # dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker) # dataset = dataset.drop_duplicates() # Paste or type your script code here: # Import pycaret, pipeline and model from pycaret.regression import * model = load_model(r'C:\...\rf_ins') # Create prediction with dataset df = predict_model(model, data ?= dataset) # Create table visual with matplotlib to display results import matplotlib.pyplot as plt fig, ax =plt.subplots(1,1) df=pd.DataFrame(df) ax.axis('tight') ax.axis('off') ax.table(cellText = df.values, colLabels = df.columns, loc='center', cellLoc = 'center') plt.show()
And we can see how the output of the script is a table visual with the values selected and the ‘Label’ feature, which is the Predicted value of ‘charges’. As values of slicers and filters moves, the visualization will be refreshed with the new predicion -stored in column 'Label'-.
To display only the column that shows the prediction, you can use the following code:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: # dataset = pandas.DataFrame(age, bmi, children, region, sex, smoker) # dataset = dataset.drop_duplicates() # Paste or type your script code here: # Import pycaret, pipeline and model from pycaret.regression import * model = load_model(r'C:\...\rf_ins') # Create prediction with dataset pd.options.display.float_format = '{:, .2f}'.format df1 = predict_model(model, data ?= dataset) df = df1[['Label']] df.rename(columns = {'Label' : 'Predicted Charges'}, inplace = True) # Create table visual with matplotlib to display results import matplotlib.pyplot as plt fig = plt.figure() table = plt.table(cellText = df.values, loc='center', colLabels = df.columns, cellLoc = 'center') table.auto_set_font_size(False) table.set_fontsize(24) table.scale(5,5) plt.show()
Feel free to use another visual or another settings that will may fit better to you or your customer, to consume your app. If the reader of this blog have another visualization suggestion or want to try it in a different way, you are more than welcome to share it!
Now the user can change the slicers and filters and the prediction (column ‘Label’) will be updated accordingly.
The Power BI Desktop file can be uploaded to Power BI Service and be shared as a web app, but a ‘Pro’ license -or higher- will be needed. Also, when uploading it to Power BI Service, a Gateway in personal mode will be required.
Pablo Moreno is co-author of the book ‘Machine Learning in Power BI with R and Python’ together with Gabriel Gomez
We want to thank Moez Ali and its great team for the creation and development of PyCaret
Data Scientist at @ Santander Brasil | PySpark | AI | SQL | Machine Learning | Databricks |
1 年Alexandre, pensei em seguirmos assim, tenho um not pronto já.
Hello everyone, just updated the link to the book that I mention in the article. You can view a brief content here: https://online.pubhtml5.com/nkpo/lzyj/
Ingeniero Industrial
3 年Could you have a manual how to install pycaret With power bi? I have tried installing pycaret but when i played the algorithm in powerbi python sent an error -2147467259 Thanks Regards
I haven't mentioned it, but it works also with #R