Utilizing the Model Builder and AutoML for creating Lead Decision and Lead Scoring model in Microsoft ML.NET

Recently, I wrote an article explaining the utilization of the ONNX format for integrating a Scikit-learn lead scoring machine learning model into the .NET ecosystem. I described one possible way of deploying the Python-based regression model as a Microsoft Azure Function. That procedure is applicable for integrating the trained model as part of a Web API or Console Application as well. What I mentioned there was the opportunity to use this approach for bridging the technical differences between different data science and application development platforms, in this case targeting the .NET cross-platform framework. Since Microsoft .NET is the technology I work with professionally on a daily basis, in this article I want to uncover the native machine learning potential of the framework, more specifically ML.NET.

Since I have already presented the lead scoring idea in .NET, I will proceed with the implementation of the lead decision solution as a continuation of the one designed and implemented using the KNIME Platform. As mentioned in that article, it conceptually follows the same approach and supervised ML idea, differing only in the classification-based prediction strategy. Besides this, I will also cover and leverage the idea of Lead Scoring as part of the created model's prediction evaluation.

* Note: This article's solution design and source code are simplified to emphasize the core concepts and overall integration strategy. Still, it is a fully functional approach for training, building, evaluating, and implementing predictive decision-based/supervised models within real-life testing or production-deployed prototypes and application environments.

** Note: I will design and build the solution utilizing the ML.NET Model Builder powered by automated machine learning (AutoML), using the intuitive and user-friendly graphical Visual Studio extension. More detailed techniques for building data processing and transformation pipelines, customizing the train/test data split, applying cross-validation, and interpreting model performance and evaluation are beyond the scope of this article and can be referenced via the ML.NET API.

What is ML.NET?

ML.NET is an open-source and cross-platform machine learning framework that can incorporate and use machine learning algorithms in .NET-related applications. Providing support for various popular use cases, it is equipped for building different models in different business domains using the application's already stored data. Its central concept is designed around the Model Builder, a tooling mechanism specifying the steps needed to transform the input data into a prediction. Complementarily, it also uses Automated ML, a concept known as AutoML, which wraps and simplifies the interface, providing automatically generated code projects for describing the input data and consuming the models. In addition, the integration of machine learning is supported by another fundamental module, the ML.NET CLI. ML.NET, in general, supports training, building, and evaluating machine learning models using the C# and F# programming languages.

Environment Preparation

Following the newest Microsoft trends and announcements, Visual Studio 2022 and the final (so far) version of .NET 6 will be officially released in November 2021. Traditionally, Microsoft releases preview versions for community use in the context of shaping the major release at its finest. With this in mind, I am excited to proceed with the practical presentation using the Microsoft Visual Studio Community Preview edition (version 17.0.0, Preview 4.1) and the current .NET 6 Preview version of the framework.

* Note: The prerequisites for using the ML.NET Model Builder are Visual Studio 2019 16.10.4 or later and the .NET Core 3.1 SDK or later.

Creating the Solution

I will begin the demonstration from scratch by creating a new solution, “LeadGeneration”, consisting of a single console application named “LeadDecision”. During this process it is important to select .NET 6 (Preview) as the target framework. Completing this setup results in an empty C# console application following the new .NET 6 template.

ML.NET Extension

Installing and enabling the ML.NET Model Builder in Visual Studio can be configured using the Visual Studio Installer and modifying the current version installation accordingly. In general, the ML.NET component is placed under the Individual components, more precisely in the .NET section.

Additionally, the ML.NET Model Builder UI extension tool should be installed in the Extensions management area available on the main menu bar.

Dataset

Since this article follows the design and implementation of the lead decision predictive ML model, I will take advantage of the identical analyzed, processed, and scaled dataset used in my previous article for building the model on the KNIME platform. The initial Lead Scoring raw dataset is publicly available on Kaggle.

ML.NET Model Builder Setup

As mentioned before, the Model Builder is a very user-friendly graphical tool extension for managing the machine learning process in Visual Studio. Its main characteristic is the mbconfig configuration file, which manages the current session and keeps track of the changes specific to each phase of ML model building. The Model Builder, providing the complete ML experience, can be added to the created project following the procedure presented below.

Adding the Machine Learning support opens the Model Builder details, where I will walk through all the steps needed to build and evaluate the lead decision model.

Scenario Selection

Model Builder comes equipped with many different built-in scenarios for machine learning applications. In fact, each scenario is mapped to a different learning approach, depending on the specific business domain use case. Since I am building a lead decision predictive model, I will select the Data classification scenario, based on classification-related algorithms.

Training Environment

As displayed, the Data classification scenario is only locally supported, meaning that the model training will be executed on my local machine. So, considering that the Azure and GPU training modes are not currently supported, the only valid selection here remains the Local environment with the power of the CPU.

Import the Data

There are two options for importing the dataset: from a file or through a data source from a SQL Server instance. Considering that I have already exported the CSV file from the Jupyter Notebook, I will browse the local system path to import it.

As presented in the screenshot, the result of successful importing is a dataset preview followed by additional data configuration options. The next step is to select ‘Converted’ as the label, or target, column, and also to open the advanced data options in order to check and configure the other features. The configuration consists of setting all other columns as single numerical features, excluding 'column1', which is not relevant for the model building (it holds the row number).

* Note: Class 0 represents the ‘Not Converted’ leads, while Class 1 represents the ‘Converted’ leads. The ‘Converted’ column can also be treated as categorical during data processing, so that the categorical descriptions are used instead of integers (converted into strings).
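The class-to-description mapping from the note can be sketched as a small helper. This is illustrative code of my own, not part of the Model Builder output; the label strings simply mirror the note above.

```csharp
using System;

// Hypothetical helper mapping the integer classes of the 'Converted' label
// to their categorical descriptions (0 = Not Converted, 1 = Converted).
public static class LeadLabels
{
    public static string Describe(int convertedClass) => convertedClass switch
    {
        0 => "Not Converted",
        1 => "Converted",
        _ => throw new ArgumentOutOfRangeException(nameof(convertedClass))
    };
}
```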

Train Model

The next step is dedicated to the training configuration, where I will set the time for training the model. In fact, this is an automated process in which the Model Builder leverages the benefits and flexibility of AutoML to investigate and apply many different models with a wide range of parameters. Overriding the default interval of 10 seconds with more training time opens the possibility of exploring more models, maximizing the chances of retrieving a more accurate final model. Accordingly, I will set the time interval to 900 seconds (15 minutes). It is worth mentioning that there is an official Microsoft guideline for the recommended time interval according to the dataset size.
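That guideline can be sketched as a simple size-to-time lookup. The thresholds below are reproduced from memory and should be verified against the official Model Builder documentation; the helper itself is illustrative and not part of any generated code.

```csharp
using System;

// Illustrative lookup of the recommended Model Builder training time by dataset size.
// NOTE: the thresholds are quoted from memory -- verify them against the official
// ML.NET Model Builder guideline before relying on them.
public static class TrainingTimeGuideline
{
    public static TimeSpan Recommended(double datasetSizeMb) => datasetSizeMb switch
    {
        <= 10   => TimeSpan.FromSeconds(10),  // 0 - 10 MB
        <= 100  => TimeSpan.FromMinutes(10),  // 10 - 100 MB
        <= 500  => TimeSpan.FromMinutes(30),  // 100 - 500 MB
        <= 1024 => TimeSpan.FromMinutes(60),  // 500 MB - 1 GB
        _       => TimeSpan.FromHours(3)      // 1 GB+: three hours or more
    };
}
```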

The training process and model selection can be followed in the Output window, where each explored model and its iteration accuracy are presented.

In the end, when the training process is successfully completed, the Model Builder generates a complete Experiment Results summary in the format presented in the screenshot below.

Also, I can review the output of the process in the Train area, where the best-performing model is presented.

In this particular scenario, LightGbmMulti (the LightGbmMulticlassTrainer class) was evaluated as the best algorithm fit, providing the best model accuracy.

Evaluate Model

Going further, I have the opportunity to evaluate the best model's accuracy. It is also worth emphasizing that I can interact with the model, meaning that I can provide some previously unseen data and immediately review the prediction.

Consume Model

After finishing the evaluation process, I proceed to the model consumption screen. As presented in the screenshot below, there is a code snippet for explicit model integration and consumption in the end application of interest. The end application can be the console application I already created, but I will also present how to integrate the model within the generated Web API (using the Add Web API solution option).

I will generate the Web API application as a separate project within the solution, named LeadDecisionModel_API.

Next Steps

There are two additional possibilities as a wrap-up of the Model Builder journey: the 'Deploy your model' and 'Improve the model' sections. They are currently implemented as redirection buttons addressing the official documentation, where more details related to model improvement and deployment can be found.

Console Application touch

Taking into consideration the generated code snippet for integrating and consuming the model within the Model Builder's Consume step, I will copy it into the Program.cs file.

* Note: The mbconfig configuration file is accessible even after the UI tool is closed.

After commenting out the predefined console application template code and pasting the generated code from the Model Builder, I only need to reference the LeadDecision namespace where the actual model input class was generated.

In the context of this, utilizing the Visual Studio built-in debugger, I will start the application and review the type as well as the content of the result output object.

So, as can be observed, the result output is a ModelOutput object, including the prediction and the scores, or probabilities, of successful or unsuccessful lead conversion. This explicitly means that we can use this approach for creating and analyzing machine learning models for a predictive Lead Decision and Lead Scoring system.
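To illustrate how such a ModelOutput could be post-processed, here is a minimal sketch, assuming a two-class score vector ordered as [Not Converted, Converted]: the arg-max of the scores yields the lead decision, while the probability of class 1 doubles as the lead score. The helper is mine, not code generated by Model Builder.

```csharp
using System;
using System.Linq;

// Illustrative post-processing of a ModelOutput-style score vector,
// assuming scores[0] = P(Not Converted) and scores[1] = P(Converted).
public static class LeadScoring
{
    // Lead decision: the predicted class index (arg-max of the score vector).
    public static int Decide(float[] scores) =>
        Array.IndexOf(scores, scores.Max());

    // Lead score: the probability of conversion (class 1) as a percentage.
    public static double LeadScorePercent(float[] scores) =>
        Math.Round(scores[1] * 100.0, 1);
}
```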

Before integrating the model into the previously generated Web API, I want to present and explain the LeadDecisionModel.mbconfig structure.

The LeadDecisionModel.training.cs file consists of the generated machine learning model, including the selected algorithm and its configuration parameters, the input feature set, and the configuration for the predicted label. As presented in the screenshot below, everything is wrapped in a pipeline transformation referenced in the method for executing the pipeline and fitting the model.

On the other side, the LeadDecisionModel.consumption.cs file consists of the ModelInput class describing the input features, the ModelOutput class containing the prediction output result (prediction and scores), as well as the prediction engine covering the Predict functionality of the model. The latter is generated using the MLContext class and the ITransformer interface, which can be utilized for more advanced custom ML solutions.

Finally, the LeadDecisionModel.zip archive contains all files needed for creating and training the machine learning model. As it is visible from the previous screenshot, the path of the zip archive is referenced in the prediction engine construction, from where the supervised-based predictions are made.

Web API touch

Let's recall that the Web API project was automatically generated using the Add to solution option within the Model Builder UI tool. So, it is created as a separate project within the LeadGeneration solution, automatically incorporating the identical files for supporting the integration of the created model. In general, the ML model for Lead Decision and Lead Scoring is now an integral part of the API project. Taking into consideration the .NET target framework, the Web API follows the new and lightweight minimal design, generated by the ML.NET Model Builder as presented in the screenshot below.

I will start the API project and try to call the predict endpoint, since the MapPost functionality is defining and exposing its route and handler. In terms of this, I will use the Postman API Platform for preparing the endpoint URL and the POST request body.

* Note: The binding url and port are part of the iisSettings configuration section within the launchSettings.json configuration file.
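For reference, the POST body is a JSON object whose properties mirror the feature columns of the generated ModelInput class. The property names and values below are illustrative placeholders only (the real body must include every feature column from the dataset, with its scaled numerical value):

```json
{
  "TotalVisits": 0.32,
  "Total_Time_Spent_on_Website": 0.74,
  "Page_Views_Per_Visit": 0.18
}
```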

The predict API call was successful, returning the lead model's prediction/decision and scores.

Final Words

In this article, I presented a detailed step-by-step process of building a Lead Decision and Lead Scoring predictive machine learning model leveraging the advantages of the Model Builder AutoML within the ML.NET framework. This machine learning framework is part of the .NET ecosystem and can be very easily utilized to build different use case scenarios related to specific business domain tasks and data. As mentioned within the referenced articles, this strategy is part of a broader systematic approach to working with and interpreting marketing and sales data to establish a more insightful and practical lead generation process.

Working natively within the .NET ecosystem, using ML.NET and the automated Model Builder UI tool, can be an advantage for application developers enthusiastic about integrating machine learning within their applications. Moreover, ML.NET is a fully designed framework for software developers experienced in machine learning and artificial intelligence.

I am using ML.NET in combination with the Python-based Scikit-learn ML library, RStudio, and the KNIME platform, trying to maximize the full potential and flexibility of all the different possibilities and built-in algorithms. From my point of view, there is no single best environment or library; everything depends on the concrete business use case and domain.

------------------------

Thank you for reading the article. I believe it is constructive and comprehensive in covering all aspects of building and evaluating machine learning models using the Microsoft ML.NET framework.

Currently, I am working on utilizing the machine learning algorithms in bioinformatics, more precisely in understanding the role of the microbiome in cancer diagnostics and therapeutics. Therefore, I am combining the potential of all previously mentioned platforms to explore the full potential of the data and the knowledge as well as the insights behind it.

Feel free to start a discussion and share your thoughts and experience in this regard.

I would be grateful if you take the time to comment, share the article and connect for further discussions and potential collaboration.

More articles by Miodrag Cekikj, PhD CSE
