Building Machine Learning Models With Microsoft Azure Machine Learning Studio
Why is machine learning important?
Machine learning (ML) not only helps us understand our data better but also gives us simpler and more intelligent ways of making better business decisions. Without it, it is impossible to analyse and make sense of the massive amount of internal and external data available.
As Microsoft’s CEO mentioned in his memo to Microsoft, machine learning, empowered by big data, is a key development.
Data is the new oil. We should be mining data to find new ways of living and to support critical decision making, not only in business but also in our daily life.
ML will help us take faster decisions, develop insights beyond human capabilities, act on the right problems and take advantage of opportunities.
Hershey’s runs a USD 7.4B business on Azure Machine Learning with almost no data scientists:
· only through the drag and drop interface on the canvas
· without any programming
· a large variety of algorithms
· a large number of pre-built APIs available as a service
· can go from experiment to production API in a matter of minutes
· complete flexibility of data storage
· supports a variety of data storage options
Companies such as Tata, Uber, Rolls-Royce (aircraft engines), Xbox and Land Rover have started using Azure ML, a cloud-based ML service which supports SQL, R and Python scripts.
The growth trend of data:
90% of today's data was created in the last two years alone.
What is Machine Learning (ML)?
ML is an application of artificial intelligence that gives systems the ability to learn and improve their performance without explicit programming. The main aim of machine learning is to develop programs which can access data, learn automatically, find patterns and make predictions for the future.
“The field of Machine Learning seeks to answer the question ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’” – Carnegie Mellon University
“Machine Learning at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” – Nvidia
How do computers learn?
Computers learn much as we do. Human behaviour depends mainly on past behaviour. In the same way, machine learning algorithms extract knowledge from past data and build models based on the patterns found in that data.
Machine learning is applied across various domains such as artificial intelligence, deep learning, statistics and data mining.
Machine learning can be subcategorized into:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
Supervised learning resembles the way we learned about the physical world from our parents, teachers or friends during our childhood.
Unsupervised learning resembles the point in life when we started taking our own actions or decisions without any past outcome-based learning or experience.
Reinforcement learning rewards good behaviour and penalises bad behaviour. The main goal is to maximise the gain or reward.
For example, if you pay your credit card bill on time, you will have a good credit history. Unsupervised learning algorithms can group the accounts with a good credit history and find patterns in their behaviour. With reinforcement learning, you are trying to maximize the future gain or reward without any past data. As another example, your marketing company may want to maximize the ROI on advertising through the highest click-through rate, or to work out how to target the set of people who will be interested in a product or service.
Why is Azure ML a preferred ML tool?
Azure ML is a cloud-based predictive analytics service. It provides tools to create a complete machine learning solution in the cloud by quickly building a model and deploying it from Azure ML Studio. It allows models to be deployed as web services. Azure ML helps solve most complex ML problems with a large library of pre-built machine learning algorithms and modules. Any pre-built algorithm will also work in this environment.
Azure ML Studio provides an interactive, visual workspace in which to easily build and iterate on a predictive analysis model. You can drag and drop your data sets and modules onto the canvas and connect them together to form an experiment, all inside Azure ML Studio in the cloud.
The standard workflow of Azure ML
In the example below, we are going to create an automatic loan eligibility check and approval process with Azure ML.
Data set example
In the first stage, we need to clean missing data twice: once for the categorical variables and once for the numerical variables.
We want to replace missing values in the categorical feature columns with the most frequent value, so we select replace with mode. This is also why we included credit history among these columns: the mode retains its 2 unique values and avoids the trap of the mean, which would introduce a value that is neither of them.
Let's now launch the column selector and select the numeric features that have missing values: loan amount and loan term. Because these features are numeric, we can replace the missing values with the mean (average) of all the values in that feature for now.
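The two cleaning passes described above can be sketched outside the Studio in plain Python. This is only an illustration of the idea, not what the Azure ML module runs internally. The column names (Credit_History, LoanAmount, Loan_Amount_Term), the impute helper and the sample rows are all assumptions made up for this example.

```python
from statistics import mean, mode

def impute(rows, categorical_cols, numeric_cols):
    """Return a copy of rows where None is replaced by the column's
    mode (categorical columns) or mean (numeric columns)."""
    fill = {}
    for col in categorical_cols:
        # Most frequent observed value, so only existing categories are used
        fill[col] = mode([r[col] for r in rows if r[col] is not None])
    for col in numeric_cols:
        # Average of the observed values
        fill[col] = mean([r[col] for r in rows if r[col] is not None])
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]

# Hypothetical sample rows; None marks a missing value
loans = [
    {"Credit_History": 1, "LoanAmount": 100.0, "Loan_Amount_Term": 360},
    {"Credit_History": None, "LoanAmount": 200.0, "Loan_Amount_Term": None},
    {"Credit_History": 1, "LoanAmount": None, "Loan_Amount_Term": 360},
    {"Credit_History": 0, "LoanAmount": 150.0, "Loan_Amount_Term": 180},
]

cleaned = impute(loans, ["Credit_History"], ["LoanAmount", "Loan_Amount_Term"])
for row in cleaned:
    print(row)
```

Treating credit history as categorical keeps it at 0 or 1. Averaging it would produce a fraction that matches neither category.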
We will be preparing the machine learning exercise for a supervised learning problem.
Missing Values categories
Select column parameter:
All columns except Loan_Id are selected when selecting columns in the data set.
You need to split the dataset into train and test sets. The train dataset gives your model the training it needs to learn a transformation with which you can predict new outcomes.
The test dataset, in turn, helps you validate your model.
In other words, you split the existing data, use part of it for training your model and use the rest for testing the model and validating the accuracy of its results.
Let's connect the output of the previous transformation to the input of the split data module.
The module takes some parameters. We are going to split the data in a 70:30 ratio, so the fraction value will be 0.7. The random seed is 123. We also do a stratified split on the loan status column so that its values are evenly distributed between the train and test data sets. Now, launch the column selector and select the loan status (the predicted variable).
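To make the effect of these parameters concrete, here is a minimal sketch of a stratified 70:30 split in plain Python. It is an assumption-laden illustration rather than the Split Data module's actual implementation. The stratified_split helper, the Loan_Status labels "Y"/"N" and the generated rows are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, train_fraction=0.7, seed=123):
    """Split rows into (train, test) so that each label value keeps
    roughly the same share in both parts."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical data: 70 approved ("Y") and 30 rejected ("N") loans
rows = [{"Loan_ID": i, "Loan_Status": "Y" if i % 10 < 7 else "N"} for i in range(100)]
train, test = stratified_split(rows, "Loan_Status")

print(len(train), len(test))
print(sum(r["Loan_Status"] == "Y" for r in train) / len(train))
print(sum(r["Loan_Status"] == "Y" for r in test) / len(test))
```

Because the split is done per label value, the share of approved loans is the same in the train and test parts, which is the point of stratifying on the predicted variable.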
The initial data output
Scored test data of 30%
The model validation result
The AUC is 0.79 (79%) and the accuracy is 81%, which indicates a pretty good model.
Azure ML provides an environment for comparing two models and choosing the best one after validation.
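For readers who want to see what these two numbers measure, the following sketch computes accuracy and AUC by hand in plain Python. The labels and scores are invented for illustration and are not the loan experiment's results. The 0.5 threshold is an assumption. The AUC is computed with the pairwise-ranking definition: the share of (positive, negative) pairs in which the positive example gets the higher score.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc(labels, scores):
    """Probability that a random positive is scored above a random
    negative; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scored test data: 1 = loan approved, 0 = loan rejected
labels = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.55]

print("accuracy:", accuracy(labels, scores))
print("AUC:", auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarises how well the scores rank positives above negatives across all thresholds, which is why both are reported.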
Adult Census Income Prediction Model- Two Class Decision Forest
The data sample for the adult census income prediction -
Model Creation
Now we add an SVM (Support Vector Machine) model alongside the existing two-class decision forest model to compare their output and pick the best model.
The AUC using SVM (88.6%) is slightly lower than that of the decision forest (89.3%), and the accuracy is slightly lower as well. SVM is typically used when speed is more important than accuracy.
Next, we build a linear regression model and compare two algorithms to select the best model based on accuracy.
Model Output
All the errors for the boosted decision tree are lower than the errors for OLS. Also, its coefficient of determination is higher: 0.882 compared to 0.862 for linear regression. So, we conclude that the boosted decision tree has performed better than linear regression using OLS. Creating and comparing different algorithms on Azure ML is quite easy. If we need to compare a few more models, such as linear regression using gradient descent, we can simply create another branch and compare them using the evaluate model module.
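The comparison above rests on error metrics and the coefficient of determination. As a rough sketch of how such a comparison works, the plain-Python code below computes the mean absolute error, the root mean squared error and the coefficient of determination for two sets of predictions. The regression_metrics helper and all the numbers are hypothetical and are not taken from the experiment.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Mean absolute error, root mean squared error and
    coefficient of determination (R squared)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }

# Hypothetical actual values and predictions from two models
actual = [10.0, 12.0, 15.0, 18.0, 20.0]
model_a = [11.0, 12.5, 14.0, 17.0, 21.0]
model_b = [10.5, 12.0, 15.5, 18.0, 19.5]

metrics_a = regression_metrics(actual, model_a)
metrics_b = regression_metrics(actual, model_b)
print("model A:", metrics_a)
print("model B:", metrics_b)
```

The model with the lower errors and the higher coefficient of determination is the better fit on this data, which is the same rule applied when reading the evaluate model output.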