A brief look into machine learning models

A machine learning model is a program that finds patterns and makes decisions on previously unseen data. Its predictions are based on data points it has already seen during training. For example, a facial recognition model can identify a person from a set of facial patterns and other parameters. In natural language processing, a machine learning model parses the input and recognizes the intent of sentences it has not encountered before.

A machine learning model can perform such tasks because it has been trained on a large, predefined dataset. During training, a machine learning algorithm is optimized to find patterns in, or produce outputs from, that dataset. The result of this process is called a machine learning model: an artifact trained to recognize patterns and make decisions based on data points.

Model training is driven by machine learning algorithms, which are mathematical methods often derived from statistics, calculus and algebra. The algorithm is run on a dataset and its parameters are tuned until it finds the relevant patterns at the desired level of accuracy.

Different types of machine learning:

Machine learning techniques can be classified into supervised learning, unsupervised learning and reinforcement learning.

Machine learning classifications

Supervised Machine Learning:

Classification:

In supervised learning, the machine learning algorithm is provided with a labeled input dataset and is optimized to produce a specific set of outputs. For example, image recognition uses supervised learning: the model is trained on a wide range of labeled images. Once trained, the model uses a technique called classification to assign each new input to one of a fixed set of categories.

Classification model
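As a rough illustration of classification, the sketch below trains nothing more than a lookup over labeled examples and assigns a new input the majority label of its nearest neighbours (the K-Nearest Neighbors approach). The fruit measurements, the function name knn_predict and the choice of k are assumptions made up for this example, not part of any particular library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # train is a list of (features, label) pairs; features are numeric tuples
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (weight in grams, diameter in cm) -> fruit
training_data = [
    ((150, 7.0), "apple"),
    ((160, 7.5), "apple"),
    ((170, 7.2), "apple"),
    ((110, 3.5), "banana"),
    ((120, 3.8), "banana"),
    ((115, 3.6), "banana"),
]

print(knn_predict(training_data, (155, 7.1)))  # lands among the apple examples
print(knn_predict(training_data, (118, 3.7)))  # lands among the banana examples
```

The key point is that every training example carries a label, which is what makes this supervised learning.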

Regression Model:

Supervised learning is also used to make predictions such as traffic congestion, weather, price growth and health metrics, using a technique called regression. A regression model predicts a continuous output variable from a set of independent input variables.

Example: in a use case that predicts the number of bicycles rented during a particular hour of the day, multiple factors (independent variables) feed into the predicted number of rentals (dependent variable). The objective of a regression algorithm is to find the model that best predicts the dependent variable for given values of the independent variables.

In the diagram below, the plotted line is the regression line: the line that fits the data points most closely, so its predictions of the output variable are more accurate than those of any other straight line.

Regression model
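To make the idea of a best-fitting line concrete, here is a minimal sketch of simple linear regression using ordinary least squares. It uses a single independent variable (temperature) instead of the several factors a real rental model would need, and the data values and function name fit_line are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: temperature in Celsius -> bicycles rented that hour
temperatures = [10, 15, 20, 25, 30]
rentals = [20, 32, 38, 52, 58]

slope, intercept = fit_line(temperatures, rentals)
predicted = slope * 22 + intercept  # predicted rentals at 22 degrees
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

With more than one independent variable the same principle applies, but the fit is usually computed with matrix methods or a library.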

Unsupervised Machine Learning:

Unsupervised machine learning refers to algorithms that are given an input dataset without a target output. Instead of being optimized toward specific outputs, they are trained to analyze and cluster (group) the data based on patterns of similarity and difference in the unlabeled dataset. These algorithms discover hidden patterns or data groupings without human intervention. Their ability to discover similarities and differences in data makes them well suited to exploratory data analysis, image recognition and customer segmentation.

For example, suppose the algorithm is given an input dataset containing images of fruits such as apples, mangoes and bananas. The algorithm has never been trained on such a dataset, so it has no prior knowledge of the features and characteristics of the data. The task of the unsupervised learning algorithm is to identify the hidden features in the image dataset on its own.

Unsupervised learning resembles the way a human learns from experience, which is why it is sometimes described as being closer to real AI. It helps extract useful insights from data, and because it works on unlabeled and unclassified data, it is especially valuable where labels are unavailable.

Unsupervised learning

In the above illustration, raw unlabeled data is fed to the unsupervised machine learning model. The model interprets the data, identifies the hidden features and makes decisions based on the similarities and differences in the data.

Unsupervised learning algorithms can be categorized into two types: clustering and association.

Clustering model:

Clustering is a method of grouping data objects into clusters such that objects with the most similarities stay in one group and have few or no similarities with the objects of another group. Cluster analysis finds the common attributes among the data objects and categorizes them according to the presence or absence of those attributes.

Some of the popular clustering algorithms are:

  • K-means clustering
  • KNN (K-Nearest Neighbors), which is strictly a supervised classification method but is often listed alongside clustering algorithms
  • Hierarchical clustering
  • Probabilistic clustering

A typical K-means clustering result looks like this:

Clustering model

K-means clustering is a common example of an exclusive clustering method: data points are assigned to K groups, where K is the number of clusters, based on their distance from each group's centroid. The data points closest to a given centroid are clustered under the same category. A larger K value produces smaller groupings with more granularity, whereas a smaller K value produces larger groupings with less granularity. K-means clustering is commonly used in market segmentation, image segmentation and image compression.
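The assign-then-update loop behind K-means can be sketched in a few lines. This is a simplified version under stated assumptions: it seeds the centroids with the first K points (real implementations usually pick them more carefully), runs a fixed number of iterations, and uses invented 2D points. The function name k_means is hypothetical.

```python
import math

def k_means(points, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of K-means iterations."""
    # Naive initialization: the first k points act as starting centroids
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data forming two visible groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the grouping emerges purely from the distances between points.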

Overlapping clustering differs from exclusive clustering in that it allows data points to belong to more than one cluster with separate degrees of membership.

Association model:

Association rule learning is a rule-based method for finding relationships between variables in a given dataset. These methods are frequently used for market basket analysis, allowing enterprises to better understand relationships between different products. Understanding the consumption habits of customers enables businesses to develop better cross-selling strategies. For example, this kind of association can be seen in Amazon's "Customers who bought this item also bought" recommendation.

Some of the association algorithms are Apriori, Eclat, and FP-Growth. Apriori is the most widely used algorithm.
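Algorithms such as Apriori rank candidate rules using measures like support (how often an itemset appears) and confidence (how often the rule holds when its left side appears). The sketch below computes only those two measures on a handful of made-up shopping baskets; it is not an implementation of Apriori itself, and the function names are chosen for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent` in a basket."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

# Rule under test: customers who buy bread also buy milk
print(round(support(baskets, {"bread", "milk"}), 2))
print(round(confidence(baskets, {"bread"}, {"milk"}), 2))
```

Here bread and milk appear together in 3 of 5 baskets, and milk appears in 3 of the 4 baskets that contain bread.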

Reinforcement Learning:

In reinforcement learning, the model trains itself by continually interacting with its environment through trial and error. Learning from this feedback, rather than from a fixed training dataset, lets the model keep improving as it gains experience. Popular use cases include autonomous driving and chess.

For example, in a game of chess, the machine learning model plays the game, gets continuous feedback on the actions it performs and refines itself through learning. This continuous refinement can make the model more effective than one trained only on a predefined training dataset. In this scenario, the model is called the agent, the moves it makes are the actions and the chess board is the environment.

A few key terms help in understanding reinforcement learning in more detail.

Agent: The learner and the decision maker

Environment: where the agent learns and decides what action to perform

Action: one of the set of moves an agent can perform in the environment

State: the state of the agent in the environment

Reward: for each action performed by the agent, the environment returns feedback, also known as a reward. It is usually a scalar value that the agent uses for further training.

Policy: the decision-making function of the agent, which defines the mapping from situations to actions

Value Function: measures how good a specific state is. It is the expected discounted reward that the agent collects from that state onward when following a specific policy

Reinforcement learning
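The "discounted" part of the value function can be shown with a short calculation: each later reward is multiplied by one more power of a discount factor (commonly written gamma), so rewards further in the future count for less. The sketch below computes the discounted return of a single reward sequence; the value function is the expectation of this quantity over many such sequences. The reward values and gamma = 0.9 are assumptions for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step folds in the already-discounted future
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: no reward for two steps, then a reward of 1
episode_rewards = [0, 0, 1]
print(round(discounted_return(episode_rewards), 4))

# The same reward received immediately is worth more
print(round(discounted_return([1, 0, 0]), 4))
```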

There are two main types of reinforcement learning algorithms: Model-based algorithms and Model-free algorithms.

Model-based algorithms:

A model-based algorithm uses experience to construct an internal model of the transitions and immediate outcomes in the environment. When a situation arises, the appropriate action is chosen by searching and planning in this internal model. In other words, model-based learning builds a model of the environment from the experience the agent has gathered, then chooses a policy based on that model.

Model-free algorithms:

A model-free algorithm learns directly from experience, choosing a policy through trial and error. Because it does not construct an internal model, it has to find a policy with only limited knowledge of the characteristics of the environment. As a result, its choice of policy is subject to some error, especially early in training.
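Q-learning is one well-known model-free algorithm, and a tabular version fits in a short sketch. The environment here is an invented five-state corridor in which the agent starts at the left end and is rewarded only on reaching the right end; the hyperparameters (alpha, gamma, epsilon, episode count) are arbitrary choices for the example.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
# Greedy policy for each non-terminal state
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent never builds a model of how the corridor works; it only keeps a table of action values learned from the rewards it has observed, and the greedy policy read from that table should end up moving right in every state.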
