Machine Learning & Its Use Cases
Sourav Dinda
AWS DevOps | SRE | Senior Software Engineer | CKA | GitOps | 2x AWS Certified | 3x Red Hat Certified (EX294 + EX180 + EX280) | Python Backend Developer | AWS DevOps Engineer | FastAPI | Flask | Trainer | CodeChef
WHAT IS ML?
According to Arthur Samuel, machine learning algorithms enable computers to learn from data, and even improve themselves, without being explicitly programmed.
Machine learning (ML) is a category of algorithms that allows software applications to become more accurate at predicting outcomes without being explicitly programmed. The basic premise of machine learning is to build algorithms that can receive input data and use statistical analysis to predict an output, while updating those outputs as new data becomes available.
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.
With the classic algorithms of machine learning, text is treated simply as a sequence of keywords; an approach based on semantic analysis, by contrast, mimics the human ability to understand the meaning of a text.
Types of Machine Learning
Overview of Supervised Learning Algorithm
In supervised learning, an AI system is presented with data which is labeled, meaning that each data point is tagged with the correct label.
The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.
For example, suppose we initially take some mails and mark each one as ‘Spam’ or ‘Not Spam’. This labeled data is then used to train the supervised model.
Once it is trained, we can test the model on some new mails and check whether it is able to predict the right output.
Types of Supervised learning
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”.
- Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
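To make the spam example concrete, here is a minimal, hedged sketch of a Naive-Bayes-style classifier in plain Python. The mails, words, and smoothing scheme are all invented for illustration; a real system would use far more data and a proper library.

```python
from collections import Counter

# Toy labeled training set: hypothetical mails tagged "spam" / "not spam".
train = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "not spam"),
    ("lunch at noon tomorrow", "not spam"),
]

# Count how often each word appears under each label.
word_counts = {"spam": Counter(), "not spam": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

def predict(text):
    """Score each label by smoothed word frequencies, pick the best."""
    scores = {}
    for label in word_counts:
        total = sum(word_counts[label].values())
        vocab = len(word_counts[label])
        score = 1.0
        for word in text.split():
            # Laplace-smoothed relative frequency of the word under the label.
            score *= (word_counts[label][word] + 1) / (total + vocab + 1)
        scores[label] = score * label_counts[label]
    return max(scores, key=scores.get)

print(predict("claim your free money"))  # → spam
```

Once trained on the labeled mails, the model can be tested on unseen text exactly as described above.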
Overview of Unsupervised Learning Algorithm
In unsupervised learning, an AI system is presented with unlabeled, uncategorized data, and the system’s algorithms act on the data without prior training. The output is dependent upon the coded algorithms. Subjecting a system to unsupervised learning is one way of testing AI.
For example, suppose we give our model a set of characters, some of which are ‘Ducks’ and some ‘Not Ducks’, without providing any labels in the training data. The unsupervised model is able to separate the two kinds of characters by looking at the structure of the data, modeling the underlying structure or distribution in the data in order to learn more about it.
Types of Unsupervised learning
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
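The clustering idea can be sketched with a minimal k-means implementation. The 2D points (imagine two features per customer) and the initialization scheme here are invented for illustration:

```python
# Made-up 2D points: the algorithm discovers the two natural groupings
# without any labels being provided.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),   # one natural cluster
          (8.0, 8.2), (7.9, 8.1), (8.2, 7.8)]   # another natural cluster

def kmeans(points, k, steps=10):
    # Naive spread-out initialization: take every (n/k)-th point.
    centroids = points[::max(1, len(points) // k)][:k]
    for _ in range(steps):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                     if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters

groups = kmeans(points, k=2)
print([len(g) for g in groups])  # → [3, 3]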
Overview of Reinforcement Learning
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. It learns without intervention from a human by maximizing its reward and minimizing its penalty, trained through a system of reward and punishment that draws on ideas from dynamic programming.
For example, suppose the agent is given two options: a path with water or a path with fire. A reinforcement algorithm works on a reward system: if the agent takes the fire path, rewards are subtracted, and the agent learns that it should avoid the fire path. If it chooses the water path (the safe path), points are added to its reward. Over time the agent learns which paths are safe and which are not.
In essence, by leveraging the rewards obtained, the agent improves its knowledge of the environment in order to select the next action.
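A toy sketch of this reward loop, assuming made-up rewards and a simple epsilon-greedy agent (real reinforcement learning handles states and sequences of actions, which this deliberately omits):

```python
import random

random.seed(0)

# The fire/water example: the agent repeatedly picks a path, receives a
# reward (+10 for water, -10 for fire), and nudges its value estimate.
rewards = {"water": 10, "fire": -10}
values = {"water": 0.0, "fire": 0.0}   # the agent's learned estimates
alpha, epsilon = 0.1, 0.2              # learning rate, exploration rate

for episode in range(200):
    # Epsilon-greedy: mostly exploit the best-known path, sometimes explore.
    if random.random() < epsilon:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    # Move the estimate a step toward the observed reward.
    values[action] += alpha * (rewards[action] - values[action])

best = max(values, key=values.get)
print(best)  # → water
```

The fire path’s value drifts negative each time it is tried, so the agent ends up preferring the safe path without any human telling it to.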
What are the steps used in Machine Learning?
There are 5 basic steps used to perform a machine learning task:
- Collecting data: Be it raw data from Excel, Access, text files, etc., this step of gathering past data forms the foundation of future learning. The better the variety, density, and volume of relevant data, the better the machine's learning prospects become.
- Preparing the data: Any analytical process thrives on the quality of the data used. One needs to spend time determining the quality of the data and then taking steps to fix issues such as missing values and the treatment of outliers. Exploratory analysis is one method of studying the nuances of the data in detail, thereby improving its quality.
- Training a model: This step involves choosing the appropriate algorithm and representing the data in the form of a model. The cleaned data is split into two parts, train and test (the proportion depends on the prerequisites); the first part (training data) is used for developing the model, and the second part (test data) is held back as a reference.
- Evaluating the model: To test the accuracy, the second part of the data (the holdout / test data) is used. This step determines the precision of the chosen algorithm based on the outcome. The best test of a model's accuracy is its performance on data that was not used at all during model building.
- Improving the performance: This step might involve choosing a different model altogether or introducing more variables to improve efficiency. That is why a significant amount of time needs to be spent on data collection and preparation.
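The five steps above can be sketched end to end on synthetic data. The data, the 80/20 split, and the threshold "model" are all illustrative choices, not a recommendation:

```python
import random

random.seed(1)

# Collect: made-up one-feature data where the label is 1 when x exceeds 5.
data = [(x, int(x > 5)) for x in (random.uniform(0, 10) for _ in range(100))]

# Prepare: shuffle and split into train and test parts (80/20 here).
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

def accuracy(threshold, rows):
    """Fraction of rows the threshold rule classifies correctly."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# Train: pick the threshold that best separates the training labels.
best_threshold = max((x for x, _ in train), key=lambda t: accuracy(t, train))

# Evaluate: measure performance only on data never seen during training.
print(round(accuracy(best_threshold, test), 2))
```

The key discipline is in the last line: the score that matters comes from the held-out test rows, never from the rows the model was fitted on.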
:: USE CASES ::
Machine learning is exceptionally good at conducting repetitive tasks, finding patterns, and predicting outcomes. When implemented correctly, and used for the right use cases, it can reduce costs, save time, and open up new sources of revenue for companies. But it can also have positive benefits for society as it can scale things that normally would be too expensive and too difficult to do. Which leads me to my first favorite practical use case for machine learning.
Detecting Fake News
This is not an easy problem to solve, partly because fake news is a moving target. There are several different types of fake news: news that simply isn't true, news that is highly biased but not factually inaccurate, satire, misleading content, and so on.
Theoretically, one could sort out what classes or categories fake news falls into, train a model on those categories, and then be able to predict or detect fake news in future content. The challenge is no longer the tech chops required to build the right models, but the work it would take to get the right training data. For example, you can use Classificationbox by Machine Box to create a fake news detector in a few minutes, as long as you have the sample data.
Fake news is a real problem, and part of the problem is that there aren't enough humans to sit there and manually sift through every article to determine its genuineness. This is the kind of scale problem that machine learning is really good at solving.
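As a hedged illustration of why the training data is the hard part: the headlines, categories, and labels below are entirely made up, yet even a crude nearest-neighbour tagger shows the shape of the approach once such data exists.

```python
# Hand-labelled (hypothetical) headlines covering a few fake-news categories.
labeled = [
    ("aliens endorse local mayor in secret ceremony", "satire"),
    ("study shows chocolate cures all disease overnight", "false"),
    ("city council approves new budget for road repairs", "genuine"),
    ("government quietly confirms moon base, sources say", "false"),
    ("local library extends weekend opening hours", "genuine"),
]

def jaccard(a, b):
    """Word-set overlap between two headlines (0.0 to 1.0)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def classify(headline):
    # Tag the headline with the label of its most similar labelled example.
    best_label, _ = max(((label, jaccard(headline, text)) for text, label in labeled),
                        key=lambda pair: pair[1])
    return best_label

print(classify("council approves budget for new library hours"))  # → genuine
```

With five examples this is a toy; the point is that the model code is trivial, while curating thousands of honestly labelled headlines is the real work.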
Face Authentication
I think it is reasonable to argue over the true effectiveness and necessity of Apple's iPhone X Face ID technology as compared to the fingerprint sensor method of authentication. Both are incredibly convenient, and pretty darn good at solving the problem of remembering and typing in passwords.
What I particularly like about using face recognition to solve the use case of authentication is that it can be applied to almost any device without special hardware. Just about every computer or phone today has a camera in it, and a camera is all you need to do face recognition. It may be necessary, in some cases, to add extra hardware to verify that someone isn't just holding up a picture of your face to fool the system, but I don't think it's an absolute requirement. First of all, there are ways around that: you can use some logic to detect movement, track people's eyes, or ask users to enter a short PIN, which keeps the speed of authentication high while also providing a secondary method of verification.
Secondly, there are scenarios where attempts to spoof the system are unlikely or inconsequential. We have customers who are using Facebox to verify people taking tests, buying sandwiches, or entering a building. It still brings a lot of time savings, new sources of revenue, cost savings, and value to customers, since the instances where people would try to spoof it are so rare that preventing them is not worth sacrificing everything else.
Content Recommendation
No one really likes to be put into a box and thought of as someone whose likes and tastes can be predicted, but whether we like it or not, we do, as a whole, act in predictable patterns. Finding those patterns can be tricky to do manually, but it's a great task for machine learning.
You might not know why a certain cohort of your users chooses to click on a news post about sports when they're usually interested in science, or why millennials keep trying to buy monocles from your online store. Thanks to machine learning, you don't really have to know why. You just want to make sure you catch those kinds of trends and exploit them to maximize engagement, revenue, clicks, views, or any other metric.
For example, you can feed a tool like SuggestionBox all the data you have about users, give it some things to choose from, reward the model when people decide to click, buy, or otherwise engage with something, and then sit back and watch the machine learning model learn about users and their behavior on your site. It's quite astonishing.
:: Uses of Machine Learning at Netflix ::
There are 5 key areas Netflix focuses on:
Step One: Ranking & Layout
The entire catalogue of movies and shows at Netflix is ranked and ordered for each user in a personalized manner (you can blame your flatmate for messing up your algorithms). Through prolonged use, Netflix can work out what a customer's favourite shows are based on their activity. If Customer X has watched a few comedies (understandable in times like these), it can be presumed that they have an interest in comedy films and shows; therefore, comedies would hold a higher ranking score than, for example, thriller films. Sounds simple, right? Keep reading...
On a basic level, the recommender system learns from your account which type of series or movie you're likely to be interested in based on your previous history, and suggests the most relevant titles.
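A heavily simplified sketch of that idea (this is an illustration of genre-based ranking, not Netflix's actual algorithm; the titles and history are invented):

```python
from collections import Counter

# One user's (made-up) watch history and a tiny catalogue of (title, genre).
watch_history = ["comedy", "comedy", "thriller", "comedy", "documentary"]
catalogue = [("Laugh Riot", "comedy"), ("Night Chase", "thriller"),
             ("Deep Oceans", "documentary"), ("Stand-Up Special", "comedy")]

# Score each genre by how often the user has watched it, then rank titles.
genre_score = Counter(watch_history)            # comedy: 3, thriller: 1, ...
ranked = sorted(catalogue, key=lambda item: genre_score[item[1]], reverse=True)
print([title for title, _ in ranked])           # comedies rank first
```

For this user, the two comedies float to the top of the personalized ordering; a flatmate's thriller binge would shift the counts and reorder the list.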
Step Two: Similarity & Promotion
Once Netflix has found your favourites, that data is used to find similarities across the platform for content suggestion. Similarities in plot line, actors/actresses, and age restriction are all taken into account.
Step Three: Evidence & Search
Through testing, correlations can then be drawn between people's interests, watch history, and so on. The results of these tests give evidence as to what is working and what isn't. Better search, and the acquisition of new movies to encourage people to sign up, is a machine learning problem. One of the methods used for this is 'collaborative filtering', an ML technique whereby you try to group similar users together and then extrapolate from their consumption patterns to recommend relevant and highly personalized movies and TV shows to members with similar taste. Finding similar users is a hard problem, and one that many Netflix data scientists spend their days trying to work out.
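A minimal sketch of user-based collaborative filtering, assuming a made-up ratings table (the users, shows, and scores are invented): find the most similar user by cosine similarity, then recommend what they rated highly that the target user has not yet seen.

```python
# Hypothetical ratings: user -> {show: score out of 5}.
ratings = {
    "alice": {"Show A": 5, "Show B": 4, "Show C": 1},
    "bob":   {"Show A": 5, "Show B": 5, "Show D": 4},
    "carol": {"Show C": 5, "Show D": 2, "Show E": 4},
}

def similarity(u, v):
    """Cosine similarity over the shows both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][s] * ratings[v][s] for s in common)
    norm_u = sum(r * r for r in ratings[u].values()) ** 0.5
    norm_v = sum(r * r for r in ratings[v].values()) ** 0.5
    return dot / (norm_u * norm_v)

def recommend(user):
    # Find the nearest neighbour, then suggest their top unseen show.
    neighbour = max((u for u in ratings if u != user),
                    key=lambda u: similarity(user, u))
    unseen = set(ratings[neighbour]) - set(ratings[user])
    return max(unseen, key=lambda s: ratings[neighbour][s])

print(recommend("alice"))  # → Show D
```

Alice's tastes line up with Bob's, so Bob's well-rated but (to Alice) unseen show is the recommendation; the hard part at Netflix scale is computing "similar users" over hundreds of millions of sparse rows.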
Step Four: Improve Models
The first stage of model improvement is the aforementioned data-collection period, lasting several months to build up a large amount of good-quality data. Then, A/B testing is carried out to determine whether the new model is better than the current one. During this time, half the users get the new model and half get the old model, and the results are analysed to decide which model gets rolled out.
There are, however, problems with batch learning methods. In the time it takes to build an understanding of the user experience on the new platform, customers may have many months of what they deem to be a worse experience during testing.
Step Five: Explore / Exploit Learning
For explore/exploit learning, Netflix samples a large number of hypotheses and suppresses the ones that aren't doing as well as the others:
1. Start with a uniform population of hypotheses
2. Choose a random hypothesis h
3. Act according to h and observe the outcome
4. Re-weight the hypotheses
5. Go to step 2
Netflix also uses explore/exploit learning to find which pictures best describe movies, modifying the imagery that represents each movie to suit each customer. To be successful with this, Netflix runs tests to see which images perform better for each movie and how other factors, such as a customer's genre preferences, affect their choices.
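The five-step loop above can be sketched with multiplicative re-weighting. The "images" and click probabilities here are invented for simulation; this is an illustration of the loop, not Netflix's actual system.

```python
import random

random.seed(2)

# Made-up hypotheses (candidate artwork) with made-up true click rates.
click_rate = {"image A": 0.8, "image B": 0.3, "image C": 0.5}
weights = {h: 1.0 for h in click_rate}      # step 1: uniform population

for _ in range(500):
    # Step 2: choose a hypothesis at random, proportionally to its weight.
    hypotheses = list(weights)
    h = random.choices(hypotheses, weights=[weights[x] for x in hypotheses])[0]
    # Step 3: act according to h and observe the outcome (simulated click).
    clicked = random.random() < click_rate[h]
    # Step 4: re-weight: boost on success, suppress on failure. Step 5: repeat.
    weights[h] *= 1.1 if clicked else 0.9

print(max(weights, key=weights.get))  # should converge toward "image A"
```

Under-performing images are sampled less and less as their weights shrink, which is exactly the "suppress the ones that aren't doing as well" behaviour described above.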
THANKS FOR READING TILL THE END. KEEP SHARING IF YOU LIKE.