Machine Learning for Beginners: The 3 Basic Strategies

If you're new to the world of machine learning, you may be wondering where to start. There are many different approaches to machine learning, and it can be tricky to know which one is right for you. In this blog post, we will discuss the three most basic strategies for machine learning: supervised learning, unsupervised learning, and reinforcement learning.

We'll also provide a few examples of how each approach can be used in practice. So read on to learn more about these essential techniques!

AI vs ML vs Deep Learning

There is a lot of confusion about the difference between artificial intelligence (AI) and machine learning (ML). The two are often used interchangeably, but they are actually quite different.

  • Artificial Intelligence (AI) is the broader concept of machines being able to perform tasks that normally require human intelligence, such as understanding natural language and recognizing objects in images.
  • Machine Learning (ML) is a subset of AI that involves using algorithms to automatically improve the performance of a computer system on a specific task, such as recognizing objects in images or translating text from one language to another.

So while all machine learning is artificial intelligence, not all artificial intelligence is machine learning. For example, a classic rule-based expert system or a hand-coded chess engine is AI that doesn't use any machine learning algorithms. However, the majority of AI research these days focuses on machine learning.

  • Deep learning is a subset of machine learning that uses neural networks to learn in a hierarchical manner, with each layer of neurons learning increasingly abstract patterns in the data.

The key difference between machine learning and deep learning is that deep learning can automatically learn representations of data in multiple layers, whereas machine learning usually only learns one level of representation. This makes deep learning more powerful for tasks such as image recognition and natural language processing.
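To make that difference concrete, here is a minimal sketch (in Python with scikit-learn, purely for illustration) comparing a single-representation model with a small multi-layer network on the same toy dataset; the data, layer sizes, and settings are arbitrary assumptions, not recommendations.

```python
# Minimal sketch: a "classic" ML model vs. a small multi-layer neural network.
# Dataset and hyperparameters are arbitrary illustration choices.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One level of representation: a linear decision boundary on the raw features.
linear = LogisticRegression().fit(X_train, y_train)

# Multiple layers: each hidden layer learns a new representation of its input.
deep = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                     random_state=0).fit(X_train, y_train)

print("logistic regression accuracy:", linear.score(X_test, y_test))
print("multi-layer network accuracy:", deep.score(X_test, y_test))
```

On this non-linearly separable data, the multi-layer model typically separates the classes better, because its hidden layers learn intermediate representations instead of working directly on the raw features.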

Deep learning can be applied in either setting: with a very large labelled dataset (like apples in a supermarket tagged with their variety — supervised learning) or an unlabelled one (like all the other apples you've seen, where you had to work out for yourself what they were — unsupervised learning).

There are three main types of machine learning algorithms: (i) supervised learning, (ii) unsupervised learning, and (iii) reinforcement learning.

Supervised Learning

Supervised learning is the most common type of machine learning. In supervised learning, the training set consists of data that have been labeled and annotated by a human observer. The goal of supervised learning is to build a model that can learn from these labeled examples and generalize to new data. In short, it is used to learn from a set of training data, which includes both the input data and the desired output. The training data is used to create a model that can be used to predict the output for new data.

Common Use Case - This approach is often used for tasks such as image classification, facial recognition, and spam detection.
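As a small illustration of this workflow, here is a minimal sketch of a toy spam detector with scikit-learn; the example messages, the vectorizer, and the Naïve Bayes model are illustrative choices, not a prescription.

```python
# Minimal sketch of supervised learning: labeled examples in, predictions out.
# The tiny hand-made dataset below is purely illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "claim your free reward",
            "meeting moved to 3pm", "lunch tomorrow?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam (the human-provided labels)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)   # turn text into word-count features

model = MultinomialNB().fit(X, labels)   # learn from the labeled examples

new_message = vectorizer.transform(["free prize waiting for you"])
print(model.predict(new_message))        # e.g. [1] -> predicted spam
```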

If you have a lot of data and computing power but no labels, you may be able to use unsupervised learning (covered below). In this technique, the algorithm is left to its own devices to find structure in the data. It's often used when labeled data is hard or expensive to obtain. The challenge with unsupervised learning is that, with no labels to compare the results against, it can be hard to know whether the algorithm has learned anything useful at all.

There are two types of supervised learning approaches:

·       Classification - used to group data points into discrete classes by finding rules that explain how they separate. The model learns patterns in the labeled examples so it can identify which class a new instance belongs to based on its input features alone.

·       Regression - instead of outputting a class label (for example "dog" or "cat"), the model outputs a continuous numerical value, such as a price or a temperature. A short sketch contrasting the two approaches follows.
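Here is a minimal sketch of that difference with scikit-learn; both tiny datasets are made up purely for illustration.

```python
# Classification predicts a discrete class; regression predicts a continuous value.
# The tiny datasets below are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: hours studied -> pass (1) or fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6]])
passed = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(hours, passed)
print(clf.predict([[3.5]]))      # a class label, e.g. [1]

# Regression: house size (m^2) -> price (in thousands)
size = np.array([[50], [80], [100], [120], [150]])
price = np.array([150, 240, 300, 360, 450])
reg = LinearRegression().fit(size, price)
print(reg.predict([[110]]))      # a continuous value, e.g. [330.]
```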

Features

Some common features of supervised learning algorithms include:

  • Learning curves: A graph that shows how well the algorithm is doing as it learns more and more from the training data.
  • Accuracy: The percentage of correct predictions made by the algorithm on the test data.
  • Precision: Of all the positive predictions the algorithm makes, the percentage that are actually correct.
  • Recall: Of all the actual positive cases, the percentage that the algorithm correctly identifies.
  • F score: The harmonic mean of precision and recall, combining both into a single measure. A short sketch of computing these metrics follows this list.
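scikit-learn exposes all of these metrics directly. Here is a minimal sketch, assuming y_true and y_pred hold the true and predicted labels for a test set; the values below are made up for illustration.

```python
# Minimal sketch of computing common supervised-learning metrics.
# y_true / y_pred are made-up labels purely for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model's predictions

print("accuracy: ", accuracy_score(y_true, y_pred))   # fraction of correct predictions
print("precision:", precision_score(y_true, y_pred))  # of predicted positives, how many were right
print("recall:   ", recall_score(y_true, y_pred))     # of actual positives, how many were found
print("F1 score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```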

Algorithms

Some popular supervised learning algorithms include:

Classification

  • Logistic Regression.
  • Naïve Bayes.
  • Stochastic Gradient Descent.
  • K-Nearest Neighbors.
  • Decision Tree.
  • Random Forest.
  • Support Vector Machine.

Regression

  • Linear Regression.
  • Ridge and Lasso Regression.
  • Stochastic Gradient Descent Regression.
  • K-Nearest Neighbors Regression.
  • Decision Tree Regression.
  • Random Forest Regression.
  • Support Vector Regression (SVR).

Challenges

  1. One of the challenges of supervised learning is that it can be time-consuming and expensive to label data sets. In some cases, this can be a business obstacle. If a company cannot generate enough quality labels quickly, they may miss out on key opportunities for innovation.
  2. Another challenge of supervised learning is that models can often overfit training data. This means that the model performs well on the training data set but does not generalize well to new data. This can be a problem when deploying models into production because the performance on real-world data may be disappointing. A common way to spot this is to compare training and test accuracy, as in the sketch after this list.
  3. A third challenge of supervised learning is related to feature engineering. In many cases, expert knowledge is required to identify relevant features for predictive modeling. This can be a difficult and time-consuming task, especially for large data sets.
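To illustrate the overfitting challenge (point 2), here is a minimal sketch that compares training and test accuracy for an unconstrained decision tree; the synthetic dataset and model settings are arbitrary assumptions.

```python
# Minimal sketch of spotting overfitting: high training accuracy, lower test accuracy.
# Dataset and model settings are arbitrary illustration choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("train accuracy:", tree.score(X_train, y_train))  # typically ~1.0
print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower -> overfitting
```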

Unsupervised Learning

Unsupervised learning is a less common but still important type of machine learning. In unsupervised learning, the training data is not labeled or annotated by a human. Instead, the goal of unsupervised learning is to build a model that can learn from this data and find patterns or relationships within it. In short, it is used to learn from a set of training data without any desired output.

Common Use Case - This approach is often used for tasks such as clustering, dimensionality reduction, and anomaly detection.
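As a quick illustration, here is a minimal clustering sketch with scikit-learn; the generated blobs and the choice of three clusters are assumptions made for the example.

```python
# Minimal sketch of unsupervised learning: no labels, the algorithm finds groups itself.
# The generated blobs and the choice of 3 clusters are illustrative assumptions.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels are ignored

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])        # cluster assignment for the first 10 points
print(kmeans.cluster_centers_)    # coordinates of the discovered cluster centres
```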

Features

Some common features of unsupervised learning algorithms include:

  • Clustering: Clustering is a technique for grouping similar instances together. This can be used to segment customers into groups, cluster documents by topic, or even group genes together based on their expression levels.
  • Dimensionality reduction: Dimensionality reduction is a technique for reducing the number of features in a dataset. This can be used to remove noise from data, or to make datasets easier to work with by reducing the number of features that need to be processed.
  • Anomaly detection: Anomaly detection is a technique for identifying outliers in a dataset. This can be used to find fraudulent transactions, detect faulty equipment, or identify unusual patterns in data.
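Here is a minimal sketch of dimensionality reduction and anomaly detection with scikit-learn; the random data, the number of components, and the contamination rate are assumptions for the example.

```python
# Minimal sketch of dimensionality reduction (PCA) and anomaly detection (Isolation Forest).
# The random data, component count, and contamination rate are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))            # 200 samples, 10 features, no labels

# Dimensionality reduction: compress 10 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)                         # (200, 2)

# Anomaly detection: flag the most unusual points (-1 = outlier, 1 = normal).
outliers = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
print((outliers == -1).sum(), "points flagged as anomalies")
```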

Algorithms

Some popular unsupervised learning algorithms include:

  • K-means clustering
  • Hierarchical clustering
  • Principal component analysis
  • Singular value decomposition

Challenges

Unsupervised learning does not require any labeled data. That very lack of labels, however, is what makes it more challenging than supervised learning. Some common challenges of unsupervised learning include:

1. Identification of relevant features: Unsupervised learning algorithms need to identify relevant features in the data in order to group similar instances together. This can be tricky, especially with large and complex datasets.

2. Lack of feedback: Without labeled data, unsupervised learning algorithms receive no direct feedback about whether their output is correct. They can't tell which specific groupings or patterns are wrong, so they can't learn from their mistakes as effectively as supervised learning algorithms can.

3. Difficulty of tuning: Unsupervised learning algorithms often have many parameters that can be tuned. This can make it difficult to find the right combination of settings that works best for a particular dataset.
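One common aid for the tuning challenge (point 3 above) is to score several candidate settings and keep the best one. Here is a minimal sketch that picks the number of clusters for k-means using the silhouette score; the generated data and the candidate range are illustrative assumptions.

```python
# Minimal sketch of tuning an unsupervised model: pick k for k-means by silhouette score.
# The generated data and candidate range for k are illustrative assumptions.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, labels):.3f}")
# The k with the highest silhouette score is usually a reasonable choice.
```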

Despite these challenges, unsupervised learning is still a powerful tool that can be used to discover hidden patterns in data.

Reinforcement Learning

Reinforcement learning is a type of machine learning that is concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The agent receives rewards for performing correct actions and incurs penalties for performing incorrect actions. In short, it is a type of learning where the algorithm tries to learn by trial and error from its own mistakes and successes.

Common Use Case - This approach is often used for tasks such as game playing and robotics.

Features

Some common features of reinforcement learning algorithms include:

  • Learning by trial and error: The ability to learn from its own mistakes and successes.
  • Reward and punishment: The agent receives rewards for performing correct actions and incurs penalties for performing incorrect actions.
  • Exploration and exploitation: The ability to explore new options and exploit existing knowledge.

Algorithms

Some popular reinforcement learning algorithms include:

  • Q-learning
  • Sarsa
  • Deep Q-Networks (DQN)
  • Policy gradients
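As a small, concrete example of Q-learning from the list above, here is a minimal sketch on a made-up five-state corridor where the agent earns a reward for reaching the rightmost state; the environment, rewards, and hyperparameters are all assumptions chosen for illustration.

```python
# Minimal tabular Q-learning sketch on a made-up 5-state corridor.
# The agent starts at state 0 and earns a reward of 1 for reaching state 4.
# Environment, rewards, and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2             # actions: 0 = move left, 1 = move right
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != 4:                                  # episode ends at the goal state
        # Exploration vs. exploitation (epsilon-greedy with random tie-breaking).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))

        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == 4 else 0.0       # reward only at the goal

        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))   # learned policy: expected to choose "right" (1) on the way to the goal
```

After training, the learned policy should mostly prefer moving right, which is exactly the trial-and-error, reward-driven behaviour described above.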

Challenges

There are several challenges inherent in reinforcement learning that can present difficulties for machine learning systems. Some of the challenges include:

  1. Environment unpredictability - In the real world, there is always something new to learn about how things work. An RL algorithm may perform exceptionally well in a closed, synthetic training environment but struggle outside those constraints, because it cannot adapt quickly to situations it has never encountered and cannot predict everything that might affect its decisions.
  2. Delayed feedback - It can be difficult to connect an outcome back to the decision that caused it. For example, if an AI trading system predicts that an investment will pay off but the payoff only arrives months or years later, the system receives very little timely signal about whether its prediction was accurate.
  3. Infinite time horizons - Real-world tasks often have no natural end point. The agent's goal is to collect the highest possible cumulative reward, but we don't know in advance how many attempts or how much effort that will take, so the objective effectively has an infinite horizon. We also need to know whether the agent can fail safely along the way (for example, without crashing into other vehicles).
  4. Defining a precise reward function - Data scientists may struggle to express mathematically what counts as a good or bad action. The usual advice is to define rewards in terms of the current state, so the agent can tell whether its next move brings it closer to the goal (for example, training an autonomous car to turn right without hitting a fence).
  5. Data requirements and exploration risks - RL typically needs far more data than supervised learning, which makes it hard to gather enough experience for real-world systems. Letting an agent explore in the real world can also be risky: a self-driving vehicle tested solely on the street could hit other cars, pedestrians, or guardrails.

Training & Testing

Human experts have analyzed machine learning models manually for decades, but this is an extremely time-consuming and error-prone process. It is also hard to scale, because meaningful analysis requires intimate knowledge of each individual experiment or case study.

A proper ML model testing framework should systematize these practices.

The question is, how?

Testing Types

You can map software development test types to machine learning models by applying their logic to machine learning behavior:

  • Unit test: checks the correctness of an individual component in isolation, such as a single preprocessing function or feature transformer.
  • Regression test: verifies that a system still works after it has been changed. This can be done to ensure that a bug fix or new feature has not introduced any new faults. It can also be used to confirm that an existing problem has been fixed. 
  • Integration test: involves testing individual software modules as part of an overall system. This type of testing is usually performed after unit testing and before system testing. Integration testing is used to test the interfaces between software modules and to verify that they work together correctly. Integration tests can be manual or automated. Automated integration tests are often run using scripts or tools that automate the process of running the tests and collecting the results.

And follow conventions such as:

·       don't merge code unless all tests are passing,

·       always write tests for newly introduced logic when contributing code,

·       when contributing a bug fix, be sure to write a test to capture the bug and prevent future regressions.
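To show what this might look like in an ML code base, here is a minimal pytest-style sketch for a hypothetical preprocessing helper; the normalize function, the bug it once had, and the test names are all invented for illustration. An integration test would additionally exercise the full pipeline end-to-end, which is omitted here for brevity.

```python
# Minimal pytest-style sketch of unit and regression tests for a hypothetical
# preprocessing helper in an ML code base. The function and the bug it once had
# are invented purely for illustration.
import numpy as np

def normalize(features: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance."""
    std = features.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    return (features - features.mean(axis=0)) / std

def test_normalize_unit():
    # Unit test: the component behaves correctly in isolation.
    X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
    Z = normalize(X)
    assert np.allclose(Z.mean(axis=0), 0.0)
    assert np.allclose(Z.std(axis=0), 1.0)

def test_normalize_regression_constant_column():
    # Regression test: a constant column once caused a division-by-zero bug;
    # this test captures that case so the bug cannot silently return.
    X = np.array([[2.0, 7.0], [2.0, 9.0], [2.0, 11.0]])
    Z = normalize(X)
    assert np.isfinite(Z).all()
```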

Different types of tests can be applied to a machine learning model depending on the type and scope of your problem. This article focuses specifically on post-train testing, so we do not cover other test cases like monitoring or validation in great detail; however, make sure you integrate these into an overall framework for assessing how well models work across different features and data slices.

Approach

Some of the most common approaches include cross-validation, holdout sets, and simulated data. Each of these approaches has its own benefits and drawbacks, and the best approach to use will depend on the specific use case.

·       Cross-validation is a good approach for data sets that are relatively small, as it allows for all of the data to be used for both training and testing. However, it can be more computationally intensive than other approaches.

·       Holdout sets are a good option for larger data sets, as they allow for more data to be used for training. However, they can also lead to overfitting if not used correctly.

·       Simulated data is often used for online learning, as it allows new data to be generated as needed. However, it can be difficult to accurately simulate real-world data.

Ultimately, choosing the right testing approach depends on a variety of factors and should be done on a case-by-case basis; a short sketch of the first two approaches follows.
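A minimal sketch of cross-validation versus a holdout set with scikit-learn; the dataset and model are arbitrary choices for illustration.

```python
# Minimal sketch of cross-validation vs. a holdout set.
# Dataset and model are arbitrary illustration choices.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Cross-validation: every sample is used for both training and testing across folds.
cv_scores = cross_val_score(model, X, y, cv=5)
print("5-fold CV accuracy:", cv_scores.mean())

# Holdout set: train once on 80% of the data, test once on the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_score = model.fit(X_train, y_train).score(X_test, y_test)
print("holdout accuracy:  ", holdout_score)
```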

Model Types

When developing machine learning models, it is important to evaluate and test the models concurrently.

  • Model evaluation provides insights into the performance of the model on validation or test datasets. This helps to identify potential areas of improvement.
  • Model testing, on the other hand, verifies that the model exhibits the expected behavior. This is important in ensuring that the model meets the required accuracy standards.

By running both model evaluation and model tests in parallel, we can build high-quality models efficiently.

How do you write model tests?

In my opinion, there are two general classes of model tests that we'll want to write.

  • Pre-train tests allow us to identify some bugs early on and short-circuit a training job.
  • Post-train tests use the trained model artifact to inspect behaviors for a variety of important scenarios that we define.
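Here is a minimal sketch of what these two classes of tests might look like in practice; the specific checks, thresholds, and objects are assumptions for illustration rather than a fixed recipe.

```python
# Minimal sketch of pre-train and post-train model tests.
# The checks, thresholds, and the clf / X / y objects are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# --- Pre-train tests: catch data problems before spending time on training. ---
assert not np.isnan(X_train).any(), "training features contain NaNs"
assert set(np.unique(y_train)) == {0, 1}, "unexpected label values"
assert 0.2 < y_train.mean() < 0.8, "labels are heavily imbalanced"

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# --- Post-train tests: inspect the trained model's behaviour. ---
# Example: predictions should be invariant to a tiny perturbation of one feature.
X_perturbed = X_test.copy()
X_perturbed[:, 0] += 1e-6
assert (clf.predict(X_test) == clf.predict(X_perturbed)).all(), "model is unstable"
```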

Model Development Pipeline

Putting this all together, we can revise our diagram of the model development process to include pre-train and post-train tests. The outputs of these tests can be displayed alongside model evaluation reports for review during the last step in the pipeline.

Depending on the nature of your model training, you may choose to automatically approve models provided that they meet some specified criteria.

For example, you might specify that a model must achieve a certain level of accuracy on the validation set before it can be approved for production. Once a model is approved, it can be deployed to your serving infrastructure and made available to your users. Congratulations, you've now completed the model development process!
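For instance, a minimal sketch of such an approval gate; the accuracy threshold and the deploy_fn helper are hypothetical placeholders, not part of any real serving system.

```python
# Minimal sketch of an automatic approval gate in a model pipeline.
# ACCURACY_THRESHOLD and the deploy_fn helper are hypothetical placeholders.
ACCURACY_THRESHOLD = 0.90

def maybe_deploy(model, X_val, y_val, deploy_fn):
    """Deploy the model only if it clears the agreed validation-accuracy bar."""
    accuracy = model.score(X_val, y_val)
    if accuracy >= ACCURACY_THRESHOLD:
        deploy_fn(model)          # hand the approved model to serving infrastructure
        return True, accuracy
    return False, accuracy        # below the bar: keep iterating instead of deploying
```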

Conclusion

In summary…

·       Supervised learning is a process where we train the machine on labeled data, and then use that knowledge to make predictions on new, unseen data.

·       Unsupervised learning is used when there are no labels for the things in our dataset - the algorithm must find structure in the raw inputs on its own, without any guidance.

·       Reinforcement learners interact with their environment through trial-and-error interactions until they've learned what works best. The key here is that they get feedback after each interaction. This feedback signal can be positive (rewarded) or negative (punished), but it needs to be some sort of guidance.

The three strategies are not mutually exclusive and can often be combined in creative ways to create more powerful learning models. The important takeaway is that, as a beginner, you should familiarize yourself with all three of these strategies and think about how they might be applied to the problem at hand.

Thanks for reading!

Follow my channels for more content.
