A Hands-On Approach to Machine Learning (part 1)

You can also read this article on my personal blog!

brunocamps.com/2018/07/04/a-hands-on-approach-to-machine-learning-part-1/

We’ll start by defining some important concepts and attributes of machine learning systems: things you need to understand before you start coding one. By the end of this article, you’ll have a notion of why you would use an ML solution and what you need to build one.

Defining Machine Learning

Machine Learning is the science of programming computers so they can learn from data. An ML-based system processes raw data and transforms it into training instances, which together make up the training set.

Summarizing

Raw data: data as it was collected, before any processing

Training instances: processed data samples, e.g. one customer’s salary, nationality and whether they purchased or not

Training set: the collection of training instances that the learning algorithm uses to learn on its own

Consider a training set with the columns Country, Age, Salary and Purchased, where each row is a training instance. Note that the “Purchased” column is still unprocessed: it holds Yes and No values, while most learning algorithms expect numbers such as 1 and 0.

From the example above:

  • Country, Age, Salary and Purchased are attributes
  • A feature usually means an attribute plus its value (“Country” = “France”), although many people use “feature” and “attribute” interchangeably
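Turning the raw columns into numbers is usually the first processing step. Below is a minimal Python sketch. It assumes a small made-up table with the same four columns; the rows are hypothetical and are not the article’s data. Purchased is mapped from Yes/No to 1/0, and Country is one-hot encoded into one 0/1 column per country.

```python
# Hypothetical rows with the columns Country, Age, Salary, Purchased.
# The values are invented for illustration only.
raw_rows = [
    {"Country": "France", "Age": 44, "Salary": 72000, "Purchased": "No"},
    {"Country": "Spain", "Age": 27, "Salary": 48000, "Purchased": "Yes"},
    {"Country": "Germany", "Age": 30, "Salary": 54000, "Purchased": "No"},
    {"Country": "Spain", "Age": 38, "Salary": 61000, "Purchased": "No"},
    {"Country": "France", "Age": 35, "Salary": 58000, "Purchased": "Yes"},
]

def encode(rows):
    """Turn raw rows into numeric feature vectors (X) and labels (y)."""
    countries = sorted({row["Country"] for row in rows})  # fixed column order
    X, y = [], []
    for row in rows:
        # One-hot encode Country: one 0/1 column per distinct country.
        one_hot = [1 if row["Country"] == c else 0 for c in countries]
        X.append(one_hot + [row["Age"], row["Salary"]])
        # Map the Yes/No label to 1/0 so the algorithm can work with it.
        y.append(1 if row["Purchased"] == "Yes" else 0)
    return countries, X, y

countries, X, y = encode(raw_rows)
print(countries)  # ['France', 'Germany', 'Spain']
print(X[0])       # [1, 0, 0, 44, 72000]
print(y)          # [0, 1, 0, 0, 1]
```

One-hot encoding is one common choice for categorical attributes. Mapping the countries to 0, 1, 2 instead would imply an order between countries that does not exist.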

ML/AI systems vs. Mechanical Systems

Spam filters were one of the first practical, mainstream uses of machine learning, and they illustrate this difference well. A spam filter usually analyzes the words in an email, looking for red flags.

If you’re building a mechanical spam filter, you have to hardcode every red flag yourself. That might work for a while, but it is hard to keep effective, since spammers constantly change their strategies.

In other words, keeping such a mechanical system up to date takes a lot of human effort. An ML-based filter instead learns which words signal spam from examples of emails labeled as spam or not spam. It can also keep learning incrementally as new training data arrives (preferably through online learning).
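To make the contrast concrete, here is a minimal Python sketch that puts a hardcoded red-flag filter next to a tiny learned filter. The learned one is a multinomial Naive Bayes classifier. This is a classic technique for spam filtering, chosen here for illustration; the article does not prescribe an algorithm. The class name, method names and training emails are all hypothetical.

```python
import math
from collections import Counter

# Mechanical approach: every red flag is hardcoded and must be updated by hand.
HARDCODED_RED_FLAGS = {"free", "winner", "prize"}

def mechanical_is_spam(text):
    return any(word in HARDCODED_RED_FLAGS for word in text.lower().split())

class NaiveBayesSpamFilter:
    """Learned approach: estimates word statistics from labeled emails."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}  # per class
        self.email_counts = {True: 0, False: 0}

    def learn(self, text, is_spam):
        """Update the statistics with one labeled email."""
        self.email_counts[is_spam] += 1
        self.word_counts[is_spam].update(text.lower().split())

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        vocabulary = set(self.word_counts[True]) | set(self.word_counts[False])
        total_words = sum(counts.values())
        # Prior: fraction of training emails in this class (assumes both
        # classes have at least one example).
        score = math.log(self.email_counts[label] / sum(self.email_counts.values()))
        for word in words:
            # Add-one (Laplace) smoothing so an unseen word does not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        return self._log_score(words, True) > self._log_score(words, False)

# Hypothetical labeled training emails.
training_emails = [
    ("claim your free prize now", True),
    ("winner winner claim your cash", True),
    ("meeting notes attached for tomorrow", False),
    ("lunch tomorrow with the team", False),
]

spam_filter = NaiveBayesSpamFilter()
for text, label in training_emails:
    spam_filter.learn(text, label)

print(spam_filter.is_spam("claim your cash now"))         # True
print(spam_filter.is_spam("notes for the team meeting"))  # False
print(mechanical_is_spam("claim your cash now"))          # False: no hardcoded flag matches
```

Because `learn()` only updates word counts, the filter can keep learning from each newly labeled email. This matches the online learning idea mentioned above.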

Training what we can’t (or don’t want to) code

Coding a speech recognizer or a personal assistant like Siri or Alexa became practical thanks to machine learning. You could, in principle, try to write one without any ML. That is exactly the kind of work that becomes unnecessary when you have the powerful tools of ML.

Imagine having to hardcode every possible way each word can be pronounced and map all of those variations to the corresponding letters: a huge amount of work. A better idea is to write an algorithm that learns by itself from many recorded examples of each word.

ML and AI therefore open up countless possibilities for innovation, since they shorten the time it takes to build such systems.

So, when should you use machine learning?

  • Dynamic environments (an ML system can adapt to new data, either through online learning or by being retrained in batches)
  • Getting insights about complex problems and large amounts of data
  • Complex problems that are hard to code by hand or would require many human hours
  • Huge amounts of data for which no good algorithm is known

Machine Learning Systems

Machine learning systems or algorithms are generally classified in three ways:

  • Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised or reinforcement learning)
  • Whether or not they can learn incrementally while running (online or batch learning)
  • Whether they compare new data points to known data points, or detect patterns in the training data and build a predictive model (instance-based or model-based learning)

How systems are trained

Supervised Learning

The training data fed to the algorithms includes the desired solutions, called labels. Therefore, every training instance will contain a label.

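Classification and regression are typical supervised learning tasks:

  • Classification assigns each instance to one of several classes (for example, spam or not spam)
  • Regression predicts a numeric target value (for example, a salary) from a set of features called predictors, by learning from examples that contain both the predictors and their labels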
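Below is a minimal Python sketch of both tasks on hypothetical one-feature data; all names and numbers are made up for illustration. A linear regression fitted by least squares predicts a salary. A nearest-centroid classifier, one of the simplest classification methods, assigns a class label.

```python
def fit_linear_regression(xs, ys):
    """Regression: least-squares fit of y = w * x + b from labeled examples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return w, mean_y - w * mean_x

def fit_nearest_centroid(xs, labels):
    """Classification: remember the mean feature value of each class."""
    centroids = {}
    for label in set(labels):
        members = [x for x, lab in zip(xs, labels) if lab == label]
        centroids[label] = sum(members) / len(members)
    return centroids

def classify(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical regression data: predictor = years of experience,
# label = salary in thousands.
years = [1, 2, 3, 4, 5]
salaries = [40, 45, 50, 55, 60]
w, b = fit_linear_regression(years, salaries)
print(w, b)       # 5.0 35.0
print(w * 6 + b)  # 65.0

# Hypothetical classification data: predictor = age, label = purchased or not.
ages = [22, 25, 47, 52]
purchased = ["No", "No", "Yes", "Yes"]
centroids = fit_nearest_centroid(ages, purchased)
print(classify(centroids, 30))  # No
print(classify(centroids, 45))  # Yes
```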

Some of the most important supervised learning algorithms:

  • Neural Networks (which can also be unsupervised)
  • Decision Trees
  • Random Forests
  • Linear Regression
  • Logistic Regression

Unsupervised Learning

In unsupervised learning, the training data is unlabeled: the system tries to learn on its own by finding structure in the data. There are three general uses for unsupervised learning:

  • Clustering
  • Association rule learning
  • Visualization and dimensionality reduction

Clustering divides instances into clusters, which are groups of instances that share common traits.
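Below is a minimal sketch of k-means, one widely used clustering algorithm, chosen here for illustration. It is written in plain Python and needs 3.8+ for `math.dist`. The points are hypothetical. The farthest-first initialization is a simple deterministic choice made for this example; library implementations typically use randomized schemes such as k-means++.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm."""
    # Deterministic farthest-first initialization: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two natural groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are involved at any point. The algorithm discovers the two groups purely from the distances between points.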

Dimensionality reduction aims to simplify the data without losing too much information. For instance, if two features are strongly correlated, such as a house’s size and its number of rooms, the dimensionality reduction algorithm may merge them into a single feature. This technique is called feature extraction. Reducing the number of features can make the other algorithms in your pipeline run considerably faster and use less memory, and in some cases it also improves their results.
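Principal component analysis (PCA) is one standard dimensionality reduction technique. The sketch below is a minimal pure-Python version restricted to two features. It merges a hypothetical house-size column and a number-of-rooms column into a single feature and reports the share of variance that survives. The data is made up, and the closed-form eigenvector formula only applies to the 2x2 case.

```python
import math

def pca_one_component(points):
    """PCA for 2-D data: project onto the direction of maximum variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector (the first principal component), normalized.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # New single feature: coordinate of each centered point along (vx, vy).
    projected = [x * vx + y * vy for x, y in centered]
    explained = largest / (a + c)  # share of the total variance that is kept
    return projected, explained

# Hypothetical houses: (size in tens of square meters, number of rooms).
houses = [(5, 2), (7, 3), (9, 3), (11, 5)]
feature, explained = pca_one_component(houses)
print([round(v, 2) for v in feature])
print(round(explained, 3))
```

Here one feature keeps well over 90% of the variance of the original two. That is the "without losing too much information" trade-off in numbers.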

Anomaly detection, such as detecting fraudulent credit card transactions, is another common unsupervised learning task.
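A very simple unsupervised anomaly detector learns what "normal" looks like from unlabeled data and flags values that are far from it. The minimal sketch below assumes a z-score rule, a hypothetical threshold of 3 standard deviations, and made-up transaction amounts. Real fraud detection systems use far richer features and models.

```python
import statistics

def fit_anomaly_detector(amounts):
    """Learn what 'normal' looks like from unlabeled transaction amounts."""
    return statistics.mean(amounts), statistics.pstdev(amounts)

def is_anomaly(amount, mean, std, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    return abs(amount - mean) > threshold * std

# Hypothetical past card transactions; no fraud labels are needed.
history = [12.5, 40.0, 23.9, 18.0, 35.2, 27.4, 15.1, 30.0, 22.3, 19.6]
mean, std = fit_anomaly_detector(history)
print(is_anomaly(25.0, mean, std))   # False
print(is_anomaly(900.0, mean, std))  # True
```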

Semisupervised Learning

Semisupervised learning combines supervised and unsupervised algorithms: a portion of the data is labeled and the rest is not. Usually, the system identifies patterns or clusters in the data on its own, and then a person provides a label for each pattern or cluster.

Deep Belief Networks (DBNs) are built from stacked Restricted Boltzmann Machines (RBMs), which are unsupervised learning components. The RBMs are trained one after another through unsupervised learning. The whole system is then fine-tuned using supervised learning techniques on labeled data.

Facial recognition is a good example of semisupervised learning. The system identifies on its own that a person is present and, depending on the system, also their physical attributes (hair and eye color, skin tone, face shape…). The instance is then labeled with the person’s name and any other necessary information.
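The following is a minimal sketch of how a few human-provided labels can be spread to many unlabeled instances. Each point receives the label of its nearest hand-labeled point, a simple form of label propagation chosen here for illustration. The 2-D points stand in for hypothetical face features, and the names are made up.

```python
import math

def propagate_labels(points, labeled):
    """Give every point the label of its nearest hand-labeled point.

    points:  list of (x, y) tuples, mostly unlabeled
    labeled: dict mapping a few of those points to a human-provided label
    """
    return {
        p: labeled[min(labeled, key=lambda q: math.dist(p, q))]
        for p in points
    }

# Hypothetical face features: two natural groups, one labeled photo in each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labeled = {(1.0, 1.0): "Alice", (9.0, 8.5): "Bob"}

labels = propagate_labels(points, labeled)
print(labels[(1.5, 2.0)])  # Alice
print(labels[(8.0, 8.0)])  # Bob
```

Two labels were enough to name all six instances. The cost is that any mistake in a hand-provided label spreads to the whole group.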

Reinforcement Learning

In reinforcement learning, the learning system is called an agent. The agent observes its environment, performs actions and receives rewards or penalties in return. It learns by itself the best strategy, called a policy, to get the most reward over time. A policy defines which action the agent should choose in a given situation.
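Q-learning is one classic reinforcement learning algorithm, chosen here for illustration; the article doesn’t name one. In this minimal sketch, the environment is a hypothetical five-cell corridor. The agent starts in cell 0, can move left or right, and gets a reward of 1 when it reaches cell 4. The hyperparameters (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are arbitrary choices for the example.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reward for reaching the goal
    return next_state, 0.0, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # q[(state, action)] estimates the future reward of taking action in state.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:  # exploit: best known action, ties broken at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# The learned policy: for each non-goal cell, the action with the highest Q.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The policy is simply a mapping from situation (cell) to action. After training it should choose +1 (move right) in every cell, since that is the shortest path to the reward.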

Reinforcement learning is commonly used to teach robots with many degrees of freedom complex tasks such as walking, picking up objects and opening doors!

Learning incrementally or not?

  • Batch learning: the system cannot learn incrementally, so it must be trained using all available data at once, which is typically done offline. Once trained, the system is put into action and stops learning. To learn from new data, the system must be stopped and replaced with a new version trained on the full dataset, including the new data. Retraining can take a long time, but the whole process of training, evaluating and launching can be automated, so even a batch learning system can adapt to change.
  • Online (incremental) learning: the system learns incrementally as it is fed data instances sequentially, either one at a time or in small groups called mini-batches. This works well when the data changes frequently.
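The minimal Python sketch below contrasts the two approaches on the same hypothetical model, y = w·x + b. The batch version needs every instance up front. The online version uses stochastic gradient descent, one common way to learn incrementally: it updates its parameters from one instance at a time and could discard each instance afterwards. The data and the learning rate are made up for illustration.

```python
def batch_fit(xs, ys):
    """Batch learning: fit y = w*x + b on all data at once (least squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return w, my - w * mx

class OnlineLinearModel:
    """Online learning: update y = w*x + b one instance at a time (SGD)."""

    def __init__(self, learning_rate=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, learning_rate

    def learn_one(self, x, y):
        error = (self.w * x + self.b) - y
        # Gradient step on the squared error of this single instance.
        self.w -= self.lr * error * x
        self.b -= self.lr * error

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]  # hypothetical data following y = 2x + 1

print(batch_fit(xs, ys))  # (2.0, 1.0); needs every instance up front

online = OnlineLinearModel()
for _ in range(1000):            # simulate a long stream by replaying the data
    for x, y in zip(xs, ys):
        online.learn_one(x, y)   # each instance could be discarded after use
print(round(online.w, 2), round(online.b, 2))
```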

The learning rate must be set when working with online learning systems. It defines how fast the system adapts to new data. A high learning rate makes the system adapt quickly to new data, but it also tends to quickly forget the old data. A lower learning rate gives the system more inertia: it learns more slowly, but it is less sensitive to noise in the new data.
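The effect of the learning rate can be seen with the simplest possible online learner. It nudges its estimate toward each new observation by a fraction equal to the learning rate. The minimal sketch below uses two hypothetical data streams.

```python
def online_estimate(stream, learning_rate):
    """Online learner: after each observation, move the estimate toward it
    by a fraction `learning_rate` of the error."""
    estimate = stream[0]
    for observation in stream[1:]:
        estimate += learning_rate * (observation - estimate)
    return estimate

# Hypothetical stream whose true value shifts from 10 to 20.
shifted = [10.0] * 50 + [20.0] * 5
fast = online_estimate(shifted, learning_rate=0.5)
slow = online_estimate(shifted, learning_rate=0.05)
print(round(fast, 2), round(slow, 2))  # 19.69 12.26: the fast learner adapts sooner

# Hypothetical stream with a single noisy outlier at the end.
noisy = [10.0] * 50 + [100.0]
print(online_estimate(noisy, 0.5), online_estimate(noisy, 0.05))
```

With a learning rate of 0.5, the single outlier drags the estimate from 10 to 55. The low rate only moves it to about 14.5: more inertia, less sensitivity to noise.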

Instance-based or Model-based learning?

Generalization is a central goal of machine learning systems. Performing well on the training data is not enough: the system must also make good predictions on new instances it has never seen.

  • Instance-based learning: the system stores the training examples and generalizes to new instances by comparing them with the stored examples using a similarity measure, then assigning each new instance accordingly (for example, the label of its most similar examples)
  • Model-based learning: the system builds a model from the training examples and then uses that model to make predictions on new instances
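The minimal sketch below contrasts the two approaches on the same hypothetical data. k-nearest neighbors (here k = 2) is a typical instance-based method. A straight line fitted by least squares is a typical model-based one. Names and numbers are illustrative.

```python
def knn_predict(train_xs, train_ys, x, k=2):
    """Instance-based: keep the training data and average the labels of the
    k most similar (closest) instances."""
    nearest = sorted(zip(train_xs, train_ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def fit_linear_model(train_xs, train_ys):
    """Model-based: learn two parameters (slope, intercept); afterwards the
    training data is no longer needed to make predictions."""
    n = len(train_xs)
    mx, my = sum(train_xs) / n, sum(train_ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(train_xs, train_ys)) / sum(
        (x - mx) ** 2 for x in train_xs
    )
    return slope, my - slope * mx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.1]  # hypothetical labeled examples

print(knn_predict(xs, ys, 3.4))  # 3.5
slope, intercept = fit_linear_model(xs, ys)
print(slope * 3.4 + intercept)   # about 3.43
```

The k-NN predictor must keep all training instances around to answer a query. The linear model only keeps its slope and intercept.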

Summarizing

In order to build a machine learning system for your needs, the following points need to be specified:

  • How is it going to be trained? Supervised, Unsupervised, Semisupervised or through Reinforcement Learning?
  • How is it going to learn? Incrementally (online) or through batch learning (offline)?
  • How is it going to generalize? Instance-based or model-based?

In Part 2, we’re going to build a machine learning system from scratch! Subscribe to my newsletter to stay updated.

Questions? Comment below or email them to [email protected]




 
