Machine Learning - Some basic definitions

Machine learning is a branch of computer science that studies the design and use of algorithms and models that learn patterns from data and then, without human intervention, make predictions when similar patterns appear in new data.


Methods

Algorithms are grouped by learning method. The most important methods are:

  • Supervised learning algorithms are trained using past data with labeled examples to predict labels in future data. For example, we could have data points for flowers, each one labeled with the species it belongs to, along with other features such as petal size. The algorithm reads the input set and learns from it, updating the model. Then, the algorithm uses the information in the model to predict the species a new flower belongs to.
  • Unsupervised learning algorithms work on data that has no labels. They explore the data looking for structure. For example, given unlabeled data points for flowers, the algorithm would read the input set, learn from it and suggest groups of similar flowers, which may correspond to species.
  • Reinforcement learning algorithms interact with their environment by producing actions and learning from the results. For example, a drone may learn to fly by trial and error.
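
As a rough illustration of the supervised case, the sketch below learns a very simple model (one average petal size per species) from labeled examples and then uses it to label a new flower. The measurements, the species names and the helper names train and predict are invented for this example. It is a minimal nearest-centroid sketch in Python, not a recommended implementation.

```python
from collections import defaultdict

# Hypothetical labeled data: (petal length in cm, petal width in cm) -> species.
labeled_flowers = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
]

def train(examples):
    """Learn the model: the mean feature vector (centroid) of each label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

model = train(labeled_flowers)
new_flower_species = predict(model, (1.3, 0.2))
```

The labels are what make this supervised: train only knows which flowers to average together because each example says which species it belongs to.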


Problems

The type of the label defines the problem, as follows:

  • Categorical and discrete variables can take one of a limited number of possible values. Predicting a categorical label is called classification. For example, predicting the species a flower belongs to.
  • Continuous or real variables can take any value in a range of real numbers. Predicting a quantitative label is called regression. For example, predicting the petal width of a flower from its species and petal length.
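
The difference shows up in the type of the value a model returns. The following sketch fits a straight line by ordinary least squares to predict petal width from petal length alone (a simplification of the example above, which also uses the species) and contrasts it with a rule that returns a category. The data, the function names and the 2.5 cm threshold are assumptions made up for illustration.

```python
# Hypothetical measurements in cm, chosen so the arithmetic is easy to follow.
petal_lengths = [1.0, 2.0, 3.0, 4.0]
petal_widths = [0.2, 0.6, 1.0, 1.4]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(petal_lengths, petal_widths)

def predict_width(length):
    """Regression: the predicted label is a continuous number."""
    return slope * length + intercept

def predict_species(length, threshold=2.5):
    """Classification: the predicted label is one of a fixed set of categories.

    The threshold is an arbitrary illustrative value, not a learned one.
    """
    return "setosa" if length < threshold else "versicolor"
```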


Features

In machine learning, statistical variables are called features.

Features can be of any of the following types:

  • Numeric: describes a quantity as a number. Subtypes are continuous and discrete.
      - Continuous: observations can take any value within a range of real numbers. Examples include length, width and time.
      - Discrete: observations can take only values from a countable set of distinct numbers, with nothing in between one value and the next. Examples include number of flowers and number of passengers.
  • Categorical: describes a quality or characteristic. Subtypes are ordinal and nominal.
      - Ordinal: observations can be ordered. Examples include t-shirt size (e.g. XL, L, M, S, XS) and satisfaction grade (e.g. high, medium, low).
      - Nominal: observations cannot be ordered. Examples include species, sex and brand.
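
The feature type matters in practice because most algorithms expect numbers as input. One common way to handle this, sketched below under made-up names and data, is to map ordinal categories to their rank and to one-hot encode nominal categories so that no order is implied. The observation and its field names are hypothetical.

```python
# Ordinal feature: the categories have an order, so integer ranks preserve it.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]

# Nominal feature: the categories have no order.
SPECIES = ["setosa", "versicolor", "virginica"]

def encode_ordinal(value, order=SIZE_ORDER):
    """Map an ordinal category to its rank (XS -> 0, ..., XL -> 4)."""
    return order.index(value)

def encode_nominal(value, categories):
    """One-hot encode a nominal category: one 0/1 column per category."""
    return [1 if value == category else 0 for category in categories]

# One hypothetical observation mixing the four feature types.
observation = {
    "petal_length_cm": 4.7,    # numeric, continuous
    "flower_count": 12,        # numeric, discrete
    "plant_size": "M",         # categorical, ordinal
    "species": "versicolor",   # categorical, nominal
}

feature_vector = [
    observation["petal_length_cm"],
    observation["flower_count"],
    encode_ordinal(observation["plant_size"]),
    *encode_nominal(observation["species"], SPECIES),
]
```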

A formal and detailed classification of statistical variables according to the nature of the information they represent is outside the scope of this article. However, here is a valuable resource if you are interested: Statistical data types.


Algorithms

The most frequently used algorithms are listed below.

Supervised learning algorithms

  • Linear models for regression and classification (e.g. linear regression, Fisher's linear discriminant analysis (LDA), logistic regression, naive Bayes, Winnow, perceptron)
  • Non-linear models for regression and classification
  • Linear and non-linear support vector machines (SVM)
  • Learning vector quantization (LVQ)
  • Classification and regression trees (CART)
  • K-nearest neighbours (KNN)
  • Neural networks
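
To give a feel for how small some of these algorithms are at their core, here is a minimal sketch of k-nearest neighbours for classification: a new point receives the most common label among its k closest training points. The function name knn_predict, the choice of k and the flower measurements are assumptions for illustration. Real implementations add feature scaling, tie-breaking rules and faster neighbour search.

```python
from collections import Counter
import math

def knn_predict(training_data, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbours."""
    by_distance = sorted(
        training_data,
        key=lambda example: math.dist(example[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (petal length in cm, petal width in cm) -> species.
training_data = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.4), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]
```

For example, knn_predict(training_data, (1.6, 0.3)) looks at the three closest flowers, all of them setosa here, and returns that label.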

Unsupervised learning algorithms

  • Apriori
  • K-means clustering
  • Principal Component Analysis (PCA)
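
As an example of the unsupervised case, the sketch below runs a bare-bones k-means: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. No labels are involved. The function name, the fixed iteration count and the explicitly passed starting centroids are simplifying assumptions; library implementations usually pick the starting centroids themselves and stop when the assignments no longer change.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Return (centroids, clusters) after a fixed number of k-means iterations."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: (petal length in cm, petal width in cm).
flowers = [
    (1.4, 0.2), (1.3, 0.2), (1.5, 0.4),
    (4.7, 1.4), (4.5, 1.5), (4.9, 1.5),
]

centroids, clusters = k_means(flowers, initial_centroids=[flowers[0], flowers[3]])
```

The algorithm only proposes groups. Deciding whether a group corresponds to a species is left to whoever interprets the result.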

Ensemble learning techniques

  • Random forest
  • AdaBoost

A more comprehensive list of the algorithms available in the caret package for R can be found here.


Well, enough theory for today. In the following articles we will play with these algorithms using R.
