Getting Started with Machine Learning: 4 Essential Models to Know
Luis Fernando Torres
AML/FT Intelligence Analyst @ CloudWalk, Inc. | Microsoft Certified AI Engineer
For a better understanding of the models presented in this article, and to see their implementation in Python, I suggest reading The ABCs of Machine Learning: 4 Essential Models notebook on Kaggle.
Introduction
Machine Learning has been reshaping industries throughout the world. After OpenAI's release of its ChatGPT tool, many enthusiasts and newcomers started talking about AI and its possible effects on society.
With the attention drawn by all the buzzwords and news coverage, many people are discovering the world of machine learning and want to know how it works, or even how to build their own models.
In this article, I'll briefly introduce the four most essential models every beginner in data science and machine learning should know: Linear Regression, Logistic Regression, Decision Trees, and K-Means.
Machine learning and AI are rapidly changing our society and revolutionizing many industries and markets. With the abundance of data available today, machines can extract insights and recognize patterns that would be nearly impossible for humans to spot with the naked eye, making this an exciting subject to study.
Even though the vast number of machine learning algorithms and techniques may seem overwhelming at first, understanding these four models will make it much easier to grasp more complex and advanced concepts ahead.
Let's get started!
Linear Regression
Linear regression is the ideal starting point for a beginner!
It is a supervised learning algorithm, which means we must have a target variable in mind when building a linear regression model. The model predicts a continuous target variable based on its relationship with one or more input features.
Linear regression models are built on the assumption that the target variable y has a linear relationship with the independent features X that can be modeled as a straight line.
The formula for a simple linear regression is:
Y = mX + b
Where Y represents the output (the target variable), X the independent feature (also referred to as the predictor), m the estimated slope, and b the estimated intercept.
The slope (m) indicates the rate at which Y changes for every unit increase or decrease in X. If m = 2, for instance, then for every unit increase in X, Y is expected to increase by 2 units.
The intercept (b) represents the value of Y when X = 0. It is the point at which the line crosses the y-axis.
The job of a linear regression model is to find the optimal values for m and b, so that it can predict the target variable Y for any given value of X.
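To make the formula concrete, here is a minimal sketch of fitting a simple linear regression with scikit-learn; the synthetic data, variable names, and library choice are my own illustration and are not taken from the Kaggle notebook.

```python
# A minimal sketch: fitting a simple linear regression with scikit-learn.
# The synthetic data below roughly follows Y = 2X + 1 plus some noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))                # single predictor feature
y = 2 * X.ravel() + 1 + rng.normal(0, 1, size=100)   # continuous target variable

model = LinearRegression()
model.fit(X, y)

print("Estimated slope (m):", model.coef_[0])         # should be close to 2
print("Estimated intercept (b):", model.intercept_)   # should be close to 1
print("Prediction for X = 5:", model.predict([[5.0]])[0])
```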
Linear regression models are widely used in fields such as economics, finance, and engineering. They can predict house and car prices, forecast sales, estimate stock prices, and much more.
Logistic Regression
Logistic regression is also a supervised learning algorithm. However, instead of fitting a straight line to the data, we fit an S-shaped curve, called a sigmoid, to predict binary outcomes based on one or more input features.
It assumes a linear relationship between the log odds of the target variable and the independent features. The output of a logistic regression model is a value between 0 and 1, indicating the probability of an event y, such as the chance of passing an exam, given a feature X, such as the hours spent studying.
The formula for logistic regression is:
p(y = 1 | x) = 1 / (1 + exp(-(β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ)))
Where p(y = 1 | x) is the probability of the target variable y taking the value 1 given the predictor features x₁, x₂, …, xₙ. The β coefficients are the parameters of the logistic regression model, estimated from the data so that the log odds best fit the observations.
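As a hands-on illustration of the exam example above, here is a minimal sketch using scikit-learn's LogisticRegression; the hours studied and pass/fail labels below are made up purely for demonstration.

```python
# A minimal sketch: logistic regression with scikit-learn.
# X holds hours studied and y whether the exam was passed (1) or not (0);
# the numbers are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns probabilities per class; column 1 is p(y = 1 | x)
for hours, prob in zip([1.0, 3.0, 5.0], model.predict_proba([[1.0], [3.0], [5.0]])[:, 1]):
    print(f"{hours:.1f} hours studied -> estimated probability of passing: {prob:.2f}")
```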
Logistic regression models may be used in healthcare for predicting the likelihood of patients developing certain diseases, in finance to predict the likelihood of default, in marketing to predict the likelihood of a customer purchasing a product based on demographics, etc.
Decision Tree
The decision tree is a powerful and intuitive supervised learning algorithm widely used for both classification and regression tasks. It receives its name from the tree-like structure it is built on: internal nodes evaluate attributes, branches represent the possible outcomes of each evaluation, and the path followed eventually ends at a leaf node, which holds the predicted outcome.
Decision trees are extremely easy to interpret and visualize. They are also capable of handling missing values and are less sensitive to outliers. Another advantage is that the model can capture non-linear relationships between the input features and the output variable, including complex interactions.
Because it handles both classification and regression tasks, the decision tree is used across a variety of industries for activities such as investment analysis, default risk assessment, healthcare, and more.
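Below is a minimal sketch of training a decision tree classifier with scikit-learn on its built-in Iris dataset; the dataset and hyperparameters are chosen here purely for illustration and are not from the Kaggle notebook.

```python
# A minimal sketch: a decision tree classifier with scikit-learn,
# trained on the built-in Iris dataset (chosen purely for illustration).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow tree for readability
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
# export_text prints the learned structure: internal nodes, branches, and leaves
print(export_text(tree, feature_names=data.feature_names))
```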
K-Means
K-Means is an unsupervised model, which means it learns patterns from unlabeled data. In this case, there is no target variable for the model to predict.
K-Means is a clustering algorithm used to identify patterns in data and group similar data points together based on their proximity to one another.
It works by randomly selecting K centroids, one for each cluster, and then repeating two steps: each data point is assigned to its nearest centroid, and each centroid is recomputed as the arithmetic mean of the data points assigned to its cluster. The process continues until the assignments stop changing.
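Here is a minimal sketch of K-Means with scikit-learn; the synthetic blob data stands in for something like customer records and is generated purely for illustration.

```python
# A minimal sketch: K-Means clustering with scikit-learn on synthetic data.
# make_blobs generates well-separated groups of points purely for illustration.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # K = 3 clusters
labels = kmeans.fit_predict(X)

print("Cluster centroids:\n", kmeans.cluster_centers_)
print("First 10 cluster assignments:", labels[:10])
```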
The goal of a K-Means model is to identify subsets of data that are both meaningful and useful. It is widely used in retail for customer segmentation, dividing the customer base into distinct groups with similar characteristics so that each group can be targeted with the right products, services, and marketing strategies.
Conclusion
In conclusion, machine learning is a rapidly growing field that has been revolutionizing many industries and decision-making processes throughout the world. It is an exciting subject to learn and work on, as it offers endless possibilities to explore and innovate, especially with the increasing availability of large datasets and powerful computing resources.
Overall, it is a fascinating and dynamic field that holds great promise for the future. Whether you are a researcher, developer, or simply curious about this exciting domain, there has never been a better time to get involved and explore the possibilities of machine learning.
This article is just a brief explanation of how these models work. In my Kaggle notebook, The ABCs of Machine Learning: 4 Essential Models, I dive a bit deeper into the details of each model and demonstrate hands-on how to apply them in Python. I highly encourage you to take a look.
Thank you for reading,
Luís Fernando Torres