AN INTRODUCTORY GUIDE TO MACHINE LEARNING MODELING (DATA TO PREDICTION)
Mariam Saad
Student & Moderator @iCodeGuru | Front-End Development | Python Programmer | AI/ML enthusiast | Winner, int'l Hackathon | 6X int'l Hackathon Participant | Learning DSA
BASIC INTRODUCTION OF:
WHAT IS MACHINE LEARNING:
Machine learning is a branch of AI focused on building computer systems that learn from data. The breadth of ML techniques enables software applications to improve their performance over time.
ML algorithms are trained to find relationships and patterns in data. Using historical data as input, these algorithms can make predictions, classify information, cluster data points, reduce dimensionality and even generate new content.
TYPES OF MACHINE LEARNING:
SUPERVISED LEARNING:
Supervised machine learning refers to classes of algorithms where the machine learning model is given a set of data with explicit labels for the quantity we're interested in (this quantity is often referred to as the response or target).
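As a minimal sketch of learning from labeled data, the example below uses a one-nearest-neighbor classifier written in plain Python. The data set, its feature meanings (hours studied, hours slept) and the function names are hypothetical and chosen only for illustration; in practice a library such as scikit-learn would normally be used.

```python
# Hypothetical labeled data: (hours_studied, hours_slept) -> "pass" / "fail".
# The label is the response (target) the model learns to predict.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_nearest_neighbor(features):
    """Return the label of the closest training example."""
    _, label = min(
        training_data,
        key=lambda pair: squared_distance(pair[0], features),
    )
    return label


prediction_low = predict_nearest_neighbor((1.2, 4.1))
prediction_high = predict_nearest_neighbor((7.5, 8.0))
print(prediction_low, prediction_high)
```

The model never sees a rule such as "more study means pass"; it only copies the label of the most similar labeled example.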
UNSUPERVISED LEARNING:
In unsupervised learning problems, the data has no labels, and we are simply looking for patterns. For example, given Amazon customers' purchase histories, can we identify clusters (groups of similar customers)?
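One way to picture clustering is a tiny k-means loop on a single feature. This is a sketch under stated assumptions: the spending figures are invented, the starting centers are picked by hand, and k-means is only one of several clustering methods.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (basic k-means)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(
                range(len(centers)),
                key=lambda i: abs(value - centers[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend per customer (no labels attached).
spend = [12, 15, 18, 95, 100, 105]
centers, clusters = k_means_1d(spend, centers=[0.0, 50.0])
print(centers, clusters)
```

No labels are supplied here; the two groups of customers emerge only from how close the values are to each other.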
ARTIFICIAL NEURAL NETWORK:
The term "artificial neural network" refers to a biologically inspired family of machine learning models loosely modeled after the brain. An artificial neural network is a computational network inspired by the biological neural networks that make up the human brain. Just as the human brain has neurons that are interconnected, an artificial neural network has artificial neurons, known as nodes, that are linked to each other across the layers of the network. Unlike supervised and unsupervised learning, which describe how a model learns, a neural network is a kind of model, and it can be trained in either setting.
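To make the idea of nodes arranged in layers concrete, here is a minimal forward pass through a network with two hidden nodes and one output node. The weights and biases are arbitrary, hand-picked values (an assumption for illustration); a real network would learn them from data during training.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each node takes a weighted sum of its inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


# Hand-picked, untrained parameters (hypothetical values).
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]  # two hidden nodes, two inputs each
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]              # one output node, two inputs
output_biases = [0.0]

inputs = [1.0, 1.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print(hidden, output)
```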
WHAT IS PREDICTIVE MODELING?
Predictive modeling is an application of machine learning that uses historical data to make predictions. The historical data captures past behavior, for example of people, organisms, or geographic regions. Machine learning works by recognizing patterns in past data and then using them to predict future outcomes. To build a successful predictive model, you need data that is relevant to the outcome of interest. For example, Amazon uses its database of customer purchasing patterns and preferences to recommend items that are likely to be of interest to a particular customer.
STRUCTURED DATA:
Structured data is data organized in a well-defined format, typically rows with well-defined columns. Many popular business tools, like HubSpot, Salesforce, or Snowflake, are sources of structured data. More broadly speaking, any well-defined CSV, Excel, XML, or JSON file is an example of structured data, and millions of examples are available on sites like Kaggle or Data.gov.
DATA PREPROCESSING:
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is a crucial early step when creating a machine learning model.
In a machine learning project, the data we come across is rarely clean and well formatted. Before doing any modeling, the data must be cleaned and put into a consistent format. This is the job of data preprocessing.
ENCODE CATEGORICAL VARIABLES:
Dealing with categorical data is a common challenge in data science and machine learning. Categorical variables represent attributes such as color, type, or font. However, most machine learning algorithms require numerical input, so categorical data must be transformed into numerical form. This process is called encoding, and there are several ways to do it, such as label (ordinal) encoding and one-hot encoding.
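As an illustration of encoding, the sketch below one-hot encodes a categorical column in plain Python. The function name and the color values are hypothetical; libraries such as pandas and scikit-learn provide ready-made encoders that would normally be used instead.

```python
def one_hot_encode(values):
    """Turn a list of category names into 0/1 indicator vectors."""
    categories = sorted(set(values))  # fixed column order
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # column order of the indicator vectors
print(encoded)
```

One-hot encoding avoids implying an order between categories, which a simple mapping such as red=0, green=1, blue=2 would do.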
SCALE FEATURES:
Feature scaling is a vital preprocessing step in machine learning that transforms numerical features to a common scale. It matters most for algorithms that rely on distances or gradient descent, where features with large ranges can otherwise dominate training.
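Two common scaling schemes are min-max scaling and standardization (z-scores). The sketch below shows both in plain Python; the income values are invented for illustration, and the code assumes the values are not all identical (otherwise it would divide by zero).

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(value - low) / (high - low) for value in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(value - mean) / std for value in values]


# Hypothetical feature column with a large numeric range.
incomes = [20000, 40000, 60000, 100000]
scaled = min_max_scale(incomes)
standardized = standardize(incomes)
print(scaled)
print(standardized)
```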
ML APPLICATIONS: REGRESSION
LINEAR REGRESSION:
The most common method for solving regression problems is linear regression, which models the target as a weighted sum of the input features plus a constant. For example, suppose you are given data on the pH and citric acid content of wines and want to predict wine quality.
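A minimal sketch of simple (one-feature) linear regression using the closed-form least-squares formulas follows. The numbers are made up so that they lie exactly on the line y = 2x + 1; they are not real wine measurements, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data lying on y = 2x + 1 (illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict the target for a new input
print(slope, intercept, prediction)
```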
USE CASES OF MACHINE LEARNING:
Machine learning is a subset of artificial intelligence that is focused on systems that can learn from data.
Machine learning is applied across a number of industries, and the academic world is also using it, largely for research in areas such as biology, chemistry, and materials science.
MODEL TRAINING:
The training phase is where a learning algorithm is fit to data to produce a model. During training, the algorithm may determine which features of the data are most predictive of the desired outcome. This phase can be divided into several sub-steps, including feature selection, model training, and hyperparameter optimization.
DATA PREPARATION:
To recap, data preparation is the process of transforming raw data into a format that is appropriate for modeling, which makes it a key component of machine learning operations. This process typically includes splitting the data into separate parts for training and validation, and normalizing the data.
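The splitting step can be sketched as a shuffled train/validation split. The 80/20 ratio, the fixed seed and the function name are assumptions made for this example, not values prescribed by the article.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(10))  # stand-in for ten data records
train, validation = train_validation_split(rows)
print(len(train), len(validation))
```

Keeping the validation rows out of training is what lets them serve as an honest check on how the model handles data it has not seen.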
EXPERIMENT TO FIND OUT HOW MUCH DATA YOU NEED:
Machine learning is getting easier and faster. A huge dataset is not always a prerequisite, so rather than spending a lot of time collecting data up front, start with what you have. As Adam Savage puts it: “In the spirit of science, there really is no such thing as a ‘failed experiment.’” Simply experiment, for example by training on increasing amounts of data and watching validation performance, and see how much data you need.
Machine learning models are pattern matching machines. They can only capture and predict patterns that have been seen before. This is the one big catch with machine learning. If you want to predict what happens with new data, the model has to have seen similar data before.
MODEL EVALUATION:
Model evaluation is the process of using metrics to analyze the performance of a model. Model development is a multi-step process, and we need to keep checking how well the model generalizes to data it has not seen. Evaluation lets us judge the performance of the model and also helps to identify its key weaknesses. Common metrics and tools include accuracy, precision, recall, F1 score, the area under the ROC curve (AUC), the confusion matrix, and mean squared error. Cross-validation, which repeatedly trains on one part of the data and evaluates on the rest, is used during the training phase and serves as a model evaluation technique as well.
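Several of these classification metrics can be computed directly from the four confusion-matrix counts. The sketch below does so in plain Python; the true and predicted labels are invented for illustration, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```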
DATA VISUALIZATION:
Data visualization is a crucial aspect of machine learning that enables analysts to understand and make sense of data patterns, relationships, and trends. Through data visualization, insights and patterns in data can be interpreted more easily and communicated to a wider audience, making it a critical component of machine learning.
MODEL SELECTION:
The process of selecting the machine learning model most appropriate for a given problem is known as model selection. Model selection can be used to compare models of the same type that have been configured with different hyperparameters, as well as models of different types.
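A minimal sketch of model selection follows: two candidate models are fit on the same training data and the one with the lower validation error is kept. The candidates (a constant "predict the mean" baseline and a one-feature linear fit), the data and all names are hypothetical choices made for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean_model(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear_model(xs, ys):
    """One-feature least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Invented training and validation data (roughly y = 2x).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5.0, 6.0], [10.1, 11.9]

candidates = {"mean": fit_mean_model, "linear": fit_linear_model}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_squared_error(val_y, [model(x) for x in val_x])

best_name = min(scores, key=scores.get)  # lowest validation error wins
print(scores, best_name)
```

The same loop works for comparing one model type under different hyperparameter settings: each setting simply becomes another entry in the candidates.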