Chained and MultiLabel Algorithms
Binary Dream, via Midjourney


A New Guide to Advanced Predictive Analytics via Multi-label, Multi-output & Chained Algorithms

Hi everyone, how are you doing? Great! Likewise, long story short: today we will look into some advanced machine learning techniques, commonly known as multi-class and multi-label learning.

To make it clear, I have included the diagram below from scikit-learn.

from scikit-learn.org

From the above diagram, we can clearly see how these techniques are categorized.

1. Multiclass — This we are already aware of, and most sklearn classifiers support multiclass by default. Here are the categories from the scikit-learn documentation, just in case (the full estimator lists are in the docs); a quick sketch follows the list:

— Inherently multiclass

— Multiclass as One-Vs-One

— Multiclass as One-Vs-The-Rest

— Support multilabel

— Support multiclass-multioutput
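
As a quick sketch of the first and third categories (the iris dataset here is just an assumed example), an inherently multiclass estimator such as LogisticRegression handles all three classes directly, while OneVsRestClassifier wraps an estimator to fit one binary classifier per class:

# a minimal multiclass sketch; the iris dataset (3 classes) is an assumed example
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# inherently multiclass: handles the 3 classes directly
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))

# one-vs-the-rest: fits one binary classifier per class
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(ovr.predict(X[:3]))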

Next, we have 2. MultiLabel Classification. We call it multilabel classification because we are performing classification tasks; for regression, the equivalent is 3. Multioutput Regression.

2. MultiLabel Classification is further divided into:

— MultiOutput Classifier

— Classifier Chain


— MultiLabel MultiOutput Classifier

# This strategy consists of fitting one classifier per target.
# It is a simple strategy for extending classifiers that do not natively support multi-target classification.
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

# create dataset
X, y = make_multilabel_classification(n_classes=3, random_state=0)

# fit one logistic regression per label
clf = MultiOutputClassifier(LogisticRegression()).fit(X, y)

# predict the labels of the last two samples
print(clf.predict(X[-2:]))

— MultiLabel Classifier Chain

Classifier chains are a way of combining a number of binary classifiers into a single multi-label model that is capable of exploiting correlations among targets.
“Each model makes a prediction in the order specified by the chain using all of the available features provided to the model plus the predictions of models that are earlier in the chain.”

In simple words:

For example, classification of the properties “type of fruit” and “color” for a set of images of fruit. The property “type of fruit” has the possible classes: “apple”, “pear” and “orange”. The property “color” has the possible classes: “green”, “red”, “yellow” and “orange”. Each sample is an image of a fruit, a label is output for both properties and each label is one of the possible classes of the corresponding property.
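
As a hedged sketch of how that fruit example could be encoded (the feature matrix and label codes below are invented purely for illustration), each row of y carries one label per property, and MultiOutputClassifier fits one classifier per target column:

# a minimal sketch of the fruit example; features and label codes are made up
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# invented features standing in for image descriptors of 4 fruits
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
# column 0 = type of fruit (0=apple, 1=pear, 2=orange)
# column 1 = color (0=green, 1=red, 2=yellow, 3=orange)
y = np.array([[0, 1], [0, 0], [2, 3], [1, 0]])

clf = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, y)
print(clf.predict([[0.95, 0.15]]))  # one predicted label for each property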


from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# make dataset
X, Y = make_multilabel_classification(n_samples=12, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# chain logistic regressions in a random order
base_lr = LogisticRegression(solver='lbfgs', random_state=0)
chain = ClassifierChain(base_lr, order='random', random_state=0)

# fit the chain and predict the test labels
print(chain.fit(X_train, Y_train).predict(X_test))

# per-label probabilities from the chained models
print(chain.predict_proba(X_test))
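
Because the chain order matters, a common refinement shown in the scikit-learn documentation is to fit several chains with random orders and average their predicted probabilities. A hedged sketch (the number of chains and the 0.5 threshold are assumptions for illustration):

# ensemble of randomly ordered chains, averaging predict_proba across chains
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

base_lr = LogisticRegression(solver='lbfgs', random_state=0)
chains = [ClassifierChain(base_lr, order='random', random_state=i) for i in range(10)]
for chain in chains:
    chain.fit(X_train, Y_train)

# average the per-label probabilities over the 10 chains, then threshold
Y_prob = np.mean([chain.predict_proba(X_test) for chain in chains], axis=0)
print((Y_prob >= 0.5).astype(int))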
        

3. MultiOutput Regression is similarly divided into:

— Multioutput Regressor

— Regressor Chain

Multioutput regression involves dividing the regression problem into a separate problem for each target variable to be predicted.

This approach assumes that the outputs are independent of each other. In other words, it involves developing a separate regression model for each output value to be predicted.

For example, if a multioutput regression problem required the prediction of three values y1, y2 and y3 given an input X, then this could be partitioned into three single-output regression problems:

  • Problem 1: Given X, predict y1.
  • Problem 2: Given X, predict y2.
  • Problem 3: Given X, predict y3.

Some regression machine learning algorithms support multiple outputs directly.

Multi-step time series forecasting may be considered a type of multiple-output regression where a sequence of future values are predicted and each predicted value is dependent upon the prior values in the sequence.
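
As a hedged sketch of that idea (the sine-wave series and the window sizes below are made up for illustration), a univariate series can be reframed as multioutput regression by sliding a window over it, with the last n_out columns becoming the targets:

# framing multi-step forecasting as multioutput regression (toy sine-wave data)
import numpy as np
from sklearn.linear_model import LinearRegression

series = np.sin(np.arange(200) * 0.1)  # assumed toy univariate series
n_in, n_out = 5, 3  # use 5 past values to predict the next 3

# build the supervised dataset with a sliding window
n_rows = len(series) - n_in - n_out + 1
X = np.array([series[i:i + n_in] for i in range(n_rows)])
y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_rows)])

model = LinearRegression().fit(X, y)
print(model.predict(series[-n_in:].reshape(1, -1)))  # the next 3 predicted values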


# 1. Linear regression for multioutput regression
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# create dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = LinearRegression()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])

# 2. k-nearest neighbors for multioutput regression
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
# create dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = KNeighborsRegressor()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])

# 3. Decision tree for multioutput regression
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
# create dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = DecisionTreeRegressor()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])
        

— Multioutput sklearn Wrapper Approach

For wrapping models that do not support multioutput regression by default:


from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

# define base model (LinearSVR does not support multioutput natively)
model = LinearSVR()

# define the multioutput wrapper model
wrapper = MultiOutputRegressor(model)
# fit the model
wrapper.fit(X, y)

# make a single prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = wrapper.predict([row])

# summarize the prediction
print('Predicted: %s' % yhat[0])
        

— Regressor Chain

Regressor Chain develops a sequence of dependent models to match the number of numerical values to be predicted.

The approach is an extension of the multioutput method except the models are organized into a chain. The prediction from the first model is taken as part of the input to the second model, and the process of output-to-input dependency repeats along the chain of models.

For example, if a problem required the prediction of three values y1, y2 and y3 given an input X, then this could be partitioned into three dependent single-output regression problems as follows:

  • Problem 1 (yhat1): Given X, predict y1.
  • Problem 2 (yhat2): Given X and yhat1, predict y2.
  • Problem 3 (yhat3): Given X, yhat1, and yhat2, predict y3.

from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

# define base model
model = LinearSVR()
# define the chain model
wrapper = RegressorChain(model)

# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the scores
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force the scores to be positive
n_scores = absolute(n_scores)

# summarize performance
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
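
Continuing from the block above, the chain can then be fit on all the data and used for prediction. A short hedged sketch; order=[0, 1] is an assumed explicit ordering (predict y1 first, then feed it to the model for y2):

# fit the chain on all the data and make a single prediction
# order=[0, 1] makes the dependency explicit instead of using the default order
chain = RegressorChain(model, order=[0, 1])
chain.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
print(chain.predict([row]))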
        

Here we go! We have covered the commonly used ways to perform multilabel classification as well as multioutput regression modeling.

There are a few more techniques for multiclass and multilabel problems, such as OutputCode, Binary Relevance, Adapted Algorithms, Ensemble Approaches, and multilabel deep learning, which we will look at in the next article. Stay tuned!

Likewise, long story short: I tried to bring the best from across the field and rephrase it into a more simplified version. I will keep bringing as much new content as possible from across the data science realm, and I hope this package will be useful at some point in your work. I believe machine learning is not about replacing us; it's about replacing the repetitive, iterative work that consumes so much time and effort, so that people can come to work to create innovations rather than be occupied with the same boring tasks.

Thanks again for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo. https://medium.com/@bobrupakroy

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk Soon.

Kaggle Implementation:

Classification: https://www.kaggle.com/rupakroy/multilabel-multioutput-chain-classification

Regression: https://www.kaggle.com/rupakroy/multilabel-multioutput-chain-regression

Git Repo: https://github.com/rupak-roy/Chained-and-MultiLabel-Algorithms
