The world of classification in Machine Learning
Sachin S Panicker
Chief AI Scientist | Keynote Speaker at International Conferences | Tech Soothsayer | Researcher in Generative AI, Singularity, Web 3.0, Metaverse, Blockchain, IoT, Quantum Computing, Robotics & Design Thinking | Artist
Classification in Machine Learning has many applications: product recommendations in e-commerce, open-domain question answering, document tagging, and dynamic search advertising, to name a few. But consider a scenario where there are millions upon millions of applicable labels, and one must predict a subset of the most relevant ones. Most traditional approaches to multi-label text classification fall short in such cases. Let's compare the most prevalent and newer approaches out there.
First off, there is AttentionXML, built on a Bi-LSTM base model. This was one of the earliest methods to combine an attention-based deep encoder with a label-tree-based shortlisting step. It adapts attention maps to each resolution, enabling prediction at full label resolution, and uses label representations to build a multi-level Hierarchical Label Tree [HLT]. Earlier tree-based methods employed the entire HLT for extreme classification, while more recent methods use label clusters at only a certain level of the HLT as meta-labels, which in turn shortlist candidate labels for the extreme task.
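To make the idea of a label tree concrete, here is a minimal sketch of how such a hierarchy could be built by recursively clustering label feature vectors. The branching factor, leaf size, and use of plain k-means are illustrative assumptions on my part, not the exact recipe of AttentionXML.

```python
# Illustrative sketch: build a label hierarchy by recursively clustering
# label feature vectors. Branching factor and leaf size are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def build_hlt(label_features, branch=8, max_leaf=100):
    """Recursively cluster labels into a nested tree of label-index groups."""
    return _split(label_features, np.arange(label_features.shape[0]), branch, max_leaf)

def _split(feats, indices, branch, max_leaf):
    if len(indices) <= max_leaf:
        return indices.tolist()                      # leaf: a small cluster of labels
    km = KMeans(n_clusters=branch, n_init=4).fit(feats[indices])
    return [_split(feats, indices[km.labels_ == c], branch, max_leaf)
            for c in range(branch) if (km.labels_ == c).any()]   # internal node: subtrees

# Example: 10,000 labels, each represented by a 64-dim feature vector
# (e.g. averaged TF-IDF of the documents tagged with that label -- an assumption here).
tree = build_hlt(np.random.randn(10_000, 64).astype(np.float32))
```

The nested lists at a chosen depth of this tree are what the newer methods treat as meta-labels for shortlisting.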
Now come the modern approaches, which replace this architecture with a more powerful transformer model and fine-tune a pre-trained instance such as BERT. But one has to be careful with such models, as they are computationally very expensive, and so far many methods have not been able to effectively leverage transformers for both computation and performance on such extreme multi-label classification tasks.
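In code, the basic setup these methods build on looks roughly like a pre-trained encoder feeding a very large label head. The model name, pooling choice, and label count below are placeholders; in practice the full head is far too big to score naively, which is exactly why shortlisting matters.

```python
# Sketch of "pre-trained transformer + huge label head" for extreme classification.
# Model name, [CLS] pooling, and the label count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class XMCTransformer(nn.Module):
    def __init__(self, num_labels, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.label_head = nn.Linear(hidden, num_labels)   # enormous when num_labels is in the millions

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                  # [CLS] token embedding
        return self.label_head(cls)                        # one logit per label

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = XMCTransformer(num_labels=50_000)
batch = tokenizer(["an example product description"], return_tensors="pt",
                  truncation=True, padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```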
A couple of such approaches are XR-Transformer and LightXML.
The latter, LightXML, employs dynamic negative sampling, which replaces pre-computed label shortlists with a dynamically calculated shortlist that changes as the model's weights get updated. This enables end-to-end training with a single model, using the final feature representation of the transformer encoder for both the meta- and the extreme classification task. The downside, however, is that these two tasks interfere with one another, possibly because the meta task needs the attention maps to focus on different tokens than the extreme task. LightXML also uses only a single-level tree, which prevents it from scaling to the largest datasets.
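Here is a small, self-contained sketch of what dynamic negative sampling looks like in practice: the shortlist of label clusters is recomputed from the meta-classifier's current scores at every step, so it moves with the model's weights. The cluster layout and all sizes below are made up for illustration.

```python
# Sketch of dynamic negative sampling (LightXML-style): the shortlist is derived
# from the meta-classifier's *current* scores, so it changes as weights update.
# Cluster counts, sizes, and the random features are illustrative placeholders.
import torch
import torch.nn as nn

hidden, num_clusters, labels_per_cluster, k = 768, 100, 50, 10
meta_head = nn.Linear(hidden, num_clusters)                              # scores label clusters (meta task)
extreme_emb = nn.Embedding(num_clusters * labels_per_cluster, hidden)    # per-label weight vectors

def shortlist_and_score(doc_feat, true_clusters):
    meta_logits = meta_head(doc_feat)                                    # (B, num_clusters)
    topk = meta_logits.topk(k, dim=-1).indices                           # dynamic shortlist of clusters
    topk = torch.cat([topk, true_clusters], dim=-1)                      # keep positives in the shortlist
    offsets = torch.arange(labels_per_cluster)
    cand_labels = (topk.unsqueeze(-1) * labels_per_cluster + offsets).flatten(1)
    cand_w = extreme_emb(cand_labels)                                    # (B, candidates, hidden)
    extreme_logits = torch.einsum("bh,bch->bc", doc_feat, cand_w)        # score only the candidates
    return meta_logits, extreme_logits, cand_labels

doc_feat = torch.randn(4, hidden)                   # stand-in for the transformer's pooled features
true_clusters = torch.randint(0, num_clusters, (4, 2))
meta_logits, extreme_logits, cands = shortlist_and_score(doc_feat, true_clusters)
```

Note that both heads read the same document features, which is where the interference between the meta and extreme tasks mentioned above comes from.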
The former, XR-Transformer, is derived from multi-resolution approaches in computer vision, such as super-resolution and the progressive growing of Generative Adversarial Networks [GAN], and enables multiple resolutions through iterative training. However, unlike progressively grown GANs, which predict only at the highest resolution, XR-Transformer needs predictions across all resolutions for its progressive shortlisting pipeline, yet uses representations trained at a single resolution. In practice, this leads to a complex multi-stage pipeline in which the transformer model is iteratively trained up to a certain resolution and then frozen. This is followed by re-clustering and re-training multiple classifiers, working at different resolutions, on the same fixed transformer features. Unlike AttentionXML, using multiple instances of the transformer model is undesirable due to its computational overhead. This forces LightXML and XR-Transformer to make different trade-offs from AttentionXML when leveraging a single transformer model for Extreme Multi-Label Text Classification.
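A toy, runnable outline of that multi-stage pattern: features are frozen, labels are re-clustered at each resolution, and a separate linear classifier is fitted per resolution on the same fixed features. Everything here, including the synthetic data and cluster counts, is only meant to show the shape of the pipeline, not XR-Transformer's actual implementation.

```python
# Toy illustration of the freeze-then-recluster-then-refit pattern described above.
# The "frozen transformer features" are just random vectors here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
docs = rng.standard_normal((500, 64))            # stand-in for frozen transformer features
label_vecs = rng.standard_normal((2000, 64))     # one feature vector per label
doc_label = rng.integers(0, 2000, size=500)      # one positive label per document (toy setup)

classifiers = []
for num_clusters in (16, 128):                   # two coarser "resolutions"
    km = KMeans(n_clusters=num_clusters, n_init=4).fit(label_vecs)   # re-cluster labels
    meta_target = km.labels_[doc_label]          # map each document's label to its cluster
    clf = LogisticRegression(max_iter=200).fit(docs, meta_target)    # classifier on fixed features
    classifiers.append((km, clf))                # later used for progressive shortlisting
```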
And finally, there is the newly proposed approach called CascadeXML, which combines the strengths of all these approaches, creating an end-to-end trainable multi-resolution learning pipeline that trains a single transformer model across multiple resolutions in a way that allows the creation of label-resolution-specific attention maps and feature embeddings.
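To give a feel for that multi-resolution idea, here is a minimal sketch: one shared encoder trained end-to-end, with a separate pooled representation and classifier head per label resolution, so each resolution can develop its own attention and feature view. The tiny encoder and the attention-pooling scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a single encoder trained jointly across several label resolutions,
# each with its own pooled representation and classifier head. The small model
# sizes and the pooling mechanism are illustrative assumptions.
import torch
import torch.nn as nn

class MultiResolutionClassifier(nn.Module):
    def __init__(self, vocab=30_000, hidden=256, resolutions=(64, 1024, 16_384)):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # one learned query and one classifier head per resolution
        self.queries = nn.Parameter(torch.randn(len(resolutions), hidden))
        self.heads = nn.ModuleList(nn.Linear(hidden, r) for r in resolutions)

    def forward(self, input_ids):
        tokens = self.encoder(self.embed(input_ids))              # (B, T, H)
        attn = torch.softmax(tokens @ self.queries.T, dim=1)      # per-resolution attention over tokens
        pooled = torch.einsum("btr,bth->brh", attn, tokens)       # (B, resolutions, H)
        return [head(pooled[:, i]) for i, head in enumerate(self.heads)]

model = MultiResolutionClassifier()
logits_per_resolution = model(torch.randint(0, 30_000, (2, 32)))
# each element: logits for one resolution's (meta-)labels, all trained jointly
```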
CascadeXML optimises its training objective using Binary Cross-Entropy as the loss function and AdamW as the optimiser.
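A single illustrative training step with that objective and optimiser might look like this; the model, batch, and label sizes are placeholders rather than CascadeXML's actual configuration.

```python
# Illustrative training step: binary cross-entropy over multi-hot label targets,
# optimised with AdamW. The linear model is a stand-in for encoder + label head.
import torch
import torch.nn as nn

model = nn.Linear(768, 2048)                       # placeholder for encoder + label head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
criterion = nn.BCEWithLogitsLoss()                 # binary cross-entropy on raw logits

features = torch.randn(8, 768)                     # pooled document features
targets = torch.zeros(8, 2048)                     # multi-hot ground-truth labels
targets[torch.arange(8), torch.randint(0, 2048, (8,))] = 1.0

optimizer.zero_grad()
loss = criterion(model(features), targets)
loss.backward()
optimizer.step()
```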
It's an exciting world of research and development happening out there in the field of classification, and we are just getting started, so to speak, on unraveling the full potential of Transformer models!