The State of Competitive Machine Learning, Deep Learning and NLP

Mohammed Karimkhan Pathan

AI Architect | Senior Data Scientist | Generative AI Expert | Data Science Consultant

Published: March 27, 2023


Our LinkedIn feeds are full of news about large language models, AI advancements, and their applications.

A lot of research is going on in this area. OpenAI, Meta, Microsoft, Google, and other big players are releasing new models and research articles almost daily. This competition has helped us understand which technologies outperform the rest.

In this article, I summarize a review of competitive machine learning in 2022. The review surveys the state of the competitive landscape, analyses the 200+ competitions that took place in 2022, and takes a deep dive into 67 winning solutions to identify the strategies that win in competitive ML.

Here are the highlights of the summary:

Successful competitors have mostly converged on a common set of tools: Python, PyData, PyTorch, and gradient-boosted decision trees.

Deep learning still has not replaced gradient-boosted decision trees when it comes to tabular data, though it does often seem to add value when ensembled with boosting methods.
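One common way to combine the two model families is a weighted blend of their predicted probabilities, with the weight chosen on a held-out validation set. The sketch below assumes you already have validation-set probabilities for a binary target from a gradient-boosted model (for example LightGBM) and from a neural net; the function names are illustrative, log loss is an assumed metric, and real winning solutions often use other schemes (stacking, rank averaging, per-fold weights).

```python
import math
from typing import List, Sequence, Tuple


def log_loss(y_true: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += -(y * math.log(pi) + (1 - y) * math.log(1.0 - pi))
    return total / len(y_true)


def blend(p_gbdt: Sequence[float], p_nn: Sequence[float], w: float) -> List[float]:
    """Weighted average: weight w on the boosted model, (1 - w) on the neural net."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_gbdt, p_nn)]


def best_blend_weight(
    y_val: Sequence[int],
    p_gbdt: Sequence[float],
    p_nn: Sequence[float],
    steps: int = 20,
) -> Tuple[float, float]:
    """Grid-search w in {0, 1/steps, ..., 1} to minimise validation log loss."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        loss = log_loss(y_val, blend(p_gbdt, p_nn, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

Because the grid includes w = 0 and w = 1, the chosen blend can never score worse on the validation set than either model alone; whether it generalises better is something to confirm with cross-validation.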

Transformers continue to dominate in NLP and are starting to compete with convolutional neural nets in computer vision.

Competitions cover a broad range of research areas, including computer vision, NLP, tabular data, robotics, time-series analysis, and many others.

Large ensembles remain common among winners, though single-model solutions do win too.
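When an ensemble contains many models whose raw outputs live on different scales (say, probabilities from one model and uncalibrated scores from another), rank averaging is one simple way to combine them. The following is a generic, standard-library sketch under that assumption, not the method of any particular winning team; it is suited to rank-based metrics such as AUC, since converting to ranks discards probability calibration.

```python
from typing import List, Sequence


def to_ranks(scores: Sequence[float]) -> List[float]:
    """Map scores to normalised ranks in [0, 1]; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def rank_average(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalised ranks that each model assigns to every sample."""
    rank_lists = [to_ranks(s) for s in model_scores]
    n_models = len(rank_lists)
    return [sum(col) / n_models for col in zip(*rank_lists)]
```

For example, `rank_average([[0.2, 0.9, 0.5], [10, 30, 20]])` returns `[0.0, 1.0, 0.5]`: both models order the samples the same way, so the very different output scales do not matter.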

There are several active machine learning competition platforms, as well as dozens of purpose-built websites for individual competitions.

Competitive machine learning continues to grow in popularity, including in academia.

Around 50% of winners are solo winners; 50% are first-time winners; and 30% have won more than once before.

Some competitors are able to invest significantly in the hardware used to train their solutions, though others using free hardware such as Google Colab still manage to win competitions.

The largest competition by prize money was DrivenData’s Snowcast Showdown, sponsored by the US Bureau of Reclamation. A $500k prize pool was made available to participants to help improve water supply management by providing accurate snow-water-equivalent estimates for different areas of the Western United States. As always, DrivenData’s excellent "Meet the Winners" write-up and detailed solution reports are well worth reading.

The most popular competition in 2022 was Kaggle’s American Express Default Prediction competition, with over 4,000 teams entering. $100k was awarded in prizes, split across the top four teams, for predicting whether customers would or would not pay back their loans. First place was won by a first-time solo winner, with an ensemble of neural net and LightGBM models.

The largest independent competition was Stanford’s AI Audit Challenge, which offered a $71,000 prize pool for the best “models, solutions, datasets, and tools to improve people’s ability to audit AI systems for illegal discrimination”.

There were three competitions based around financial forecasting, all on Kaggle: JPX’s Tokyo Stock Exchange Prediction, Ubiquant’s Market Prediction, and G-Research’s Crypto Forecasting.

Thank you for taking the time to read my post.

As a senior data scientist with 9.5 years of experience in data science and AI, I am passionate about sharing my knowledge and insights with others in the field. My goal is to inspire and empower individuals and organizations to leverage the power of data science, machine learning, deep learning, and NLP to drive business success.

If you found this post helpful, I encourage you to follow me on LinkedIn to stay updated with my latest content (https://lnkd.in/dFrkreqY), and to share it with your network. Let's continue to learn and grow together!

Deep Dive - Data Science

3,112 followers
