
Winning Strategies for ML Competitions from Past Winners

Kunal Jain

Founder & CEO @ Analytics Vidhya | AI Evangelist | Author | Blogger | Keynote Speaker

Published: October 20, 2016

Knocktober starts in 6 hours. To help you prepare, here are the winning strategies from our past competitions. Read on to learn the hackathon approach of three top data scientists. They have also shared tips and tricks that should help you improve your leaderboard position.

Let's dig in and find out what can help you win Knocktober.

1. Sudalai Rajkumar (SRK), Senior Data Scientist, AV Rank 1 (Read detailed article here)

His approach in past competitions:

Understanding the problem and dataset
Pre-processing the data: Data cleansing, Outlier removal, Normalization / Standardization, Dummy variable creation
Feature engineering: Feature selection, Feature transformation, Variable interaction and Feature creation
Selecting the modeling algorithm
Parameter tuning through cross validation
Building the model
Checking the results by making a submission
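A minimal sketch of the pre-processing, parameter-tuning and model-building steps above, assuming a regression task and NumPy. The synthetic data, the ridge model, the alpha grid and the helper names (`standardize`, `fit_ridge`, `cv_rmse`) are illustrative assumptions, not SRK's actual code.

```python
import numpy as np

def standardize(train, other):
    """Scale columns using statistics from the training fold only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (train - mean) / std, (other - mean) / std

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept is the training mean of y
    because the standardized training columns have zero mean."""
    y_mean = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - y_mean))
    return w, y_mean

def cv_rmse(X, y, alpha, n_folds=5, seed=0):
    """Mean RMSE over k shuffled folds for one value of alpha."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_tr, X_val = standardize(X[train], X[val])
        w, b = fit_ridge(X_tr, y[train], alpha)
        pred = X_val @ w + b
        scores.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
    return float(np.mean(scores))

# Synthetic stand-in for a competition training set (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Parameter tuning through cross validation: pick the alpha with the lowest CV RMSE.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {a: cv_rmse(X, y, a) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)
print("CV RMSE per alpha:", cv_scores)
print("best alpha:", best_alpha)
```

Scaling inside each fold, rather than once on the full data, keeps the validation rows out of the statistics the model is trained on, so the CV score stays closer to what a submission would see.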

Once you’ve executed these 7 steps, you will have a basic framework for further experimentation. From there, you can concentrate on:

Feature engineering – this is where the bigger improvements come from most of the time
Building varied kinds of models and ensembling them – this helps you go the extra mile towards the end

Last but not least, we must perform solid local validation. Otherwise, we might end up overfitting to the public leaderboard.

Tips from SRK:

1. Understanding the problem – It is really important to have a thorough understanding of the problem that we are trying to solve. Only after we’ve understood the problem clearly can we derive suitable insights from the data and obtain good results.

2. Structured Thinking – It’s a distinct way of thinking through problems. As a data scientist, one needs to be structured in one's thinking in order to obtain good results.

3. Effective communication of results – Effective communication of derived results is as important as performing the data analysis.

2. Rohan Rao, Lead Data Scientist, AV Rank 4 (Read detailed article here)

His approach in past competitions:

Understand the problem / objective you are trying to solve.
Understand and summarize what data you have / need.
Carefully read about the evaluation metric.
Explore and visualize the data, and build simple base models as a benchmark.
Set up a robust / thorough validation framework consistent with the evaluation conditions.
Work on feature engineering and optimizing algorithms.
Try out as many different models / ideas as you can.
Ensemble / Blend / Stack multiple models.
Never hesitate to ask questions, seek help or even team up with others.
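The "Ensemble / Blend / Stack" step above can be sketched in its simplest form: a weighted average of two models' predictions, with the weight chosen on held-out data. The numbers and the helper names (`blend`, `best_blend_weight`) are hypothetical, and this is only one of several blending schemes, not necessarily the one Rohan uses.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

def best_blend_weight(y_true, pred_a, pred_b, steps=20):
    """Grid-search the weight for model A that minimises hold-out RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_true, blend(pred_a, pred_b, w)))

# Hypothetical hold-out targets and predictions from two already-trained models.
y_val = [10.0, 12.0, 9.0, 15.0, 11.0]
model_a = [11.0, 13.0, 10.0, 16.0, 12.0]  # consistently predicts 1 too high
model_b = [9.0, 11.0, 8.0, 14.0, 10.0]    # consistently predicts 1 too low

w = best_blend_weight(y_val, model_a, model_b)
print("weight for model A:", w)
print("RMSE A:", rmse(y_val, model_a))
print("RMSE B:", rmse(y_val, model_b))
print("RMSE blend:", rmse(y_val, blend(model_a, model_b, w)))
```

The toy data is built so the two models err in opposite directions, which is the situation where blending helps most. The weight should be chosen on validation or out-of-fold predictions, never on the rows the models were trained on.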

Tips from Rohan:

Gauge the complexity of the problem: Explore the data as much as possible. Plot features, summarize columns, build benchmark models, and in the process get a sense of the problem, data, time, complexity, etc. Then slowly build a solid solution by working on one idea after another.
Algorithm: I use XGBoost and feature engineering for building ML solutions and it’s been a part of my winning solution for most of the contests I’ve done well in, so a big thanks to the community who are actively developing and improving it each day. I also like Collaborative Filtering techniques, which I’ve implemented very often in my work.
Feature Selection Ways: My rule of thumb for feature selection is based on CV or validation scores. If adding a feature improves the CV score, I keep it; otherwise I discard it. For a large number of features, I usually build small, quick models, check variable importance or information gain, and select the top x from them.
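The CV-based rule for feature selection described above can be sketched as greedy forward selection: keep a feature only if adding it improves the cross-validated score. The synthetic data, the least-squares model, the `min_gain` threshold and the function names are assumptions for illustration; Rohan mentions XGBoost, which is swapped here for plain linear regression to keep the example small.

```python
import numpy as np

def cv_mse(X, y, cols, n_folds=5):
    """K-fold CV mean squared error of least squares on the chosen columns.
    Folds are contiguous slices, which is fine here because rows are i.i.d."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    folds = np.array_split(np.arange(n), n_folds)
    errors = []
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        errors.append(np.mean((design[val] @ coef - y[val]) ** 2))
    return float(np.mean(errors))

def forward_select(X, y, min_gain=0.01):
    """Add the feature that lowers CV error most; stop when the gain is too small."""
    selected = []
    best_score = cv_mse(X, y, selected)  # intercept-only baseline
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cv_mse(X, y, selected + [j]) for j in remaining}
        candidate = min(scores, key=scores.get)
        if best_score - scores[candidate] < min_gain:
            break  # no remaining feature improves the CV score enough
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score

# Synthetic data (assumption): only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

selected, best_score = forward_select(X, y)
print("selected columns:", selected)
print("CV MSE:", best_score)
```

Column 0 is picked first because it explains the most variance, followed by column 2. Whether a pure-noise column is kept depends on the `min_gain` threshold, which is a judgment call.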

3. Steve Donoho, Top Data Scientist (Read detailed article here)

His approach in past competitions:

Well, I start by simply familiarizing myself with the data. I plot histograms and scatter plots of the various variables and see how they are correlated with the dependent variable. I sometimes run an algorithm like GBM or Random Forest on all the variables simply to get a ranking of variable importance.
I usually start very simple and work my way toward more complex models if necessary. My first few submissions are usually just “baseline” submissions of extremely simple models – like “guess the average” or “guess the average segmented by variable X.” These are simply to establish what is possible with very simple models. You’d be surprised that you can sometimes come very close to the score of someone doing something very complex by just using a simple model.
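The two baselines Steve mentions, "guess the average" and "guess the average segmented by variable X", can be sketched in a few lines. The column names (`city`, `sales`) and the rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def global_mean_baseline(train_rows, target):
    """Predict the overall training average for every row."""
    avg = mean(row[target] for row in train_rows)
    return lambda row: avg

def segmented_mean_baseline(train_rows, target, segment):
    """Predict the training average of the row's segment;
    fall back to the overall average for unseen segments."""
    groups = defaultdict(list)
    for row in train_rows:
        groups[row[segment]].append(row[target])
    overall = mean(row[target] for row in train_rows)
    segment_means = {key: mean(values) for key, values in groups.items()}
    return lambda row: segment_means.get(row[segment], overall)

# Hypothetical training rows.
train = [
    {"city": "A", "sales": 10.0},
    {"city": "A", "sales": 12.0},
    {"city": "B", "sales": 20.0},
    {"city": "B", "sales": 22.0},
]

guess_average = global_mean_baseline(train, "sales")
guess_by_city = segmented_mean_baseline(train, "sales", "city")

print(guess_average({"city": "A"}))  # 16.0
print(guess_by_city({"city": "A"}))  # 11.0
print(guess_by_city({"city": "C"}))  # 16.0 (unseen city falls back to the overall mean)
```

Submitting both gives two reference scores; any more complex model should beat them by a clear margin to justify its complexity.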

Tips from Steve:

Making Predictions: This is an important step that is often missed by many – they just throw the raw dependent variable into their favorite algorithm and hope for the best. But sometimes you want to create a derived dependent variable.
I probably spend 50% of my time on data exploration and cleansing depending on the problem.
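As one possible illustration of a derived dependent variable (the article does not say which transformation Steve has in mind, so the log transform here is an assumption), the sketch below fits the same simple model to a raw target and to its logarithm, then compares the two on a new point. The data and the `fit_line` helper are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: the target grows multiplicatively with x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [100.0 * 1.5 ** x for x in xs]

# Model on the raw dependent variable.
a_raw, b_raw = fit_line(xs, ys)
# Model on a derived dependent variable, log(y); invert the transform when predicting.
a_log, b_log = fit_line(xs, [math.log(y) for y in ys])

x_new = 7.0
truth = 100.0 * 1.5 ** x_new
pred_raw = a_raw + b_raw * x_new
pred_log = math.exp(a_log + b_log * x_new)

print("truth:", truth)
print("raw-target prediction:", pred_raw)
print("log-target prediction:", pred_log)
```

Because the toy target is exactly exponential, the log-target model recovers it almost perfectly while the raw-target line undershoots. On real data the gain is rarely this clean, and the transformed model should be scored on the competition's metric after inverting the transform.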

Go on, use these tips from the winners, and take away your first win in Knocktober.

If you still haven't registered, don't waste any more time.

Register Now

Winning prize amount:

INR 50K (~$750) - 1st Place
INR 25K (~$350) - 2nd Place
INR 15K (~$225) - 3rd Place

All the best!
