
12 Winning Tips to Clinch Your First Win in Data Science Competitions

Kunal Jain

Founder & CEO @ Analytics Vidhya | AI Evangelist | Author | Blogger | Keynote Speaker

Published: July 22, 2016

Introduction

So, what are you doing this weekend? We have an amazing opportunity you wouldn't want to miss (if you are crazy about machine learning). Brace yourself, the action is about to begin.

The Smart Recruits competition is just around the corner. To help you make last-minute strategic plans, we thought of sharing these winning tips with you. They offer a practitioner's perspective that should help you build better ML models.

Competitions conducted on DataHack are fast paced (they stay live for only 48 to 72 hours) so that you learn to think, act and respond faster when solving business problems. We want you to be fast and efficient. So, if you are determined to keep pace with us, you'll soon reach another milestone in your life. Stay with us!

Follow the tips below, shared by 4 past DataHack champions.

Tips by 4 DataHack Winners

Nalin Pasricha, DataHack Rank 1, Mumbai

Nalin is an investment banker turned data scientist who currently works as an independent consultant.

He has participated in 17 hackathons at DataHack. He won Data Hackathon 3.x and emerged as the 1st Runner Up in Black Friday DataHack. Check out his complete profile here.

Here’s what Nalin has to say:

Our mind works subconsciously at night on our problems in a very powerful manner. So I try to start work on the problem as early as possible so that my mind has at least one night to work subconsciously on the problem.

Read inspirational books or watch inspiring videos during a competition. I think it really helps your mind to go beyond its usual limits. I remember I was reading 'The Wright Brothers' by David McCullough during one hackathon. It's the story of two brothers who were only bicycle manufacturers. They had not even attended college and had no funding, and still they managed to build the world's first aeroplane, beating top scientists, universities, etc. I did really well in the hackathon mainly because this book changed my mindset.

Try to use a package or language that is new to you. It'll make you think differently and spur your creativity. I normally use R, but when I try to use Python instead I think I come up with unusual solutions.

Sudalai Rajkumar (SRK), DataHack Rank 2, Chennai

SRK is a Senior Data Scientist at Tiger Analytics. He is currently at Kaggle Rank 23 and holds the Grandmaster title on Kaggle. He is an inspiration for most of the aspiring data scientists in our community.

He has participated in 17 hackathons on DataHack. He is a two-time winner of Mini DataHack and was 2nd runner-up in Black Friday DataHack. Check out his complete profile here.

Here's what SRK has to say:

Feature engineering - This comes first and matters most. We need to concentrate a lot on it, since it makes a huge difference to the scores.

Solid Validation Strategy - Without one, competitions are more or less gambling, so it is essential to have a proper local validation strategy. The public leaderboard (LB) can be misleading at times.

Ensembling / Stacking - This is an important last step that helps us go the extra mile at the end.
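As a rough illustration of the last two tips, the sketch below blends the predictions of two models with a simple weighted average and scores everything on a local holdout set rather than on the public leaderboard. The data is synthetic, and the names `rmse`, `blend`, `model_a` and `model_b` are hypothetical stand-ins, not anything SRK describes; RMSE is assumed as the metric purely for the example.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error, assumed here as the evaluation metric."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two models' predictions (the simplest ensemble)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

# Synthetic local holdout (hypothetical): two "models" whose errors are independent.
rng = random.Random(0)
y_holdout = [rng.uniform(0, 10) for _ in range(500)]
model_a = [y + rng.gauss(0, 1.0) for y in y_holdout]
model_b = [y + rng.gauss(0, 1.0) for y in y_holdout]

# Score each model and the blend on the same holdout rows.
score_a = rmse(y_holdout, model_a)
score_b = rmse(y_holdout, model_b)
score_blend = rmse(y_holdout, blend(model_a, model_b))
print(f"A: {score_a:.3f}  B: {score_b:.3f}  blend: {score_blend:.3f}")
```

Averaging helps here because the two sets of errors are uncorrelated by construction; with real models the gain depends on how different their mistakes are, which is why the blend should be judged on local validation data.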

Rohan Rao, DataHack Rank 5, Bengaluru

Rohan is the Lead Data Scientist at AdWyze. He is currently positioned at Kaggle Rank 70 and holds the prestigious Kaggle Master title. He has represented and brought laurels to India in World Sudoku championships.

He has participated in 11 hackathons on DataHack. He's the winner of The Seer's Accuracy DataHack and stood as 1st runner up in Last Man Standing. Check out his complete profile here.

Here's what Rohan has to say:

Understand The Problem: Without understanding the problem statement, the data and the evaluation metric, most of your work is fruitless. Spend time reading as much as possible about them. Only once you are very clear about the objective should you proceed with exploration. I spend a good amount of time reading and re-reading all the available information. It usually helps me figure out an approach / direction before writing a single line of code.

Summarize / Visualize Data: Data Science competitions are driven by data. It's all about the data. Sometimes you can have a great problem statement but noisy data. Sometimes you can have really clean data but a tricky evaluation metric. Sometimes you might have a good model, but with skewed outliers. While huge advancements are being made to automate a lot of this, there is still a lot of value in exploring data yourself. Cleaning data, handling outliers, transforming data, engineering features, etc. are all winners. I've found these to be major factors in Machine Learning projects.

Feature engineering is the most useful output of data exploration. I believe that if you find the right and useful features, you can build a single powerful model that is better than any ensemble.

Remember the Garbage In, Garbage Out philosophy: if you input noisy/unclean data into a model, no matter how powerful the model is, it will produce noisy output.
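A first pass at this kind of exploration can be as small as a per-column summary. The sketch below is an assumption-laden illustration rather than Rohan's actual workflow: it uses only the Python standard library, treats `None` as a missing value, and flags outliers with the common 1.5 × IQR rule. The `ages` column and the helper `summarize_column` are hypothetical.

```python
import statistics

def summarize_column(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical column with two missing entries and one implausible value.
ages = [23, 25, 31, None, 28, 27, 240, 30, None, 26]
summary = summarize_column(ages)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The gap between the mean and the median in this toy column is itself a hint that something is skewing the data, which is the sort of signal worth chasing before any model is trained.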

Validation Framework: A lot of people jump into building models by dumping data into the algorithms. While that is useful for getting a sense of basic benchmarks, you need to take a step back and build a robust validation framework. Without validation, you are just shooting in the dark. You will be at the mercy of overfitting, leakage and other possible evaluation issues. By replicating the evaluation mechanism locally, you can make faster and better improvements: you measure your validation results and, at the same time, make sure your model is robust enough to perform well on various subsets of the train/test data.
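One way to picture such a framework is a small k-fold loop that reuses the competition's metric and reports the score on every fold, so that both the average and the spread across subsets are visible. This is a minimal sketch under stated assumptions: RMSE stands in for the real metric, the "model" is a toy baseline that predicts the training mean, and all function names (`k_fold_indices`, `fit_mean_baseline`, `cross_validate`) are hypothetical.

```python
import math
import random
import statistics

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, valid_idx); each row is used for validation exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def rmse(y_true, y_pred):
    """Stand-in for the competition metric; swap in the real one."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean_baseline(y_train):
    """Toy model: always predicts the mean of the training targets."""
    mean = statistics.mean(y_train)
    return lambda n_rows: [mean] * n_rows

def cross_validate(y, k=5, metric=rmse):
    """Return one validation score per fold."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(y), k):
        predict = fit_mean_baseline([y[i] for i in train_idx])
        y_valid = [y[i] for i in valid_idx]
        scores.append(metric(y_valid, predict(len(y_valid))))
    return scores

# Synthetic targets (hypothetical) just to exercise the loop.
rng = random.Random(42)
y = [rng.gauss(50, 10) for _ in range(200)]
fold_scores = cross_validate(y, k=5)
print(f"mean={statistics.mean(fold_scores):.2f}  spread={statistics.stdev(fold_scores):.2f}")
```

A large spread between folds is a warning that a single leaderboard number may not be trustworthy; the model is only fitted on training rows inside the loop, which is also what keeps validation rows from leaking into training.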

Shantanu Dutta, DataHack Rank 6, Kolkata

Shan is a Senior Associate at ACME. He is a self-taught data scientist who specializes in BFSI and marketing. So, if you ever doubted that self-learning can make you a data scientist, you were wrong.

Shan has participated in 37 hackathons on DataHack. He won the Date Your Data and Re-date Your Data competitions. Check out his complete profile here.

Here's what Shan has to say.

Understand the Data: Do not worry about needing huge amounts of compute power; it is possible to do well in these competitions with a moderate setup. Understand the data and generate a hypothesis. This part is important.

Preprocessing & Feature Engineering: Spend a considerable amount of time on pre-processing and feature engineering. I have participated in many competitions, and no dataset is ever perfectly clean; there is always some inherent noise that creates hiccups in models, such as missing values or outliers. Being able to visualize the data at each level of extraction will avoid many frustrations at the end.

Algorithm Selection: Select the algorithm best suited to the data. Have confidence in your handcrafted cross-validation results.
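For the two kinds of noise named above, a minimal cleaning pass might look like the sketch below. It is only one possible treatment, chosen for illustration: missing values (`None`) are filled with the column median, and outliers are clipped to Tukey's fences (1.5 × IQR) rather than dropped. The `incomes` column and both helper names are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values into [Q1 - k*IQR, Q3 + k*IQR] instead of discarding rows."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical column: two missing entries and one extreme value.
incomes = [42, 38, None, 45, 40, 900, 41, None, 39, 44]
filled = impute_median(incomes)
cleaned = clip_outliers(filled)
print(filled)
print(cleaned)
```

Printing the column after each step, as done here, is a cheap form of the "visualize at each level" habit: it makes it obvious which values were changed and by how much. In a real competition the median and the fences would be computed on the training data only and then reused on the test data.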

Now you have the winning potion. It's time to put your winning habits to the test. Use these tips in our upcoming competition, The Smart Recruits, and shine as a champion.

This competition is going to be intense and mind-boggling; you will have to fight and survive to reach the end. Be a winner, and challenge all your limits this time.

REGISTER NOW

To know more about the competition VISIT HERE

