Analytics of Data Scientists in Kaggle
State of Machine Learning and Data Science 2020

Kaggle recently published a report covering many aspects of its user base. It analyzes people who work as data scientists and use Kaggle for competitions, learning and related activities. So let's jump straight in and look at the major points that come out of the survey.

Countries contributing the most:

India continues to be one of the biggest contributors to Kaggle's community of data scientists. Indirectly, this also shows how readily we adapt to the latest technologies and trends (here, data science).

The age of data scientists:

The age distribution shows how young the field is: the largest group of data scientists is in the 25-29 age range.

The best IDE:

Anyone who has worked as a data scientist or ML engineer, or follows the field as an enthusiast, could easily guess which IDE topped the table. Yes, it's JupyterLab, followed by Visual Studio and PyCharm.

The most used algorithms:

On the technology side, Linear Regression and Logistic Regression are still the most commonly used algorithms in the community, followed by tree-based and ensemble methods. So don't be discouraged if you are not yet well versed in the newest methods and algorithms.

The most used frameworks:

On the framework side, Scikit-learn, TensorFlow and Keras are the most used, while PyTorch is catching up fast.

Which cloud platforms lead:

On the cloud front, AWS is still the leader, followed by GCP and Microsoft Azure. The gap between GCP and Azure is small, and I believe Azure is widely used in industries such as banking.

The best Data Visualization Tool:

For storytelling, Tableau still tops the table with a share of about 39%. Microsoft Power BI, despite offering a free version, is still in second position.

Gender Bias:

One upsetting finding: female participation still lags behind by a big margin.

The most used learning platforms:

And finally, something for data science/ML enthusiasts: the survey also shows which learning platforms Kaggle users have used the most.


Founded in 2010, Kaggle has come a long way and is now seen as a 'must' platform for data scientists and enthusiasts. If you want to get into the field, my recommendation is to use Kaggle. It offers a wide variety of datasets and competitions in which people across the world participate, so you can see how others solve business problems using machine learning and related technologies.

This survey by Kaggle is very useful and gives data science practitioners and enthusiasts good insight in a crisp, clear manner.

Details of the survey can be found at https://www.kaggle.com/kaggle-survey-2020.


- Raja Saurabh Tiwari