Analytics of Data Scientists in Kaggle
Raja Saurabh Tiwari
Vice President @ Citi | Java , Cloud, ML Solutions | Gen AI enthusiast | Wildlife Photography
Kaggle recently published a survey report covering various aspects of its users. It analyzes people who work as data scientists and use Kaggle for competitions, learning, and related activities. So let's jump straight into the story and look at the major points that come out of the survey.
Country contributing the most:
India continues to be one of the biggest contributors to the community of data scientists on Kaggle. It also indirectly shows how adaptive we are to the latest technologies and trends (here, data science).
The age of data scientists:
The graph below shows how young the field is: most data scientists are in the 25-29 age range.
The best IDE:
Anyone who has been working as a data scientist or ML engineer/enthusiast could easily guess the IDE that tops the table. Yes, it's JupyterLab, followed by Visual Studio and PyCharm.
The most used algorithms:
On the technology side, Linear Regression and Logistic Regression are still the most commonly used algorithms within the community, followed by Trees and Ensemble methods. So don't be upset if you are not yet well versed in the newer methods and algorithms.
The most used frameworks:
On the framework side, scikit-learn, TensorFlow, and Keras are the most used, while PyTorch is catching up very fast.
Which cloud platforms outperform:
On the cloud front, AWS is still the market leader, followed by GCP and Azure. The difference between GCP and Azure is not substantial, and I believe Microsoft is widely used in industries like banking.
The best Data Visualization Tool:
For telling stories with data, Tableau still tops the table with a ~39% share. Microsoft Power BI, despite being free of cost, is still in 2nd position.
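A share like the ~39% above is simply the number of respondents who selected a tool divided by the total number of respondents. For anyone who wants to reproduce this kind of figure, here is a minimal sketch, assuming each respondent can pick several tools. The function name `tool_shares` is hypothetical and the responses are made-up illustrative data, not actual survey results.

```python
from collections import Counter


def tool_shares(responses):
    """Return {tool: percentage of respondents who selected it}."""
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter()
    for selected in responses:
        # Count each tool at most once per respondent.
        counts.update(set(selected))
    return {tool: round(100 * n / total, 1) for tool, n in counts.items()}


# Made-up responses for illustration only (NOT real survey data).
responses = [
    ["Tableau", "Power BI"],
    ["Tableau"],
    ["Power BI"],
    ["Tableau", "Qlik"],
    [],  # a respondent who uses none of the listed tools
]

shares = tool_shares(responses)
for tool, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tool}: {pct}%")
```

Because respondents can select more than one tool, the shares computed this way can add up to more than 100%.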
Gender Bias:
One upsetting finding: female participation still lags by a big margin.
The most used learning platforms:
And finally, something for data science/ML enthusiasts: the graph below shows which learning platforms Kaggle users have used the most.
Founded around 2010, Kaggle has come a long way, and it is now seen as the 'must' platform for data scientists and enthusiasts. If you want to get into the field, my recommendation is to use Kaggle. It has a variety of datasets and competitions in which people across the world participate. You can see how people are solving business problems using machine learning and related technologies.
This survey by Kaggle is very useful and gives data science practitioners and enthusiasts good insight in a crisp and clear manner.
Details of the survey can be found at https://www.kaggle.com/kaggle-survey-2020.
- Raja Saurabh Tiwari