Top Algorithms & Methods used by Data Scientists
Gregory Piatetsky-Shapiro
Part-time philosopher, Retired, Data Scientist, KDD and KDnuggets Founder, was LinkedIn Top Voice on Data Science & Analytics. Currently helping Ukrainian refugees in MA.
Algorithms are a key aspect of Data Science, and many recent KDnuggets posts have looked at popular algorithms, including The 10 Algorithms Machine Learning Engineers Need to Know and 10 Algorithm Categories for A.I., Big Data, and Data Science.
But which algorithms are actually used by Data Scientists?
This was the question asked in a recent KDnuggets Poll, and here are the top 10 algorithms:
Fig. 1: Top 10 algorithms used by Data Scientists, and their share of respondents.
See full table of all algorithms in KDnuggets Post: https://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html
The average respondent used 8.1 algorithms, a big increase vs a similar poll in 2011.
Comparing with the 2011 poll, Algorithms for data analysis / data mining, we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 - 1), are for:
- Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011
- Text Mining, up 30% to 35.9% from 27.7%
- Visualization, up 27% to 48.7% from 38.3%
- Time series/Sequence analysis, up 25% to 37.0% from 29.6%
- Anomaly/Deviation detection, up 19% to 19.5% from 16.4%
- Ensemble methods, up 19% to 33.6% from 28.3%
- SVM, up 18% to 33.6% from 28.6%
- Regression, up 16% to 67.1% from 57.9%
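The relative increase used above, (pct2016 / pct2011 - 1), can be reproduced from the published shares. The snippet below is a minimal sketch in Python; the function name `relative_change` and the `shares` dictionary are illustrative choices, not part of the poll, and only a few of the listed methods are included. Because the shares are rounded to one decimal place, a recomputed figure may differ by a point from the published one.

```python
def relative_change(pct_2016: float, pct_2011: float) -> float:
    """Relative change in share between the two polls: pct2016 / pct2011 - 1."""
    return pct_2016 / pct_2011 - 1


# (2016 share, 2011 share) in percent, copied from the list above.
shares = {
    "Boosting": (32.8, 23.5),
    "Text Mining": (35.9, 27.7),
    "Visualization": (48.7, 38.3),
    "Time series/Sequence analysis": (37.0, 29.6),
    "Regression": (67.1, 57.9),
}

changes = {
    name: relative_change(pct_2016, pct_2011)
    for name, (pct_2016, pct_2011) in shares.items()
}

# Print the methods from largest to smallest relative increase.
for name, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {change:+.0%}")
```

For Boosting, for example, 32.8 / 23.5 - 1 is about 0.40, which matches the 40% increase reported above.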
Most popular among new options added in 2016 are:
- K-nearest neighbors, 46% share
- PCA, 43%
- Random Forests, 38%
- Optimization, 24%
- Neural networks - Deep Learning, 19%
- Singular Value Decomposition, 16%
The biggest declines are for:
- Association rules, down 47% to 15.3% from 28.6%
- Uplift modeling, down 36% to 3.1% from 4.8% (that is a surprise, given strong results published)
- Factor Analysis, down 24% to 14.2% from 18.6%
- Survival Analysis, down 15% to 7.9% from 9.3%
See the full results, including usage of different algorithm types by employment, algorithm usage bias by employment, and the full table for all 29 algorithms and methods, on KDnuggets:
Top Algorithms Used by Data Scientists
https://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html
Senior Data Scientist - Global Markets Gen AI @ Bank of America Merrill Lynch | Gen AI production | Agentic | Micro services
8 years ago: Survival Analysis, down 15% to 7.9% from 9.3%? Why is SA not preferred?
Senior Dev TL CIB
8 years ago: Interesting, however
there isn't such a thing called "Data Science". This name is just a marketing label for a new pseudo-science.
Information Technology Architect - Data, Infrastructure, Cyber, Software - Views expressed are my own and do not represent my employer.
8 years ago: If data science really works, then it should be possible to automate it so that the right algorithms are chosen automatically rather than manually by a data scientist. I have some skepticism about the hype over data science, because much of the process needs automation, which is more of a software architecture exercise than a data science exercise. Algorithms are great, but there need to be frameworks that automate the tedious work. Most data scientists spend an inordinate amount of time viewing descriptive statistics, testing out various methods, and doing essentially manual work to import data, cleanse it, determine clustering approaches, and search for appropriate training models. Add to this that many models require extensive modification when feedback is introduced that changes the model's horizontal and vertical (row/column) dynamics. All of this rightly needs to be automated in a software framework, in my view.
Advanced Analytics Professional
8 years ago: Surprised Neural Networks don't find a place!