
40 Techniques Used by Data Scientists

Vincent Granville

AI/LLM Disruptive Leader | GenAI Tech Lab

Published: July 5, 2016

These techniques cover most of what data scientists and related practitioners use in their daily work, whether they rely on solutions offered by a vendor or design proprietary tools. When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most of these articles are hard to find with a Google search, so in some ways this list gives you access to the hidden literature on data science, machine learning, and statistical science. Many of these articles are fundamental to understanding the technique in question, and they come with further references and source code.

Starred techniques (marked with a *) belong to what I call deep data science, a branch of data science that has little, if any, overlap with closely related fields such as machine learning, computer science, operations research, mathematics, or statistics. Even classical machine learning and statistical techniques such as clustering, density estimation, or tests of hypotheses have model-free, data-driven, robust versions designed for automated processing (as in machine-to-machine communications), and these versions also belong to deep data science. However, these techniques are not starred here, because their standard versions are better known (and, unfortunately, more widely used) than their deep data science equivalents.
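To make the idea of a model-free, data-driven test of hypotheses more concrete, here is a minimal sketch of a permutation test for a difference in means, written in Python. It is an illustration under stated assumptions, not the author's own method: the function name `permutation_test`, the default of 10,000 resamples, and the add-one correction on the p-value are choices made only for this example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    No distributional model is assumed: the null distribution is built by
    repeatedly shuffling the pooled observations and re-splitting them into
    two groups of the original sizes.
    """
    rng = random.Random(seed)          # seeded so results are reproducible
    observed = abs(mean(a) - mean(b))  # test statistic on the real split
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)            # random relabeling under the null
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch) keeps p strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

For two hypothetical lists of measurements, `permutation_test(group_a, group_b)` returns an approximate two-sided p-value. Because the null distribution comes from reshuffling the observed data rather than from a parametric model, the same code runs unchanged on skewed or heavy-tailed samples, which is what makes this style of test suitable for unattended, automated processing.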

To learn more about deep data science, click here. Note that unlike deep learning, deep data science is not the intersection of data science and artificial intelligence; however, the analogy between deep data science and deep learning is not completely meaningless, in the sense that both deal with automation.

Also, to discover in which contexts and applications the 40 techniques below are used, I invite you to read the following articles:

21 data science systems used by Amazon to operate its business
24 Uses of Statistical Modeling

Finally, when using a technique, you need to test its performance. Read this article about 11 Important Model Evaluation Techniques Everyone Should Know.
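As one illustration of performance testing, here is a minimal sketch of k-fold cross-validation (one of the entries in the list below) applied to a simple least-squares line. The names `fit_line` and `k_fold_mse`, the use of mean squared error as the score, and the round-robin assignment of shuffled points to folds are assumptions for this example, not details taken from the referenced article.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # requires at least two distinct x values
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds (illustrative sketch)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Round-robin split of the shuffled indices into k roughly equal folds.
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        # Fit only on the training folds...
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # ...and score only on the held-out fold.
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test) / len(test)
        errors.append(mse)
    return sum(errors) / k
```

Calling `k_fold_mse(xs, ys, k=5)` on two equal-length lists of numbers returns the average held-out error. Each point is scored only by a model that never saw it during fitting, which is why this estimate is more honest than the error measured on the training data.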

The 40 data science techniques

Linear Regression
Logistic Regression
Jackknife Regression *
Density Estimation
Confidence Interval
Test of Hypotheses
Pattern Recognition
Clustering - (aka Unsupervised Learning)
Supervised Learning
Time Series
Decision Trees
Random Numbers
Monte-Carlo Simulation
Bayesian Statistics
Naive Bayes
Principal Component Analysis - (PCA)
Ensembles
Neural Networks
Support Vector Machine - (SVM)
Nearest Neighbors - (k-NN)
Feature Selection - (aka Variable Reduction)
Indexation / Cataloguing *
(Geo-) Spatial Modeling
Recommendation Engine *
Search Engine *
Attribution Modeling *
Collaborative Filtering *
Rule System
Linkage Analysis
Association Rules
Scoring Engine
Segmentation
Predictive Modeling
Graphs
Deep Learning
Game Theory
Imputation
Survival Analysis
Arbitrage
Lift Modeling
Yield Optimization
Cross-Validation
Model Fitting

The list now contains more than 40 techniques because we updated the article and added new ones.

Sivakumar Velayampakkam

Technical Project Manager at iLink Digital

6 yr

Interesting, and thanks for sharing.

Diogo Broner

Founding Partner at Yatahey Consultoria | Mentor at PUC Angels | Marketing

7 yr

Nilton Kazuyuki U.


Andrea Gallelli

PhD, Statistical Officer at Eurostat, European Commission

8 yr

Interesting list. Even if I find that some of them are sub-categories of others (like all the regressions are model fitting techniques). But my question is: is it better to be very good at a few of them, or to have some familiarity with many?

.CARLOS TAPIA.

DIRECTOR: SERVICE QUALITY INSTITUTE CONFERENCES AND TRAINING: SALES, CUSTOMER SERVICE AND CX. Certificate #51403318

8 yr

Many models for handling the data in one site, good references.

Tanya Huggins

Oracle Alliance Manager @ AST LLC | Board Certified Specialist

8 yr

Sharing this: it is a great deep dive into an area where many of my partners are looking for guidance. Thank you, Vincent, for pulling this together.

