Day 96 of 365: Detecting Outliers in Multivariate Data
Ajinkya Deokate
Data Scientist | Researcher | Author | Public Speaking Expert @PlanetSpark | Freelancer
Hey everyone!
Welcome to Day 96 of our #365DaysOfDataScience journey!
Today we’re focusing on outliers: data points that don’t fit the pattern of the rest of the data. We’ll explore how to detect them in multivariate data using visualizations and a density-based clustering technique, DBSCAN.
What We’ll Be Exploring Today:
- Scatter matrix plots: Useful for spotting outliers visually across multiple features.
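- Clustering techniques (DBSCAN): An unsupervised, density-based method. Points that sit in sparse regions, without enough close neighbours to join any cluster, are labelled as noise, and those noise points are our outlier candidates.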
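To make the first idea concrete, here is a minimal sketch of a scatter matrix plot. It assumes the Iris dataset from scikit-learn as a stand-in (the post doesn’t name a dataset) and uses `pandas.plotting.scatter_matrix`; the figure size, transparency, and output filename are illustrative choices, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Assumption: Iris is used purely as an example multivariate dataset.
iris = load_iris(as_frame=True)
features = iris.data  # DataFrame with four numeric columns

# One scatter plot per pair of features, histograms on the diagonal.
axes = scatter_matrix(features, figsize=(8, 8), diagonal="hist", alpha=0.7)
plt.suptitle("Scatter matrix of the Iris features")
plt.savefig("scatter_matrix.png", dpi=100)  # hypothetical output filename
```

Points that sit far from the main cloud in one or more panels are worth a closer look, though a point can also be unusual only in combination with other features, which is where a clustering approach helps.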
Learning Resources:
1. Watch: [Multivariate Outlier Detection](https://www.youtube.com/) (YouTube).
2. Read: the scikit-learn docs for [DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html), with an eye on how it can be used for outlier detection.
Today’s Task:
- Load a multivariate dataset.
- Use scatter plots to visualize potential outliers in different feature combinations.
- Apply DBSCAN clustering to detect and highlight outliers in your dataset.
- Analyze the outliers: what do they tell you about the data? Should they be removed, or could they hold important information?
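If you want a starting point for the DBSCAN step, the sketch below shows one possible way to do it. It again assumes the Iris dataset, standardizes the features first (DBSCAN is distance-based, so scale matters), and uses `eps=0.6` and `min_samples=5` as illustrative values only; you will almost certainly need to tune both for your own data. In scikit-learn, points that DBSCAN could not assign to any cluster get the label `-1`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumption: Iris stands in for "a multivariate dataset".
X = load_iris(as_frame=True).data

# Put all features on a comparable scale before measuring distances.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: eps and min_samples are illustrative, not tuned values.
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X_scaled)

# scikit-learn marks noise points (our outlier candidates) with label -1.
outlier_mask = labels == -1
outliers = X[outlier_mask]
print(f"{outlier_mask.sum()} of {len(X)} points flagged as noise")

# Highlight the flagged points on the first two features.
cols = X.columns[:2]
plt.scatter(X.loc[~outlier_mask, cols[0]], X.loc[~outlier_mask, cols[1]],
            c="tab:blue", label="clustered")
plt.scatter(X.loc[outlier_mask, cols[0]], X.loc[outlier_mask, cols[1]],
            c="tab:red", marker="x", label="noise (label -1)")
plt.xlabel(cols[0])
plt.ylabel(cols[1])
plt.legend()
plt.savefig("dbscan_outliers.png", dpi=100)  # hypothetical output filename
```

Try changing `eps` and watch how the number of flagged points moves: a smaller radius flags more points as noise, a larger one flags fewer. That sensitivity is a good reminder to inspect the flagged rows before deciding whether to drop them.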
I’ll be doing this alongside you, and we can compare how well DBSCAN detects outliers in different datasets. Let’s dig in and see what those outliers are hiding!
Happy learning, and see you soon!
***