Better Data for Better Machine Learning
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group | LinkedIn Top Voice | Machine Learning | Deep Learning | Data Science | Computer Vision | NLP | Developer | Researcher | Lecturer
Data is usually messy! It can be unbalanced or mislabeled, and it can contain missing or incorrect values. The first step in cleaning a dataset is to understand and analyze it.
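As a rough illustration of what "understanding the dataset" can mean in practice, here is a minimal sketch in plain Python that counts missing values, measures label balance, and flags out-of-range values. The records, feature names, and the valid age range are all made up for the example; a real dataset would need its own checks.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "label": "no"},
    {"age": None, "income": 61000, "label": "no"},
    {"age": 29, "income": None, "label": "no"},
    {"age": 41, "income": 58000, "label": "yes"},
    {"age": -5, "income": 47000, "label": "no"},  # impossible age
]


def missing_counts(rows):
    """Count missing (None) values per feature."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return dict(counts)


def label_balance(rows, label_key="label"):
    """Return the share of each label, which reveals class imbalance."""
    totals = Counter(row[label_key] for row in rows)
    return {label: count / len(rows) for label, count in totals.items()}


def out_of_range(rows, feature, low, high):
    """Return rows whose non-missing value for `feature` is outside [low, high]."""
    return [
        row for row in rows
        if row[feature] is not None and not (low <= row[feature] <= high)
    ]


missing = missing_counts(records)
balance = label_balance(records)
bad_ages = out_of_range(records, "age", 0, 120)  # assumed valid range

print("missing per feature:", missing)
print("label shares:", balance)
print("suspicious ages:", bad_ages)
```

Even on five rows, the three checks surface one of each problem mentioned above: missing values, a skewed label distribution, and an incorrect value.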
Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets.
- Facets Overview: Get a sense of the shape of each feature of your dataset. It helps uncover common and uncommon issues such as unexpected feature values, features with missing values for a large number of observations, training/serving skew, and train/test/validation set skew.
- Facets Dive: Explore individual observations and the relationships between data points across all the features of a dataset. Each item in the visualization represents one data point. Items can be positioned by "faceting", or bucketing, them in multiple dimensions by their feature values. This helps with detecting classifier failures, identifying systematic errors, evaluating ground truth, and finding potential new signals for ranking.
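The two ideas above can be sketched without any visualization library. The snippet below is not the Facets API; it is a plain-Python approximation, on made-up data, of two things the tools let you see: a shift in a feature's distribution between two splits (skew), and the error rate inside each cell when examples are bucketed by two features (faceting). All names, values, and the age threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature values from two splits, used to illustrate skew.
train_ages = [23, 27, 45, 52]
serving_ages = [61, 64, 70, 58]


def mean_gap(a, b):
    """Absolute difference between the means of two samples."""
    return abs(mean(a) - mean(b))


# Hypothetical labeled examples with model predictions.
examples = [
    {"age": 23, "sex": "F", "label": 1, "pred": 1},
    {"age": 27, "sex": "M", "label": 0, "pred": 1},
    {"age": 45, "sex": "F", "label": 1, "pred": 1},
    {"age": 52, "sex": "M", "label": 0, "pred": 0},
    {"age": 25, "sex": "M", "label": 1, "pred": 0},
    {"age": 61, "sex": "F", "label": 0, "pred": 0},
]


def age_bucket(age):
    """Bucket a numeric age into two coarse groups (threshold is arbitrary)."""
    return "under_40" if age < 40 else "40_plus"


def facet(rows, row_key, col_key):
    """Group rows into a 2-D grid keyed by (row_key(row), col_key(row))."""
    grid = defaultdict(list)
    for row in rows:
        grid[(row_key(row), col_key(row))].append(row)
    return dict(grid)


def error_rate(rows):
    """Fraction of rows where the prediction differs from the label."""
    return sum(row["label"] != row["pred"] for row in rows) / len(rows)


age_gap = mean_gap(train_ages, serving_ages)
grid = facet(examples, lambda r: age_bucket(r["age"]), lambda r: r["sex"])
errors_by_cell = {cell: error_rate(rows) for cell, rows in grid.items()}

print("mean age gap between splits:", age_gap)
for cell, rate in sorted(errors_by_cell.items()):
    print(cell, "error rate:", rate)
```

In this toy data every misclassification falls in a single cell (under 40, male). That kind of concentrated, systematic error is what faceting is meant to make visible.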
Regards