Deepchecks for Data and Model Validation
How do you ensure you are building “good” models?
Well, it depends a lot, of course, on the data and on how you build the model.
While you’re in the research phase, you’ll probably want to validate your data, look for potential methodological problems, and validate and evaluate your model. This is where Deepchecks, an open-source Python package, comes in handy.
Deepchecks is a tool for testing and validating your machine learning models and data, and it lets you do so with minimal effort. It covers a range of validation and testing needs, such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models.
Load Data, Split Train-Val, and Train a Simple Model
For the purpose of this guide we’ll use the simple iris dataset and train a simple random forest model for multi-class classification:
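A minimal sketch of this step, using only scikit-learn and pandas. The variable names (`iris_df`, `label_col`, `df_train`, `df_test`, `rf_clf`) and the fixed `random_state` are choices made for this illustration, not something the guide prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load iris as one DataFrame: four feature columns plus a "target" label column.
iris_df = load_iris(as_frame=True)["frame"]
label_col = "target"

# Stratify on the label so train and test keep the same class proportions.
df_train, df_test = train_test_split(
    iris_df, stratify=iris_df[label_col], random_state=0
)

# Train a simple random forest for the three-class problem.
rf_clf = RandomForestClassifier(random_state=0)
rf_clf.fit(df_train.drop(columns=label_col), df_train[label_col])
```

Keeping the label inside the same DataFrame as the features is convenient here, because the label column can then be declared by name when the data is wrapped for validation.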
Define a Dataset Object
Next, we’ll need to initialize the Dataset object, stating the relevant metadata about the dataset (e.g. the name for the label column, which features are categorical, etc.).
Check out the Dataset’s attributes to see which additional special columns can be declared and used (e.g. date column, index column).
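As a rough sketch, wrapping the train and test DataFrames could look like the helper below. `build_datasets` is a hypothetical name introduced for this example; it assumes that `deepchecks.tabular.Dataset` accepts `label` and `cat_features` arguments, which may differ between deepchecks releases.

```python
def build_datasets(df_train, df_test, label_col):
    """Wrap train/test DataFrames as deepchecks Dataset objects (hypothetical helper)."""
    # Imported lazily so this snippet still loads where deepchecks is not installed.
    from deepchecks.tabular import Dataset

    # All iris features are numeric, so an empty categorical list is declared
    # explicitly rather than leaving the feature types to be inferred.
    ds_train = Dataset(df_train, label=label_col, cat_features=[])
    ds_test = Dataset(df_test, label=label_col, cat_features=[])
    return ds_train, ds_test
```

With the iris split from the previous section, calling `build_datasets(df_train, df_test, "target")` would return the two objects that the suites and checks take as input.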
Single Dataset Integrity Suite
The suite is composed of various checks, such as String Length Out Of Bounds, Outlier Sample Detection, Mixed Nulls, and more.
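Running the suite on a single Dataset might look like the sketch below. The helper name `run_integrity_suite` is hypothetical, and the import is an assumption: the prebuilt suite appears as `data_integrity` in recent deepchecks releases and as `single_dataset_integrity` in older ones, so the code tries both.

```python
def run_integrity_suite(dataset, html_path=None):
    """Run the prebuilt integrity suite on one deepchecks Dataset (hypothetical helper)."""
    # Assumption: the suite name depends on the installed deepchecks version.
    try:
        from deepchecks.tabular.suites import data_integrity as integrity_suite
    except ImportError:
        from deepchecks.tabular.suites import single_dataset_integrity as integrity_suite

    result = integrity_suite().run(dataset)
    if html_path is not None:
        # Write a standalone report for viewing outside a notebook.
        result.save_as_html(html_path)
    return result
```

In a notebook, calling `.show()` on the returned result should render the full report inline.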
Each check may contain conditions, each of which results in a pass, fail, warning, or error status, as well as other outputs such as plots or tables. The suite output opens with a summary that lists each condition alongside its status.
Note: Suites, checks, and conditions can all be modified. Read more about custom suites in the Deepchecks documentation.
Run a Deepchecks Check
If you want to run a specific check, you can import it and run it directly.
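A sketch of running one check, here the label drift check, is shown below. `run_label_drift` is a hypothetical helper, and the import is an assumption: the check is exposed as `TrainTestLabelDrift` in older deepchecks releases and as `LabelDrift` in newer ones, so the code tries the newer name first.

```python
def run_label_drift(train_ds, test_ds):
    """Run the label drift check on two deepchecks Datasets (hypothetical helper)."""
    # Assumption: the class name depends on the installed deepchecks version.
    try:
        from deepchecks.tabular.checks import LabelDrift as LabelDriftCheck
    except ImportError:
        from deepchecks.tabular.checks import TrainTestLabelDrift as LabelDriftCheck

    # A train/test check takes the train Dataset first, then the test Dataset.
    return LabelDriftCheck().run(train_ds, test_ds)
```

The returned result can be displayed with `.show()` in a notebook, and its raw value is available through the `.value` attribute.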
Of course, this was just one example. Check out the Tabular Checks examples in the docs or the API Reference for more info about the existing checks and their parameters.
The example here is the Train Test Label Drift check, which uses statistical measures to calculate label drift between the train dataset and the test dataset. It produces visualizations (a drift score bar and a distribution figure) as well as a result value (JSON).
The drift score is a measure of the difference between two distributions; in this check, those are the train and test label distributions.
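To make the idea of a drift score concrete, here is a small pure-Python sketch of one common measure for categorical labels, the Population Stability Index (PSI). This is an illustration only: which statistic Deepchecks actually applies depends on the label type and the library version, and the function name `psi` and the `eps` floor are choices made for this example.

```python
import math
from collections import Counter


def psi(train_labels, test_labels, eps=1e-4):
    """Population Stability Index between two samples of categorical labels."""
    categories = set(train_labels) | set(test_labels)
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    score = 0.0
    for cat in categories:
        # Floor each proportion at eps so a class missing from one sample
        # does not lead to log(0) or division by zero.
        p = max(train_counts[cat] / len(train_labels), eps)
        q = max(test_counts[cat] / len(test_labels), eps)
        score += (q - p) * math.log(q / p)
    return score


train = [0] * 40 + [1] * 40 + [2] * 40
balanced_test = [0] * 10 + [1] * 10 + [2] * 10
skewed_test = [0] * 20 + [1] * 5 + [2] * 5

print(psi(train, balanced_test))  # same class proportions, so the score is 0.0
print(psi(train, skewed_test))    # class 0 is over-represented, so the score is positive
```

A stratified split, like the one used for iris in this guide, keeps the class proportions nearly identical, so a score close to zero is what you would expect there.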
The check shows the drift score and the distributions for the label. You can also inspect the result value, whose structure depends on the check.
Deepchecks provides numerous tools to help us navigate development and make better feature engineering and model selection decisions, by making critical issues related to data drift, overfitting, leakage, feature importance, and model calibration readily accessible.
And this is just what Deepchecks can do out of the box, with the prebuilt checks and suites! There is a lot more potential in how easily the package lets you customize and create checks and suites tailored to your needs. We will touch upon some of these advanced uses in future guides.
To explore all the checks and validations in Deepchecks, go try it out yourself!
Don’t forget to star their GitHub repo; it’s really a big deal for open-source-led companies like Deepchecks.