How to Convert a Pandas Column of Comma-Separated Strings Into Dummy Variables?
How to Perform Simple and Quick Dummy Encoding on a Pandas Column of Comma-Separated Strings, and How to Use It in a Scikit-learn Pipeline.
1- Defining the Problem and Options
Most data scientists run into the same problems. When performing exploratory data analysis and then preparing the data for machine learning, one of them is how to handle columns of comma-separated strings. You can apply dummy (one-hot) encoding to the whole dataset at once, but this has some undesirable consequences.
The main one is data leakage: if the dummy columns are created before the train/test split, categories that appear only in the test data leak into the training features. Avoiding this by hand means repeating every step separately for the training and test sets, which takes time and adds complexity. This article shows a simple and quick way to create dummy variables from a pandas column of comma-separated strings without that leakage.
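The leakage problem can be seen on a tiny example. This is a minimal sketch using hypothetical toy data and pandas' built-in `Series.str.get_dummies`; it compares encoding the combined data with encoding the training rows only and aligning the test rows to those columns.

```python
import pandas as pd

# Hypothetical toy data: each row lists comma-separated tags.
train = pd.Series(["a,b", "b", "a,c"], name="tags")
test = pd.Series(["b,d", "c"], name="tags")

# Leaky approach: encode the combined data before splitting,
# so a tag seen only in the test rows ("d") becomes a feature.
leaky = pd.concat([train, test]).str.get_dummies(sep=",")
print(list(leaky.columns))  # ['a', 'b', 'c', 'd']

# Leak-free approach: learn the columns from the training rows only,
# then align the test encoding to those columns (missing ones become 0).
train_cols = train.str.get_dummies(sep=",").columns
test_encoded = test.str.get_dummies(sep=",").reindex(columns=train_cols, fill_value=0)
print(list(test_encoded.columns))  # ['a', 'b', 'c']
```

Doing this alignment by hand for every column and every split is exactly the repetitive work a custom transformer can take over.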
2- Inheritance
To write our own custom transformers for scikit-learn, we first need to be a little familiar with the concept of inheritance in Python. You can find more information at the link below.
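In scikit-learn, a custom transformer usually inherits from `BaseEstimator` (which provides `get_params`/`set_params`) and `TransformerMixin` (which provides `fit_transform` on top of your own `fit` and `transform`). The class below is a hypothetical toy example, `AddConstant`, meant only to show what the parent classes contribute.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class AddConstant(BaseEstimator, TransformerMixin):
    """Hypothetical toy transformer that adds a fixed value to every entry."""

    def __init__(self, value=1):
        # BaseEstimator.get_params reads constructor arguments that are
        # stored under attributes with the same names.
        self.value = value

    def fit(self, X, y=None):
        # Nothing to learn here; returning self allows method chaining.
        return self

    def transform(self, X):
        return X + self.value


step = AddConstant(value=10)
# fit_transform is inherited from TransformerMixin, get_params from BaseEstimator.
print(step.fit_transform(np.array([1, 2, 3])))  # [11 12 13]
print(step.get_params())  # {'value': 10}
```

Only `fit` and `transform` are written by hand; everything else comes from inheritance, which is what lets the class plug into a `Pipeline`.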
3- scikit-learn Sample Code [“OneHotEncoder”, “OrdinalEncoder”]
Here you create an instance called ‘ohe’ of the ‘OneHotEncoder’ class by calling its constructor with ‘ignore’ for the ‘handle_unknown’ parameter and ‘False’ for the ‘sparse’ parameter. (In scikit-learn 1.2 and later, ‘sparse’ was renamed ‘sparse_output’, and the old name was removed in 1.4.) The OneHotEncoder class provides methods such as ‘fit’, ‘transform’, and ‘fit_transform’.
4- Defining the GetDummies Class in Python Using an Object-Oriented Approach
Code page: How to Convert a Pandas Column of Comma-Separated Strings Into Dummy Variables? (github.com)
Here is a sample notebook that applies GetDummies to a single column:
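The core operation for a single column can be sketched with plain pandas: `Series.str.get_dummies(sep=",")` splits each string on the separator and creates one 0/1 column per distinct value. The data and the `tags_` prefix below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data with one comma-separated column.
df = pd.DataFrame({"id": [1, 2, 3], "tags": ["a,b", "b", "a,c"]})

# Split on "," and one-hot encode the individual values.
dummies = df["tags"].str.get_dummies(sep=",").add_prefix("tags_")

# Replace the original string column with its dummy columns.
result = pd.concat([df.drop(columns="tags"), dummies], axis=1)
print(result)
```

This works for a one-off analysis, but calling it separately on train and test data can produce different columns, which is why the logic is wrapped in a transformer next.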
4-1 To include this logic in a pipeline, we have to create a custom transformer.
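One possible shape for such a transformer is sketched below. This is a hypothetical reimplementation, not the author's GetDummies code from the GitHub page; the parameter names `column` and `sep`, the `dummy_columns_` attribute, and the column prefix are assumptions. `fit` remembers the dummy columns seen in the training data, and `transform` reindexes new data to exactly those columns.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class GetDummies(BaseEstimator, TransformerMixin):
    """Sketch of a transformer that one-hot encodes a comma-separated string column.

    Parameter and attribute names are illustrative assumptions, not the
    author's original implementation.
    """

    def __init__(self, column, sep=","):
        self.column = column
        self.sep = sep

    def _encode(self, X):
        # Split the strings on `sep`, one-hot encode the pieces, prefix the names.
        return X[self.column].str.get_dummies(sep=self.sep).add_prefix(f"{self.column}_")

    def fit(self, X, y=None):
        # Learn the dummy column names from the training data only.
        self.dummy_columns_ = self._encode(X).columns
        return self

    def transform(self, X):
        # Columns missing from X are added as zeros; unseen values are dropped.
        dummies = self._encode(X).reindex(columns=self.dummy_columns_, fill_value=0)
        return pd.concat([X.drop(columns=self.column), dummies], axis=1)
```

Storing the learned columns in an attribute ending with an underscore follows the scikit-learn convention for state created during `fit`.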
4-2 Sample Code for Using Our GetDummies Class
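A usage sketch inside a scikit-learn `Pipeline` follows. So that the block runs on its own, it repeats a condensed version of the hypothetical GetDummies sketch (not the author's code); the toy data and the choice of `LogisticRegression` as the model are assumptions for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class GetDummies(BaseEstimator, TransformerMixin):
    # Condensed copy of the hypothetical sketch so this block runs on its own.
    def __init__(self, column, sep=","):
        self.column = column
        self.sep = sep

    def _encode(self, X):
        return X[self.column].str.get_dummies(sep=self.sep).add_prefix(f"{self.column}_")

    def fit(self, X, y=None):
        self.dummy_columns_ = self._encode(X).columns
        return self

    def transform(self, X):
        dummies = self._encode(X).reindex(columns=self.dummy_columns_, fill_value=0)
        return pd.concat([X.drop(columns=self.column), dummies], axis=1)


# Hypothetical toy data, already split into train and test sets.
X_train = pd.DataFrame({"age": [25, 32, 47, 51], "tags": ["a,b", "b", "a,c", "c"]})
y_train = [0, 0, 1, 1]
X_test = pd.DataFrame({"age": [29, 45], "tags": ["b,d", "a"]})

pipe = Pipeline([
    ("dummies", GetDummies(column="tags")),
    ("model", LogisticRegression()),
])

# fit() learns the dummy columns from X_train only; predict() reuses them,
# so the unseen tag "d" in X_test does not create a new column.
pipe.fit(X_train, y_train)
print(pipe.predict(X_test))
```

Because the encoding lives inside the pipeline, tools such as `cross_val_score` would refit it on each training fold rather than on the full dataset.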
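As you can see, the dummy columns are learned only from the training data and then reused when transforming new data, so columns that are missing from the test data are added back. Because the test data never influences which columns are created, this approach avoids data leakage.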
Here is a good article that explains how to create a custom transformer.
If you liked it, don’t forget to follow, like, and comment.