Converting cells in Pandas is Easy?
Aniruddha Kumar
Data Engineer @FloData Analytics || Runner-Up at Space Hackathon'23 (ISRO) | Orator | Final Year Under-Graduate CSE-AI&ML at VITB ||
We all want to practice a lot, but as we move forward it gets tough for us to "Level UP!". Why? Because we get used to the same procedure.
Hoooo !! But in my case, you can say I'm fortunate: whenever I start doing something, some new kind of error shows up and I gotta sort it out from scratch.
So, here's the story...
I was taking a Coursera course called "Data Management and Visualization" by Wesleyan University. They provided a link to the "GAPMINDER" dataset (you can visit it here: GitHub Link).
And, my first dataset is something else. Man! It had so many blank cells, and when I checked dataframe.dtypes, it told me that all columns had the object dtype.
But, as we are aware, to run a statistical analysis of the data, we need it in int64 or float64 form.
So, I tried pd.to_numeric and astype(), but neither of them worked. pd.to_numeric (at least the way I called it) couldn't parse the column because, you know what, these blank cells were strings themselves, not missing data. How did I get to know that?
Why not continue your reading?
So, to change those "strings", I used .astype("string").astype("int64"), but that raised an error ending in base 10: "----", which was basically about all those blank cells in the .csv file.
Now, I was upset and called a few of my friends. Eeman Majumder told me that I could use the on_bad_lines= parameter, available in the read_csv function.
After trying this and a few other options, I got to know there were no bad lines at all. I had been thinking that those blank cells were bad lines or something.
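Here's a minimal sketch of why on_bad_lines doesn't help with blank cells. The tiny CSV below is a made-up stand-in (not the real GAPMINDER file), and I'm assuming the blank cell holds a single space. Only the row with an extra field counts as a "bad line"; the row with the blank cell is perfectly well-formed and stays.

```python
import io
import pandas as pd

# Hypothetical three-row file (an assumption for illustration):
# row B has a blank (single-space) cell, row C is genuinely malformed
# because it carries one field more than the header.
csv_text = "country,income\nA,1200.5\nB, \nC,830.0,extra\n"

# on_bad_lines="skip" (available in pandas 1.3+) drops rows with too many fields.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

print(df)                          # row C is gone, row B is still there
print(repr(df.loc[1, "income"]))   # the blank cell survives as text
```

So the parameter filters malformed rows, not odd values inside well-formed rows.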
Next came Rahul Mandviya Bhaiya's turn, and he made me notice that all those blank cells are strings.....
Yes, after talking to him, I got to know that these blank cells in the .csv are strings, like WTH!!!!!
And, that's why pd.to_numeric and .astype("string").astype("int64") weren't working.
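If you want to check this yourself, a quick sketch like the one below can help. The data is a hypothetical miniature of the dataset (the column name and values are my assumptions), with the blank cell stored as a single space.

```python
import io
import pandas as pd

# Hypothetical miniature of the dataset; the blank cell is a single space.
csv_text = "country,incomeperperson\nA,1200.5\nB, \nC,830.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Look at the raw cell: repr() makes the whitespace visible.
blank = df.loc[1, "incomeperperson"]
print(repr(blank), type(blank))

# A truly empty field would have been read as NaN; this one is not missing.
print(df["incomeperperson"].isna().sum())

# Count the cells that are strings made only of whitespace.
is_blank = df["incomeperperson"].map(lambda v: isinstance(v, str) and v.strip() == "")
print(is_blank.sum())

# This is the kind of failure behind the "base 10" message.
try:
    int(blank)
except ValueError as exc:
    print("int() failed:", exc)
```

Using repr() on a single cell is the quickest way I know to tell "nothing here" apart from "a string that looks like nothing".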
Hooooo !! So, what to do now?
Somehow, Pushpendra Kushwaha came up with ._convert and it worked, but how?
This function tries to infer a better dtype (string, float, int, and so on) for each object column, whether or not every value in it is parseable.
Now, the time to do some research was here. I searched for ._convert and found that it is the private successor of pandas.DataFrame.convert_objects, which is deprecated.
Per the pandas deprecation notice, convert_objects has been broken into three functions: pd.to_numeric, pd.to_datetime, and pd.to_timedelta. Duh !! No wonder it wasn't working, and yeah! This was the function being used by the instructors.
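Of the type-specific converters, pd.to_numeric is the one that matters for this problem. Below is a minimal sketch, assuming a made-up column where the blank cell is a single space: with the default errors="raise" the call is expected to fail on that cell, while errors="coerce" turns anything unparseable into NaN.

```python
import io
import pandas as pd

# Hypothetical miniature of the dataset; the blank cell is a single space.
csv_text = "country,incomeperperson\nA,1200.5\nB, \nC,830.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Default behaviour (errors="raise"): the blank string cannot be parsed.
try:
    pd.to_numeric(df["incomeperperson"])
except ValueError as exc:
    print("to_numeric failed:", exc)

# errors="coerce" replaces every unparseable cell with NaN instead.
income = pd.to_numeric(df["incomeperperson"], errors="coerce")
print(income.dtype)
print(income.mean())   # NaN is skipped, so statistics work again
```

Keep in mind that coercing hides every unparseable value, not only the blanks, so it's worth counting the NaNs before and after.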
So, I thought there had to be some other function, and well, while I was compiling my thoughts here, I came across pandas.DataFrame.convert_dtypes.
Definitely go read about it. This function is a much better option than ._convert: it looks at each column's data and converts it to the best-fitting dtype that supports missing values.
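A rough sketch of convert_dtypes in action, on a made-up frame (column names and values are my assumptions). One caveat I'm fairly sure about: convert_dtypes does not parse numbers out of text, so in this sketch the blank strings are first turned into missing values with pd.to_numeric, and only then does convert_dtypes pick the nullable dtypes.

```python
import io
import pandas as pd

# Hypothetical miniature of the dataset; blank cells are single spaces.
csv_text = "country,population,lifeexpectancy\nA,1200,70.1\nB, ,65.3\nC,830, \n"
raw = pd.read_csv(io.StringIO(csv_text))

# On the raw frame the numeric-looking columns stay text, because " " is a string.
print(raw.convert_dtypes().dtypes)

# Turn the blank strings into missing values first...
cleaned = raw.copy()
for col in ["population", "lifeexpectancy"]:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# ...then let convert_dtypes choose dtypes that support pd.NA.
converted = cleaned.convert_dtypes()
print(converted.dtypes)
```

The nice part is that a column of whole numbers with gaps can stay an integer column instead of being forced to float.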
Now, after completing this task, I was pondering, "Aren't there any other methods to do this?"
And, in a meeting, Vaasu Bisht made me aware of looping techniques: go through the cells one by one and convert each with a conditional.
It's an easy method that can be used anytime you want, with no need for external functions.
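A loop of that kind might look like the sketch below. This is my own reconstruction under assumptions, not necessarily the exact code from that meeting: the helper to_float_or_nan and the sample data are hypothetical, and blank cells are assumed to be whitespace-only strings.

```python
import io
import pandas as pd

# Hypothetical miniature of the dataset; blank cells are single spaces.
csv_text = "country,incomeperperson,lifeexpectancy\nA,1200.5,70.1\nB, ,65.3\nC,830.0, \n"
df = pd.read_csv(io.StringIO(csv_text))


def to_float_or_nan(cell):
    """Return the cell as a float, or NaN when it is blank or not a number."""
    if isinstance(cell, str) and cell.strip() == "":
        return float("nan")
    try:
        return float(cell)
    except (TypeError, ValueError):
        return float("nan")


# Loop over the columns, then over the cells, using only plain Python logic.
for col in ["incomeperperson", "lifeexpectancy"]:
    df[col] = [to_float_or_nan(cell) for cell in df[col]]

print(df.dtypes)
print(df)
```

It is slower than the vectorized pandas calls on big files, but every decision is spelled out, which makes it easy to adapt when a dataset has its own odd placeholders.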
Conclusion: I learned about the history of ._convert and got to know about these looping and conditional statements.