Some cool Data Science tricks, mostly in Pandas
Soumyabrata Roy
Generative AI | Machine Learning | AI Engineer @ Deloitte | Ex-Cognizant | YouTuber @ DataDrivenDecision | Writer @ Medium
This week I learned quite a few data tricks that I would like to share with you all. They might help you in your own data analysis.
For the analysis, I will be using the COVID vaccination data from Kaggle: https://www.kaggle.com/gpreda/covid-world-vaccination-progress
1st Trick: How to find the percentage of missing data in each column of the dataset?
#code
data.isnull().sum()*100/len(data)

#output
country                                 0.000000
iso_code                                0.000000
date                                    0.000000
total_vaccinations                     37.453558
people_vaccinated                      44.355530
people_fully_vaccinated                62.360674
daily_vaccinations_raw                 47.342098
daily_vaccinations                      2.572163
total_vaccinations_per_hundred         37.453558
people_vaccinated_per_hundred          44.355530
people_fully_vaccinated_per_hundred    62.360674
daily_vaccinations_per_million          2.572163
vaccines                                0.000000
source_name                             0.000000
source_website                          0.000000
dtype: float64
You can see that about 37% of the total_vaccinations values are missing, while daily_vaccinations_per_million is missing in only about 2.6% of the rows.
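Here is a minimal sketch of the same idea on a tiny made-up DataFrame, so you can try it without downloading anything. The frame `toy` and its values are my own assumptions for illustration, not the Kaggle data. It uses `isnull().mean()`, which gives the same percentages as dividing the sum by the length, and then keeps only the columns that actually have gaps.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the vaccination data (values are made up)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [100.0, np.nan, 300.0, np.nan],
    "daily_vaccinations": [10.0, 20.0, np.nan, 40.0],
})

# isnull() gives booleans; their mean is the fraction of missing values
missing_pct = toy.isnull().mean() * 100

# Keep only the columns with at least one gap, worst first
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Sorting like this is handy on wide datasets, where the columns with the most missing values are the ones you want to look at first.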
2nd Trick: How to find the correlations among different variables?
# use this code for the correlations
# numeric_only=True skips text columns such as country (needed on pandas 2.0+)
data.corr(numeric_only=True)
You can also visualize the correlations using a seaborn heatmap:
import seaborn as sns

# cmap and annot are optional styling arguments
sns.heatmap(data.corr(numeric_only=True), cmap='summer', annot=True)
3rd Trick: Visualize the correlations of a single column (total vaccination) with other columns.
This is very handy for understanding how the target variable relates to the independent variables.
# [1:] skips the first entry, which is total_vaccinations correlated with itself
data.corr(numeric_only=True)['total_vaccinations'][1:].plot(kind='bar')
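The `[1:]` slice works here because total_vaccinations happens to be the first numeric column. As a rough sketch of a variant that does not depend on column order, you can drop the target by its label and sort the result. The frame `toy` and its numbers below are made up for illustration.

```python
import pandas as pd

# Toy frame with made-up values; 'country' is there to show text columns being skipped
toy = pd.DataFrame({
    "total_vaccinations": [1.0, 2.0, 3.0, 4.0],
    "people_vaccinated": [2.0, 4.0, 6.0, 8.0],   # rises with the target
    "daily_vaccinations": [4.0, 3.0, 2.0, 1.0],  # falls as the target rises
    "country": ["A", "B", "C", "D"],
})

target = "total_vaccinations"

corr_with_target = (
    toy.select_dtypes("number")  # keep numeric columns only
    .corr()[target]              # correlations against the target column
    .drop(target)                # remove the target's correlation with itself
    .sort_values()               # most negative first, most positive last
)
print(corr_with_target)
```

Calling `.plot(kind='bar')` on `corr_with_target` would give the same kind of bar chart as above, just sorted.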
4th Trick: Filter out columns based on a particular data type
# here I have selected data type as object
data.select_dtypes('object').columns

#output
Index(['country', 'iso_code', 'date', 'vaccines', 'source_name', 'source_website'], dtype='object')
Here we can see that the date column is stored as object-type data. To get more information out of the dates, we need to convert the column to datetime. Let's do that using pandas:
data['date'] = pd.to_datetime(data['date'])
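Once the column is a real datetime, the `.dt` accessor lets you pull out parts of the date. Below is a small sketch on a made-up frame (the dates and counts in `toy` are assumptions for illustration) that extracts the month and weekday and then sums the daily numbers per month.

```python
import pandas as pd

# Toy frame with made-up dates and counts
toy = pd.DataFrame({
    "date": ["2021-01-15", "2021-02-01", "2021-02-20"],
    "daily_vaccinations": [100, 200, 300],
})

# Same conversion as above: strings become datetime64 values
toy["date"] = pd.to_datetime(toy["date"])

# The .dt accessor exposes date parts on a datetime column
toy["month"] = toy["date"].dt.month
toy["weekday"] = toy["date"].dt.day_name()

# Total daily vaccinations per calendar month
monthly = toy.groupby("month")["daily_vaccinations"].sum()
print(toy)
print(monthly)
```

On an object-type column the `.dt` accessor raises an error, which is why the conversion has to come first.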
5th Trick: Replace some data with a particular value
# replace the AFG country code with the value Afgan (this returns a new DataFrame)
data.replace('AFG','Afgan')

# to make this change permanent, we need to put inplace = True
data.replace('AFG','Afgan', inplace=True)
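One thing to keep in mind: calling `replace` on the whole DataFrame changes the value in every column where it appears. If you only want to touch one column, a possible sketch is to call `replace` on that column and assign the result back. The frame `toy` below is made up, and I put 'AFG' in a second column on purpose to show that it stays untouched.

```python
import pandas as pd

# Toy frame; 'AFG' also appears in source_name on purpose
toy = pd.DataFrame({
    "iso_code": ["AFG", "ALB", "AFG"],
    "source_name": ["AFG", "Ministry of Health", "WHO"],
})

# Replace only inside iso_code and assign the result back to the column
toy["iso_code"] = toy["iso_code"].replace({"AFG": "Afgan"})
print(toy)
```

Passing a dictionary also lets you map several old values to new ones in a single call.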
6th Trick: What are *args and **kwargs in Python functions?
We often see these in the signatures of Python functions and methods. Basically, *args collects any number of positional inputs into a tuple, and **kwargs collects any number of keyword inputs into a dictionary. The examples below will clarify this.
# I have declared a function with *args as an input that prints half (50%) of the sum
def output(*args):
    print(sum(args)*.5)

# output
output(1,2,3,4)
5.0

# Now let's print out only *args
def output(*args):
    print(args)

#output
output(1,2,3,4)
(1, 2, 3, 4)
# args is a tuple. You can pass as many inputs as you want
Let's talk about **kwargs
# define a function
def output_kwargs(**kwargs):
    print(kwargs)

# if you pass keyword inputs, the call looks like this
output_kwargs(a=1,b=2,c=3)

# output
{'a': 1, 'b': 2, 'c': 3}
# kwargs is a Python dictionary, so we can do anything with it that we could do with a dict
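Both can be used in the same function, and the same stars work in the other direction when calling a function: `*` unpacks a list or tuple into positional inputs and `**` unpacks a dictionary into keyword inputs. Here is a small sketch; the function `describe` and the variables `numbers` and `options` are made-up names for illustration.

```python
def describe(*args, **kwargs):
    """Return the positional inputs (a tuple) and keyword inputs (a dict)."""
    return args, kwargs


numbers = [1, 2, 3]
options = {"kind": "bar", "title": "demo"}

# * spreads the list into positional inputs, ** spreads the dict into keyword inputs
positional, keyword = describe(*numbers, **options)

print(positional)  # (1, 2, 3)
print(keyword)     # {'kind': 'bar', 'title': 'demo'}
```

This is the pattern behind many pandas and plotting calls that accept extra keyword arguments and simply pass them along to another function.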
I hope you like the tricks and methods. I will catch up with you very soon.