Some cool Data Science tricks, mostly Pandas
Picture courtesy: Sid Balachandran, Unsplash

This week I learned quite a few data tricks that I would like to share with you all. They might help you in your own data analysis.

For the analysis, I will be using the COVID vaccination data from Kaggle: https://www.kaggle.com/gpreda/covid-world-vaccination-progress

1st Trick: How to find the missing data % for every column in the dataset?

#code
import pandas as pd
# data is assumed to be the Kaggle CSV already loaded with pd.read_csv
data.isnull().sum()*100/len(data)

#output

country                                 0.000000
iso_code                                0.000000
date                                    0.000000
total_vaccinations                     37.453558
people_vaccinated                      44.355530
people_fully_vaccinated                62.360674
daily_vaccinations_raw                 47.342098
daily_vaccinations                      2.572163
total_vaccinations_per_hundred         37.453558
people_vaccinated_per_hundred          44.355530
people_fully_vaccinated_per_hundred    62.360674
daily_vaccinations_per_million          2.572163
vaccines                                0.000000
source_name                             0.000000
source_website                          0.000000
dtype: float64

You can see that about 37% of the total_vaccinations values are missing, while daily_vaccinations_per_million is missing only about 2.6% of its values.
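Here is a minimal sketch of the same idea on a tiny made-up frame (the `toy` values are invented for illustration and are not taken from the Kaggle file). It uses `isna().mean()`, which should give the same percentages as `isnull().sum()*100/len(data)`, and sorts the result so the columns with the most gaps come first.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [10.0, np.nan, 30.0, np.nan],
    "daily_vaccinations": [1.0, 2.0, np.nan, 4.0],
})

# The mean of a boolean mask is the fraction of True values,
# so this is the share of missing cells per column, as a percentage
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Sorting is optional, but on a wide dataset it makes it easier to spot which columns may need to be dropped or imputed.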

2nd Trick: How to find the correlations among different variables?

# use this code for the correlations
# numeric_only=True skips text columns such as country (needed in pandas 2.0+)
data.corr(numeric_only=True)

You could also visualize the correlations using a seaborn heatmap:

# cmap and annot are optional
import seaborn as sns
sns.heatmap(data.corr(numeric_only=True), cmap='summer', annot=True)

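3rd Trick: Visualize the correlations of a single column (total_vaccinations) with the other columns.

This is very handy for understanding the correlations between the target and the independent variables.

# drop the target itself, since its correlation with itself is always 1
data.corr(numeric_only=True)['total_vaccinations'].drop('total_vaccinations').plot(kind='bar')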
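To make the idea concrete without the full dataset, here is a small sketch on invented numbers (the `toy` frame is hypothetical). It assumes pandas 1.5 or newer, where `corr` accepts `numeric_only`, and it sorts the correlations with the target so the strongest negative and positive relationships sit at the two ends.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [10, 20, 30, 40],
    "people_vaccinated": [5, 10, 15, 20],
    "daily_vaccinations": [4, 3, 2, 1],
})

# Correlation matrix over the numeric columns only
corr = toy.corr(numeric_only=True)

# Correlation of every other column with the target, sorted ascending
target_corr = corr["total_vaccinations"].drop("total_vaccinations").sort_values()
print(target_corr)
```

Calling `target_corr.plot(kind='bar')` on the result would give the same kind of bar chart as above, just ordered.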

4th Trick: Filter out columns based on a particular data type

# here I have selected data type as object
data.select_dtypes('object').columns

#output

Index(['country', 'iso_code', 'date', 'vaccines', 'source_name',
       'source_website'],
      dtype='object')

Here we can see that date is stored as object-type data. To get more information out of the dates, we need to convert the column to datetime. Let's do that using pandas:

data['date'] =  pd.to_datetime(data['date'])
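As a rough sketch of what the conversion unlocks, the example below uses a made-up three-row frame (the `toy` values and the `month` and `weekday` column names are invented for illustration). Once the column is datetime, the `.dt` accessor exposes parts of the date that can be used for grouping.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "date": ["2021-01-15", "2021-02-01", "2021-02-20"],
    "daily_vaccinations": [100, 200, 300],
})

# Convert the text column to datetime, as in the article
toy["date"] = pd.to_datetime(toy["date"])

# The .dt accessor only works on datetime columns
toy["month"] = toy["date"].dt.month
toy["weekday"] = toy["date"].dt.day_name()

# Example use: total daily vaccinations per calendar month
monthly_total = toy.groupby("month")["daily_vaccinations"].sum()
print(monthly_total)
```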

5th Trick: Replace some data with a particular value

# replacing the AFG country code with the value Afgan (this returns a new DataFrame)
data.replace('AFG','Afgan')

# to make these changes permanent, we need to pass inplace=True
data.replace('AFG','Afgan', inplace=True)
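Note that `data.replace('AFG','Afgan')` looks at every column. If the value should change in one column only, a nested dictionary can scope the replacement. The sketch below uses an invented two-row frame (the `toy` values are hypothetical) and assigns the result to a new variable instead of using `inplace=True`.

```python
import pandas as pd

# Toy data, invented for illustration only: "AFG" appears in two columns
toy = pd.DataFrame({
    "iso_code": ["AFG", "ALB"],
    "source_name": ["AFG", "Ministry of Health"],
})

# {column: {old: new}} limits the replacement to the iso_code column
scoped = toy.replace({"iso_code": {"AFG": "Afgan"}})
print(scoped)
```

Assigning the result back (`data = data.replace(...)`) is an alternative to `inplace=True` that keeps the original frame available until you overwrite it.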

6th Trick: What are *args and **kwargs in Python functions?

We often see these in Python methods and functions. Basically, *args collects any number of positional inputs into a tuple, and **kwargs collects any number of keyword inputs into a dictionary. The examples below will clarify the point.

# I have declared a function with *args as input that prints half (50%) of the sum
def output(*args):
  print(sum(args)*.5)

# output
output(1,2,3,4)
5.0

# Now let's print out only *args
def output(*args):
  print(args)

#output
output(1,2,3,4)
(1, 2, 3, 4)
# at this point it is a tuple; you can pass as many inputs as you want

Let's talk about **kwargs

# define a function
def output_kwargs(**kwargs):
  print(kwargs)

# if you pass keyword inputs here, the call looks like this
output_kwargs(a=1,b=2,c=3)

# output
{'a': 1, 'b': 2, 'c': 3}

# It is a Python dictionary, so we can now do anything with it that a dictionary allows
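Both forms can be combined in one function, and the same star syntax works in the other direction to unpack an existing tuple or dictionary into a call. A minimal sketch (the `describe` function and its inputs are made up for illustration):

```python
def describe(*args, **kwargs):
    """Return the sum of the positional inputs and the sorted keyword names."""
    return sum(args), sorted(kwargs)

numbers = (1, 2, 3, 4)
options = {"a": 1, "b": 2}

# *numbers spreads the tuple into positional arguments,
# **options spreads the dictionary into keyword arguments
total, names = describe(*numbers, **options)
print(total, names)
```

This is the pattern many library functions rely on when they forward extra options to another function.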

I hope you like the tricks and methods. I will catch up with you very soon.
