How to Properly Analyze Your Personal LinkedIn Data With Python

Hands-On Tutorial to Extract and Analyse Your LinkedIn Data

In this tutorial, you will learn how to export your personal data from your LinkedIn account and use Python to analyze it and draw useful insights from it.

First things first…

If you don’t have a LinkedIn account, please run as fast as you can to the LinkedIn page and create one.

Actually, it’s not a good habit to be without a LinkedIn account in this modern world…lol

LinkedIn is one of the biggest social networks out there, and chances are you are a proud LinkedIn member (if not, please create an account now).

LinkedIn gives you access to your data, which you can download and analyze to draw insights from it.

Download the Jupyter Notebook for this tutorial


This project is part of the Python Crash Course

Downloading Your LinkedIn Account Data:

LinkedIn has a clear guide on how to download your data. I have included the most important part of this guide below for your reference:

You can initiate a download of your LinkedIn data from your Settings & Privacy page.

To request a download of your data:

  1. Click the Me icon (the icon that shows your profile) at the top of your LinkedIn homepage.
  2. Select Settings & Privacy from the dropdown.
  3. Click Data Privacy on the left rail.
  4. Under the How LinkedIn uses your data section, click Get a copy of your data.
  5. Select the data that you’re looking for and click Request archive.

You’ll have the option to either select a specific category of information or a larger download. If you select a specific type of content, you’ll receive an email within minutes with a link. If you select the larger data archive, you’ll receive an email within 24 hours. Use the link provided in the email to download the information you requested. The data archive will be available for download for 72 hours.

Select the data that you want to analyse.

You can select the type of data you are interested in downloading for your analysis. In this tutorial, we will be using the Connections data. Feel free to also analyse anything else that you are curious about.

You can download your data as soon as it is ready.

Import Libraries and Load Dataset

We will load the downloaded data with pandas and store it in a variable called lkd (use any variable name of your choice). We will then look at the first 20 records of our LinkedIn data.


import pandas as pd

# Load the exported connections file (adjust the path to where you saved it)
lkd = pd.read_csv('/content/Connections.csv')

# Preview the first 20 records
lkd.head(20)

As we can see, the output above shows my LinkedIn connections.
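Some exports may begin with a few note lines before the column header, in which case a plain read_csv call misreads the columns. Here is a minimal sketch of a loader that skips anything above the header. The helper name load_connections and the sample rows are made up for illustration, and the sketch assumes that the header row starts with First Name.

```python
import io

import pandas as pd


def load_connections(source):
    """Read a Connections export, skipping any lines above the header row."""
    # Accept either a file path or an already open text buffer.
    if isinstance(source, str):
        with open(source, encoding="utf-8-sig") as handle:
            lines = handle.read().splitlines()
    else:
        lines = source.read().splitlines()

    # Assumption: the header row is the first line starting with "First Name".
    header_index = next(
        i for i, line in enumerate(lines) if line.startswith("First Name")
    )
    return pd.read_csv(io.StringIO("\n".join(lines[header_index:])))


# Hypothetical sample standing in for a real export file.
sample = io.StringIO(
    "Notes:\n"
    "\"Some explanatory text\"\n"
    "\n"
    "First Name,Last Name,Company,Position,Connected On\n"
    "Ada,Lovelace,Analytical Engines,Data Scientist,17 Apr 2020\n"
)
connections = load_connections(sample)
print(connections.shape)
```

If your file already starts with the header row, the helper simply reads it as-is.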

Let’s see how many connections I have on LinkedIn:


lkd.shape        

As you can see, at the time of writing this tutorial I have 8000+ connections on LinkedIn, which is not bad.

Insights

Timeline: How Has My Connection Activity Changed Over Time?

Let’s find out how my connection activity has developed so far, that is, how many connections I have made (sent or received) over time.


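import plotly.express as px

# Parse the dates so that rows sort chronologically rather than alphabetically
lkd['Connected On'] = pd.to_datetime(lkd['Connected On'])
lkd = lkd.sort_values(by='Connected On')

# Count the connections made on each day and plot them as a line chart
time = px.line(lkd.groupby(by='Connected On').count().reset_index(),
               x = 'Connected On',
               y = 'First Name',
               labels = {'First Name':'Number of Connections'},
               title = 'Connection Timeline')

time.show()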


As we can see, there are spikes in my connection activity: on some days I gained a lot of connections (e.g. on 17-April-2020), while on other days the number dropped (e.g. on 10-Jun-2020).
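Daily counts can be noisy, so it may also help to look at the timeline per month. The following is a rough sketch on a tiny made-up sample, assuming the dates are written like 17 Apr 2020. If your export uses another format, adjust the format string.

```python
import pandas as pd

# Hypothetical rows standing in for the real Connections data.
sample = pd.DataFrame({
    "First Name": ["Ada", "Alan", "Grace", "Linus"],
    "Connected On": ["17 Apr 2020", "17 Apr 2020", "10 Jun 2020", "02 Jun 2020"],
})

# Assumption: dates look like "17 Apr 2020" (day, abbreviated month, year).
sample["Connected On"] = pd.to_datetime(sample["Connected On"], format="%d %b %Y")

# Count connections per calendar month; months without connections show as 0.
monthly = (
    sample.set_index("Connected On")
          .sort_index()
          .resample("MS")["First Name"]
          .count()
)
print(monthly)
```

The resulting Series is indexed by the first day of each month, and you can plot it with px.line in the same way as the daily counts.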

Now that we know our connection activity over time, let’s find out where these connections work.

I hope one of my connections is working at Tesla, I need to drive one of the new Tesla models…lol

Companies: Where do my Connections work?

Now let’s find out where the people we have connected with on LinkedIn are working.

We will analyse the Company column to identify the companies our connections work at.

lkd['Company']        



This does not give us the clarity we want.

Let’s therefore use the groupby() function to group our data by the Company and use the count() function to count how many of our Connections work in the various companies.

# Count the non-empty values in every column for each company
group_company = lkd.groupby(by='Company').count().reset_index()
group_company

This gives us the number of connections working at each company. By default, groupby() sorts the result by the group key, so the table is ordered alphabetically by company name rather than by count. For example, I have 1 connection working at Vsolv Engineering India Pvt Ltd.

Let’s put the companies with the highest counts first by calling the sort_values() function with ascending=False. We will sort by the count in the ‘Connected On’ column.

You can also sort by any other count column, such as First Name, Last Name or Position.

group_company=group_company.sort_values(by = 'Connected On',ascending=False).reset_index(drop=True)


group_company        

This looks better. We can see that most of my connections work at Tata Consultancy Services, followed by Amazon, Accenture, Cognizant and so on…
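If you only need the ranking of companies, value_counts() is a shorter route to a similar result, since it counts and sorts in descending order in one step. Here is a small sketch on made-up data. Stripping whitespace from the names is an extra step that the original analysis does not do.

```python
import pandas as pd

# Hypothetical sample standing in for the Company column of the export.
sample = pd.DataFrame({
    "Company": ["Amazon", "Tesla", " Amazon", None, "Accenture", "Amazon", "Accenture"],
})

# Strip stray whitespace so that " Amazon" and "Amazon" count as one company,
# then count occurrences; missing values are ignored by default.
company_counts = sample["Company"].str.strip().value_counts()

# Keep only the two most common companies.
top_companies = company_counts.head(2)
print(top_companies)
```

The result is a Series indexed by company name, so company_counts["Amazon"] looks up a single company directly.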

Visualisation

Let’s use Plotly to visualise our data for better insights.

# Plot the 100 companies with the most connections
fig = px.bar(group_company[:100],
             x = 'Company',
             y = 'Connected On',
             labels = {'Connected On':'Number of Connections'},
             width = 1000,
             height = 900,
             title = 'Bar graph for companies that my Connections are working at.')

fig

As we can see, Plotly makes it much clearer which companies my connections work at.

From the plot, we can see the companies and how many connections work at each of them. For instance, I can see that most of my connections work at Tata Consultancy Services.

Let’s use a treemap in Plotly for a better visualisation of where our connections work.

NB: If you get an error, try upgrading Plotly with the command below:

pip install --upgrade plotly

Plotting Tree Map

fig = px.treemap(group_company[:200],
                 path = ['Company','Position'],
                 values = 'Connected On',
                 labels = {'Connected On':'Number of Connections'},
                 width = 1000,
                 height = 900,
                 title = 'Tree Map for companies that my Connections are working at.')

fig

The treemap gives us a better view. The size of each company box represents the number of connections working at that particular company.

When you plot the Tree Map, you can hover on the boxes to have a better view of the individual companies and the number of connections working there.
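One thing to keep in mind: group_company was built by counting rows per company, so its Position column holds counts rather than job titles. If you want the second level of the treemap to show actual positions, one option is to group by both columns first. Here is a minimal sketch on made-up data. The names company_position and Connections are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical sample standing in for the Company and Position columns.
sample = pd.DataFrame({
    "Company": ["Amazon", "Amazon", "Amazon", "Tesla", None],
    "Position": ["Data Scientist", "Data Scientist", "Data Analyst", "Engineer", "Chef"],
})

# Drop rows without a company or position, then count each (company, position) pair.
company_position = (
    sample.dropna(subset=["Company", "Position"])
          .groupby(["Company", "Position"])
          .size()
          .reset_index(name="Connections")
)
print(company_position)
```

A table like this can then be passed to px.treemap with path = ['Company','Position'] and values = 'Connections', so each company box is subdivided by job title.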

Positions: Which Positions do my connections hold?

Let’s now try to find out which specific positions our connections are occupying.

lkd['Position'].value_counts()        

From the above, we can see that most of my connections are Data Scientists, Data Analysts, Software Engineers, Senior Data Scientists and Machine Learning Engineers.

The output is cut off after Machine Learning Engineer, so I can’t really see all of the positions. Let’s do something about it.

Below, I am going to count the connections in each position, express each count as a percentage of all connections, and apply a condition to make the selection (here, positions held by more than 0.20% of my connections).

lkd['Position'].value_counts()/len(lkd)*100 > 0.20        

I am not interested in the True or False values, so let me combine the two snippets above to get the job titles together with their actual counts.

lkd['Position'].value_counts()[lkd['Position'].value_counts()/len(lkd)*100 > 0.20]        

Now it looks much better!!
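The combined expression above calls value_counts() twice. A slightly more readable variant stores the counts once and uses normalize=True for the shares. This is a sketch on made-up data, with one difference from the code above: normalize=True divides by the number of non-empty positions, while len(lkd) also counts rows where the position is missing. The threshold of 20 is only chosen to suit the tiny sample.

```python
import pandas as pd

# Hypothetical sample standing in for the Position column.
positions = pd.Series(
    ["Data Scientist"] * 3 + ["Data Analyst"] * 2 + ["Chef"]
)

# Absolute counts and percentage shares per position.
counts = positions.value_counts()
share = positions.value_counts(normalize=True) * 100

# Keep the positions held by more than threshold_percent of connections.
threshold_percent = 20
common = counts[share > threshold_percent]
print(common)
```

With real data you would lower threshold_percent, for example to 0.20 as in the tutorial.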

Let’s visualise this with Plotly.

# Count connections per position and keep the 50 most common positions
top_positions = lkd.groupby(by='Position').count().sort_values(by='Connected On', ascending=False)[:50].reset_index()

fig = px.bar(top_positions,
             x='Position',
             y='Connected On',
             labels={'Connected On': 'Number Of Connections'},
             title='The various Positions occupied by my LinkedIn Connections')

fig


We can see the various positions and the number of connections holding these positions.

What if we get into the cloud?

Let’s use a WordCloud to have a better view.

WordCloud Visualisation

from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib


def CreateWordCloud(text):
  # Build the word cloud image from the given text
  wordcloud=WordCloud(width=1000, height=900,
                      background_color='black',
                      min_font_size = 10,
                      colormap = 'Set2').generate(text)

  # Draw the image on a matplotlib figure
  fig=plt.figure(figsize=(15,10))
  plt.imshow(wordcloud,interpolation='bilinear')

  plt.show()
  return fig

We define a function called CreateWordCloud, which will take in a text and generate a wordcloud based on the text data fed to it.
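The function expects a single string, so the column you want to visualise has to be joined into one text first. The tutorial does not show which text was used. The sketch below assumes the Position column and uses made-up rows.

```python
import pandas as pd

# Hypothetical sample standing in for the Position column of the export.
sample = pd.DataFrame({
    "Position": ["Data Scientist", None, "Data Analyst", "Machine Learning Engineer"],
})

# Drop missing values and join all job titles into one space-separated string.
position_text = " ".join(sample["Position"].dropna().astype(str))
print(position_text)
```

With the real data, you would build the text from lkd['Position'] in the same way and then call CreateWordCloud(position_text).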

This looks much better, right? Yeah, I think so too.

Conclusion:

Now in this tutorial, we have used the Connections data, analysed it and drawn some insights from it.

You can download any other type of LinkedIn data and perform a similar analysis.

If you enjoyed this tutorial, please give it a thumbs up. That’s enough appreciation to make my day.

Thanks in advance.
