How to Properly Analyze Your Personal LinkedIn Data With Python
Hands-On Tutorial To Extract And Analyse Your LinkedIn Data.
In this tutorial, you will learn the proper ways to extract your personal data from your LinkedIn account and use Python to analyze it and draw useful insights from it.
First things first…
If you don't have a LinkedIn account, please head over to LinkedIn as fast as you can and create one.
Honestly, not having a LinkedIn account in this modern world is not a good habit…lol
LinkedIn is one of the biggest social networks out there, and chances are you are a proud LinkedIn member (if not, please create an account now).
LinkedIn gives you access to your data, and you can download and analyse this data to draw insights from it.
This project is part of the Python Crash Course.
Downloading Your LinkedIn Account Data:
LinkedIn has a clear guide on how to download your data. I have included the most important part of this guide below for your reference:
You can initiate a download of your LinkedIn data from your Settings & Privacy page.
To request a download of your data:
You’ll have the option to either select a specific category of information or a larger download. If you select a specific type of content, you’ll receive an email within minutes with a link. If you select the larger data archive, you’ll receive an email within 24 hours. Use the link provided in the email to download the information you requested. The data archive will be available for download for 72 hours.
Select the data that you want to analyse.
You can select the type of data you are interested in downloading for your analysis. In this tutorial, we will be using the Connections data. Feel free to also analyse anything else that you are curious about.
You can download your data as soon as it is ready.
Import Libraries and Load Dataset
We will load our downloaded data using Pandas and store it in a variable called lkd (use any variable name you like). We will then look at the first 20 records of our LinkedIn data.
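import pandas as pd

# Load the exported connections file (adjust the path to wherever you saved Connections.csv)
lkd=pd.read_csv('/content/Connections.csv')
# Show the first 20 rows
lkd.head(20)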
As we can see, the output above shows my LinkedIn connections.
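Depending on when you export it, Connections.csv may begin with a few lines of notes above the real header row, and a plain read_csv() will then misread the columns. The sketch below assumes the header row is the first line starting with `First Name` and skips anything above it. The helper name `read_connections` is hypothetical and not part of pandas, and the sample data is made up.

```python
import io

import pandas as pd


def read_connections(source):
    """Read a LinkedIn Connections export, skipping any preamble lines.

    Assumption: the real header row is the first line that starts with
    'First Name'; anything above it is treated as notes and skipped.
    If no such line exists, the file is read from the top.
    """
    if isinstance(source, str):  # a file path
        # utf-8-sig also strips a leading byte-order mark if one is present
        with open(source, encoding="utf-8-sig") as f:
            lines = f.read().splitlines()
    else:  # a file-like object such as io.StringIO
        lines = source.read().splitlines()

    # Index of the header row (0 if it cannot be found)
    header_idx = next(
        (i for i, line in enumerate(lines) if line.startswith("First Name")), 0
    )
    return pd.read_csv(io.StringIO("\n".join(lines[header_idx:])))


# Made-up example with a short notes preamble above the header row
sample = io.StringIO(
    "Notes:\n"
    '"Some exports start with a note like this."\n'
    "\n"
    "First Name,Last Name,Company,Position,Connected On\n"
    "Ada,Lovelace,Acme,Data Scientist,17 Apr 2020\n"
)
df = read_connections(sample)
print(df.columns.tolist())
```

With your own file you would call something like `lkd = read_connections('/content/Connections.csv')` in place of the plain `pd.read_csv()` call.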
Let's see how many connections I have on LinkedIn:
lkd.shape
So as you can see, at the time of writing this tutorial I have 8,000+ connections on LinkedIn, which is not bad at all.
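Not every connection fills in a company or position. By default, groupby() drops rows whose grouping key is missing, so the per-company counts later on may not add up to the total from shape. Here is a quick check on a small made-up stand-in for lkd:

```python
import pandas as pd

# Made-up stand-in for lkd: one connection has no company or position
lkd = pd.DataFrame({
    "First Name": ["Ada", "Alan", "Grace", "Linus"],
    "Company": ["Acme", None, "Acme", "Initech"],
    "Position": ["Data Scientist", None, "Data Analyst", "Software Engineer"],
})

n_connections = lkd.shape[0]   # number of rows = number of connections
missing = lkd.isna().sum()     # missing values in each column
# Rows with a missing Company are dropped from the groups by default
per_company = lkd.groupby(by="Company").size()

print(n_connections)       # 4
print(missing["Company"])  # 1
print(per_company.sum())   # 3
```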
Insights
Timeline: How has my connection activity changed over time?
Let's find out how my connection activity has been so far, that is, how I have been sending and accepting connection requests over time.
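import plotly.express as px

# Convert 'Connected On' from text to real dates so that sorting
# and plotting are chronological rather than alphabetical
lkd['Connected On']=pd.to_datetime(lkd['Connected On'])
lkd=lkd.sort_values(by='Connected On')

# Count the connections made on each day and plot them as a line
fig=px.line(lkd.groupby(by='Connected On').count().reset_index(),
            x = 'Connected On',
            y = 'First Name',
            labels = {'First Name':'Number of Connections'},
            title = 'Connection Timeline')
fig.show()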
As we can see, there are spikes in my connection activity. On some days I made a lot of connections (e.g. on 17-April-2020), while on other days the number of new connections dropped (e.g. on 10-Jun-2020).
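Daily counts are spiky. Rolling the same data up to one count per month can make longer trends easier to see. This sketch assumes 'Connected On' has already been converted with pd.to_datetime(), and the dates are made up.

```python
import pandas as pd

# Made-up connection dates standing in for lkd['Connected On']
lkd = pd.DataFrame({
    "First Name": ["Ada", "Alan", "Grace", "Linus", "Margaret"],
    "Connected On": pd.to_datetime(
        ["17 Apr 2020", "17 Apr 2020", "20 Apr 2020", "10 Jun 2020", "02 Jul 2020"],
        format="%d %b %Y",
    ),
})

# Count connections per calendar month ('MS' labels each bin by its month start);
# months with no connections get a count of 0
monthly = (lkd.set_index("Connected On")
              .resample("MS")["First Name"]
              .count())
print(monthly)
```

You could pass `monthly.reset_index()` to `px.line()` with `x='Connected On'` and `y='First Name'`, the same way the daily timeline is plotted.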
Now that we know our connection activity over time, let's find out where these connections work.
I hope one of my connections works at Tesla; I need to drive one of the new Tesla models…lol
Companies: Where do my Connections work?
Now let's find out where the people we have connected with on LinkedIn are working.
We will analyse the Company column in order to identify the companies our connections are working at.
lkd['Company']
This alone does not give us the clarity we want.
Let's therefore use the groupby() function to group our data by the Company column and the count() function to count how many of our connections work at each company.
# Group the connections by company and count the non-missing values in each column
group_company=lkd.groupby(by='Company').count().reset_index()
group_company
This gives us the count of how many connections work at each company. By default, groupby() sorts the result by the group key, so the table is ordered alphabetically by company name rather than by count. For example, I have 1 connection working at Vsolv Engineering India Pvt Ltd.
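Two details of groupby().count() matter here. First, the result is ordered by the group key, which here is the company name. Second, count() counts non-missing values separately for every column, so the numbers can differ between columns when some fields are empty. size() counts rows instead. A small sketch on made-up data:

```python
import pandas as pd

# Made-up stand-in for lkd; Alan has no Position filled in
lkd = pd.DataFrame({
    "First Name": ["Ada", "Alan", "Grace"],
    "Position": ["Data Scientist", None, "Data Analyst"],
    "Company": ["Zeta Corp", "Acme", "Acme"],
    "Connected On": ["17 Apr 2020", "20 Apr 2020", "10 Jun 2020"],
})

counts = lkd.groupby(by="Company").count()  # non-missing values per column, keys sorted
sizes = lkd.groupby(by="Company").size()    # number of rows per company

print(counts)
print(sizes)
```

Here Acme shows 2 under 'First Name' but only 1 under 'Position'. This is why sorting by a column that has gaps can give a slightly different ranking.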
Let's rank the table instead, so that the companies with the highest count come first.
Let’s sort these values in descending order instead by using the sort_values() function by setting ascending=False. We will sort it by the count of the ‘Connected On’ column.
You can also sort it by any other variable like First Name, Last Name or Position.
group_company=group_company.sort_values(by = 'Connected On',ascending=False).reset_index(drop=True)
group_company
This one looks better. We can see that most of my connections are working at Tata Consultancy Services followed by Amazon, Accenture, Cognizant and so on…
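The groupby/count/sort_values chain above can also be written with value_counts(). It counts each company and sorts the result in descending order in one step. One difference is that value_counts() counts rows per company, while count() counts non-missing values in the chosen column. A sketch on made-up data:

```python
import pandas as pd

# Made-up stand-in for lkd
lkd = pd.DataFrame({
    "Company": ["Amazon", "Tata Consultancy Services", "Accenture",
                "Tata Consultancy Services", "Amazon", "Tata Consultancy Services"],
    "Connected On": ["17 Apr 2020"] * 6,
})

# Approach used in the tutorial: group, count, then sort descending
via_groupby = (lkd.groupby(by="Company").count()
                  .sort_values(by="Connected On", ascending=False)["Connected On"])

# Equivalent one-liner: value_counts() sorts in descending order by default
via_value_counts = lkd["Company"].value_counts()

print(via_value_counts)
```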
Visualisation
Let’s use Plotly to visualise our data for better insights.
# Bar chart of the top 100 companies by number of connections
fig=px.bar(group_company[:100],
           x = 'Company',
           y = 'Connected On',
           labels = {'Connected On':'Number of Connections'},
           width = 1000,
           height = 900,
           title = 'Bar graph for companies that my Connections are working at.'
           )
fig
As we can see, Plotly makes it clearer which companies my connections work at.
From the plot, we can see the companies our connections work at and how many work at each one. For instance, most of my connections work at Tata Consultancy Services, and so on…
Let's use a treemap in Plotly to get a better visualisation of our connections' companies.
NB: If you get an error, try upgrading Plotly with the command below:
pip install --upgrade plotly
Plotting Tree Map
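# Count connections per (Company, Position) pair so that positions
# can be nested inside each company box
company_position = (lkd.groupby(by=['Company', 'Position']).size()
                    .reset_index(name='Connected On'))
# Keep only the 200 companies with the most connections
top_companies = group_company['Company'][:200]
company_position = company_position[company_position['Company'].isin(top_companies)]
fig=px.treemap(company_position, path = ['Company','Position'],
               values = 'Connected On',
               labels = {'Connected On':'Number of Connections'},
               width = 1000,
               height = 900,
               title = 'Tree Map for companies that my Connections are working at.'
               )
fig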
The treemap gives us a better view. The size of each company box represents the number of connections working at that company.
When you plot the treemap, you can hover over the boxes to see the individual companies and the number of connections working there.
Positions: Which Positions do my connections hold?
Let’s now try to find out which specific positions our connections are occupying.
lkd['Position'].value_counts()
From the above, we can see that the most common titles among my connections are Data Scientist, Data Analyst, Software Engineer, Senior Data Scientist and Machine Learning Engineer.
The output is truncated after Machine Learning Engineer, so I can't see all of the positions. Let's do something about it.
Below, I'm going to count each position, express each count as a percentage of all my connections, and add a condition to filter them (e.g. keeping only the positions held by more than 0.20% of my connections).
lkd['Position'].value_counts()/len(lkd)*100 > 0.20
I'm not interested in the True/False values themselves, so let's combine the two expressions above to get the actual count for each job title that meets the condition:
lkd['Position'].value_counts()[lkd['Position'].value_counts()/len(lkd)*100 > 0.20]
Now it looks much better!!
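The same idea can be written with value_counts(normalize=True), which returns proportions instead of counts. There is one difference. normalize=True divides by the number of non-missing positions, whereas the expression above divides by len(lkd). The percentages therefore differ slightly when some Position values are empty. A sketch on made-up data, using a larger threshold so the tiny example filters something:

```python
import pandas as pd

# Made-up stand-in for lkd; one connection has no Position
lkd = pd.DataFrame({"Position": ["Data Scientist", "Data Scientist", "Data Analyst",
                                 "Data Scientist", "Software Engineer", None]})

threshold = 20  # percent; the tutorial uses 0.20 (i.e. 0.2%) on the full dataset

counts = lkd["Position"].value_counts()
# Percentages relative to all connections, as in the tutorial
pct_all = counts / len(lkd) * 100
# Percentages relative only to connections that filled in a position
pct_known = lkd["Position"].value_counts(normalize=True) * 100

print(counts[pct_all > threshold])
```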
Let’s visualise this with Plotly.
# Bar chart of the 50 most common positions among my connections
fig = px.bar(lkd.groupby(by='Position').count().sort_values(by='First Name', ascending=False)[:50].reset_index(),
             x='Position',
             y='Connected On',
             labels={'Connected On': 'Number Of Connections'},
             title= 'The various Positions occupied by my LinkedIn Connections'
             )
fig
We can see the various positions and the number of connections holding these positions.
What if we get into the cloud?
Let’s use a WordCloud to have a better view.
WordCloud Visualisation
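from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline

def CreateWordCloud(text):
    # Build a word cloud image from the input text
    wordcloud=WordCloud(width=1000, height=900,
                        background_color='black',
                        min_font_size = 10,
                        colormap = 'Set2').generate(text)
    fig=plt.figure(figsize=(15,10))
    plt.imshow(wordcloud,interpolation='bilinear')
    plt.axis('off')  # hide the axis ticks around the image
    plt.show()
    return fig

# Example call (assumed input): all job titles joined into one string, missing values dropped
CreateWordCloud(' '.join(lkd['Position'].dropna()))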
We define a function called CreateWordCloud, which takes in text and generates a word cloud from it.
This looks much better, right? Yeah, I think so too.
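WordCloud's generate() tokenises the text into individual words, so a full title such as "Senior Data Scientist" is not guaranteed to stay together. If you would rather weight whole job titles, the wordcloud library also offers generate_from_frequencies(), which takes a dictionary mapping each phrase to a weight. The sketch below only builds that dictionary with pandas on made-up data. It is an alternative to the tutorial's text-based approach, not what the tutorial does.

```python
import pandas as pd

# Made-up stand-in for lkd['Position']
lkd = pd.DataFrame({"Position": ["Data Scientist", "Senior Data Scientist",
                                 "Data Scientist", "Data Analyst", None]})

# Text input in the style of CreateWordCloud: all titles joined into one string
text = " ".join(lkd["Position"].dropna())

# Alternative input: map each full job title to the number of connections holding it
title_freqs = {title: int(n) for title, n in lkd["Position"].value_counts().items()}
print(title_freqs)
```

With wordcloud installed, you would pass `title_freqs` to `WordCloud(...).generate_from_frequencies(title_freqs)` in place of `.generate(text)` inside CreateWordCloud.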
Conclusion:
In this tutorial, we used the Connections data, analysed it, and drew some insights from it.
You can download any other type of your LinkedIn data and perform a similar analysis.
If you enjoyed this tutorial, please give it a thumbs up. That's enough appreciation to make my day.
Thanks in advance.