How to Properly Analyze Your Personal LinkedIn Data With Python

Briit

12k Followers, 500+ Connections : Data Science and AI

Published: November 27, 2021

Hands-On Tutorial To Extract And Analyse Your LinkedIn Data.

In this tutorial, you will learn how to extract your personal data from your LinkedIn account and use Python to analyze it and draw useful insights from it.

First things first…

If you don’t have a LinkedIn account, please run as fast as you can to the LinkedIn page and create one.

Actually, it’s not a good habit to be without a LinkedIn account in this modern world…lol

LinkedIn is one of the biggest social networks out there, and chances are you are a proud LinkedIn member (if not, please create an account now).

LinkedIn gives you access to your data, and you can download and analyze this data to draw insights from it.

Download the Jupyter Notebook for this tutorial

This project is part of the Python Crash Course

Downloading Your LinkedIn Account Data:

LinkedIn has a clear guide on how to download your data. I have included the most important part of this guide below for your reference:

You can initiate a download of your LinkedIn data from your Settings & Privacy page.

To request a download of your data:

Click the Me icon (the icon that shows your profile) at the top of your LinkedIn homepage.
Select Settings & Privacy from the dropdown.
Click Data Privacy on the left rail.
Under the How LinkedIn uses your data section, click Get a copy of your data.
Select the data that you’re looking for and click Request archive.

You’ll have the option to either select a specific category of information or a larger download. If you select a specific type of content, you’ll receive an email within minutes with a link. If you select the larger data archive, you’ll receive an email within 24 hours. Use the link provided in the email to download the information you requested. The data archive will be available for download for 72 hours.

Select the data that you want to analyse.

You can select the type of data you are interested in downloading for your analysis. In this tutorial, we will be using the Connections data. Feel free to also analyse anything else that you are curious about.

You can easily download your data as soon as it is ready.

Import Libraries and Load Dataset

We will load our downloaded data using Pandas and store it in a variable called lkd (use any variable name of your choice). We will then look at the first 20 records of our LinkedIn data.

import pandas as pd


lkd=pd.read_csv('/content/Connections.csv')


lkd.head(20)

As we can see, the above is an output of my Linkedin connections.

Let’s see how many connections I have on Linkedin:

lkd.shape

So as you can see, at the time of writing this tutorial, I have 8000+ connections on LinkedIn. That’s not bad.
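To try the loading step without your own export, here is a minimal sketch that reads a tiny CSV from memory. The column names match the ones used in this tutorial, but the rows are made up purely for illustration, and a real export may contain extra columns.

```python
import io
import pandas as pd

# Hypothetical sample standing in for Connections.csv (all rows are made up)
sample_csv = io.StringIO(
    "First Name,Last Name,Company,Position,Connected On\n"
    "Ada,Obi,Acme Corp,Data Scientist,17 Apr 2020\n"
    "Ben,Lee,Acme Corp,Data Analyst,17 Apr 2020\n"
    "Cy,Ng,Globex,Data Scientist,10 Jun 2020\n"
)

sample = pd.read_csv(sample_csv)

# shape is (rows, columns); the row count is the number of connections
n_rows, n_cols = sample.shape
print(n_rows, n_cols)
```

The first number of shape is what tells you how many connections are in the file.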

Insights

Timeline: How is my Connection activity over time?

Let’s find out how my connection activity has been so far, that is, how I have been sending and receiving connections over time.

lkd=lkd.sort_values(by='Connected On')


import plotly.express as px


# Count the connections made on each date and plot them as a line chart
time=px.line(lkd.groupby(by='Connected On').count().reset_index(),
        x = 'Connected On',
        y = 'First Name',
        labels = {'First Name':'Number of Connections'},
        title = 'Connection Timeline')

time.show()

As we can see, there are spikes in my connection activity: in some periods I made a lot of connections (e.g. on 17-April-2020), while on some days my connections dropped (e.g. on 10-Jun-2020).
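One caveat: if the Connected On column is read in as plain text, sorting it is alphabetical rather than chronological. The sketch below shows one way to convert it to real dates and count connections per month. The "DD Mon YYYY" date format and the sample rows are assumptions for illustration, so check how the dates look in your own file first.

```python
import pandas as pd

# Made-up rows; the "DD Mon YYYY" text format is an assumption
dates = pd.DataFrame({
    "First Name": ["Di", "Cy", "Ben", "Ada"],
    "Connected On": ["05 Jan 2021", "10 Jun 2020", "02 Jun 2020", "17 Apr 2020"],
})

# As text, these would sort by their leading digits (02, 05, 10, 17),
# which is not chronological; converting to datetime fixes the order
dates["Connected On"] = pd.to_datetime(dates["Connected On"], format="%d %b %Y")
dates = dates.sort_values(by="Connected On")

# Count connections per calendar month
per_month = dates.groupby(dates["Connected On"].dt.to_period("M")).size()
print(per_month)
```

Grouping by month instead of by day tends to smooth out the spikes and makes longer trends easier to read.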

Now that we know our connection activities over time, we will then try to find out where these connections are working.

I hope one of my connections is working at Tesla, I need to drive one of the new Tesla models…lol

Companies: Where do my Connections work?

Now let’s find out where these connections that we have connected with on Linkedin are working.

We will analyse the Company column in order to identify the companies our connections are working at.

lkd['Company']

This does not give us clarity of what we want.

Let’s therefore use the groupby() function to group our data by the Company and use the count() function to count how many of our Connections work in the various companies.

group_company=lkd.groupby(by='Company').count().reset_index()
group_company

This gives us the count of how many connections are working at a particular company. By default, groupby() sorts the rows by the group key, so the companies appear in alphabetical order rather than by count. For example, I have 1 connection working at Vsolv Engineering India Pvt Ltd.

Let’s reorder the table so that the companies with the highest count come first, by using the sort_values() function with ascending=False. We will sort by the count in the ‘Connected On’ column.

You can also sort it by any other variable like First Name, Last Name or Position.

group_company=group_company.sort_values(by = 'Connected On',ascending=False).reset_index(drop=True)


group_company

This one looks better. We can see that most of my connections are working at Tata Consultancy Services followed by Amazon, Accenture, Cognizant and so on…
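As a shorter alternative to groupby() followed by sort_values(), the value_counts() method counts and sorts in one step. This is a minimal sketch on made-up data, not the code used for the output above.

```python
import pandas as pd

# Made-up connections for illustration
companies = pd.DataFrame({
    "First Name": ["Ada", "Ben", "Cy", "Di", "Eve"],
    "Company": ["Globex", "Acme Corp", "Acme Corp", "Initech", "Acme Corp"],
})

# value_counts() counts each company and sorts by count, largest first
company_counts = companies["Company"].value_counts()
print(company_counts.head(3))
```

The result is a Series indexed by company name, so call reset_index() on it if you need a regular table for plotting.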

Visualisation

Let’s use Plotly to visualise our data for better insights.

fig=px.bar(group_company[:100],
       x = 'Company',
       y = 'Connected On',
       labels = {'Connected On':'Number of Connections'},
       width = 1000,
       height = 900,
       title = 'Bar graph for companies that my Connections are working at.'
       )


fig

As we can see, Plotly has made it clearer which companies my connections are working at.

From the plot, we can see the top companies that our connections are working at and how many are working at each one. For instance, I can see that most of my connections are working at Tata Consultancy Services, and so on…

Let’s use a Tree Map in Plotly to get a better visualisation of our connections’ companies.

NB: If you get any error, try upgrading Plotly using the command below:

pip install --upgrade plotly

Plotting Tree Map

fig=px.treemap(group_company[:200], path = ['Company','Position'],
      
       values = 'Connected On',
       labels = {'Connected On':'Number of Connections'},
       width = 1000,
       height = 900,
       title = 'Tree Map for companies that my Connections are working at.'
       
       
       )


fig

The treemap gives us a better view. The size of each company box represents the number of connections working at that particular company.

When you plot the Tree Map, you can hover on the boxes to have a better view of the individual companies and the number of connections working there.
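Note that in group_company the Position column holds counts rather than job titles, because count() was applied to every column. If you would like the inner boxes of the treemap to show actual job titles, one option is to group by both columns first. This is a sketch on made-up data, and the column name Number of Connections is my own choice.

```python
import pandas as pd

# Made-up connections for illustration
people = pd.DataFrame({
    "Company": ["Acme Corp", "Acme Corp", "Acme Corp", "Globex"],
    "Position": ["Data Scientist", "Data Scientist", "Data Analyst", "Data Scientist"],
})

# One row per (Company, Position) pair with the number of connections in it
pairs = (
    people.groupby(["Company", "Position"])
    .size()
    .reset_index(name="Number of Connections")
)
print(pairs)

# A frame like this could then be passed to px.treemap with
# path=['Company', 'Position'] and values='Number of Connections'
```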

Positions: Which Positions do my connections hold?

Let’s now try to find out which specific positions our connections are occupying.

lkd['Position'].value_counts()

From the above, we can see that most of my connections are Data Scientists, Data Analysts, Software Engineers, Senior Data Scientists and Machine Learning Engineers.

The output is truncated after Machine Learning Engineer, so I can’t really see all of the positions. Let’s do something about it.

Below, I am going to count the connections in each position, find the percentage that each position represents, and add a condition to make the selection (e.g. I can find all the positions that are held by more than 0.2% of my connections).

lkd['Position'].value_counts()/len(lkd)*100 > 0.20

I am not interested in the True or False values, so let me combine the two pieces of code from above so that I get the job titles together with their actual counts.

lkd['Position'].value_counts()[lkd['Position'].value_counts()/len(lkd)*100 > 0.20]

Now it looks much better!!
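The same filter can be written step by step, which makes it easier to see what each part does. This is a minimal sketch on made-up data, and the 20 percent threshold is only a hypothetical cut-off chosen to suit the tiny sample.

```python
import pandas as pd

# Made-up job titles for illustration
positions = pd.Series(
    ["Data Scientist"] * 3 + ["Data Analyst"] * 2 + ["Intern"],
    name="Position",
)

counts = positions.value_counts()        # connections per job title
share = counts / len(positions) * 100    # the same figures as percentages
threshold = 20                           # hypothetical cut-off, in percent

# The comparison gives True/False per title; indexing with it keeps the True rows
common = counts[share > threshold]
print(common)
```

Here Intern is dropped because it accounts for about 16.7% of the sample, which is below the cut-off.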

Let’s visualise this with Plotly.

fig = px.bar(lkd.groupby(by='Position').count().sort_values(by='First Name', ascending=False)[:50].reset_index(),
       x='Position',
       y='Connected On',
       labels={'Connected On': 'Number Of Connections'},
        title= 'The various Positions occupied by my LinkedIn Connections'
      )


fig

We can see the various positions and the number of connections holding these positions.

What if we get into the cloud?

Let’s use a WordCloud to have a better view.

WordCloud Visualisation

from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib


def CreateWordCloud(text):
  wordcloud=WordCloud(width=1000, height=900,
                      background_color='black',
                      min_font_size = 10,
                      colormap = 'Set2').generate(text)


  fig=plt.figure(figsize=(15,10))
  plt.imshow(wordcloud,interpolation='bilinear')


  plt.show()
  return fig

We define a function called CreateWordCloud, which will take in a text and generate a wordcloud based on the text data fed to it.
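The function expects one long string, so the column has to be joined into a single piece of text first. The sketch below shows one way to do that. Using the Position column as the input is an assumption on my part, and the sample titles are made up.

```python
import pandas as pd

# Made-up job titles, including a missing value as found in real exports
titles = pd.Series(["Data Scientist", None, "Data Analyst", "Data Scientist"])

# Drop missing titles, then join everything into one space-separated string
text = " ".join(titles.dropna().astype(str))
print(text)

# The resulting string could then be passed on: CreateWordCloud(text)
```

Dropping the missing values first matters because join() only accepts strings.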

This looks quite a bit better, right? Yeah, I think so too.

Conclusion:

In this tutorial, we used the Connections data, analysed it and drew some insights from it.

You can download any other type of your LinkedIn data and perform a similar analysis.

If you enjoyed this tutorial, please give it a thumbs up. That’s enough appreciation to make my day.

Thanks in advance.
