Scraping Twitter Data with the Tweepy API
We will use a Python library called Tweepy (https://www.tweepy.org/) to access tweets in real time.
First, create a Twitter account if you don't already have one, then apply for a developer account. The application is a straightforward process that asks a few basic questions, such as the reason for applying. Approval of the developer account generally takes 2-3 days.
Step 1: Go to https://developer.twitter.com/
Step 2: Click “Apply”
Step 3: Click “Apply for a developer account”
Step 4: Login with your Twitter Account
Step 5: Fill out the form
Note: Don't forget to check your inbox, as you will receive an approval email.
Once approved, create an app and access its credentials:
Congrats!! You have now created a Twitter app and hold its credentials.
Let's move to the next step now...
In your Python notebook, install the Tweepy library:
- pip install tweepy
Use the following imports to get started with scraping Twitter data:
import tweepy
import pandas as pd
import numpy as np
You will now need all four authentication keys to access the Twitter API. Connect as follows:
#OAuth handler, acting as a jump server / reverse proxy server
- consumer_key = "70**************wH"
- consumer_key_secret = "Mk**********************************************hr"
#Access tokens used to connect through the proxy server
- access_token = "13******************************5l"
- access_token_secret = "xR*************************************Ke"
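Tip: rather than pasting keys directly into the notebook, you can load them from environment variables. A minimal sketch, assuming you have already exported the keys under the (hypothetical) variable names below:
import os
# Load credentials from the environment instead of hard-coding them
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_key_secret = os.environ['TWITTER_CONSUMER_KEY_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']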
The next step is to connect to Twitter's jump server, which can be done with the following command:
- auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
With the above command the jump server is connected; the next step is to link Twitter's jump server to its web server using the access tokens, as follows:
- auth.set_access_token(access_token, access_token_secret)
Kudos!! You are doing great. Now we can easily connect to the Twitter API server:
- api = tweepy.API(auth)
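Before scraping, it is worth confirming that authentication actually worked. A minimal check using Tweepy's verify_credentials, which returns the authenticated user when the keys are valid:
# Confirm the connection by fetching the authenticated user's profile
me = api.verify_credentials()
print('Authenticated as:', me.screen_name)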
In the next step, we will take the example of scraping tweets about cryptocurrency (Bitcoin), with the language set to English and a total tweet count of 200. Note: you can change these inputs as per your preference.
- public_tweets = api.search('bitcoin', count=200, lang='en', tweet_mode='extended')
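Note: the standard search endpoint returns at most 100 tweets per request, so a single call with count=200 may be capped. A sketch of an alternative using tweepy.Cursor to paginate up to 200 results (be aware that Tweepy 4.x renamed api.search to api.search_tweets):
# Paginate through search results until 200 tweets are collected
public_tweets = [
    tweet for tweet in tweepy.Cursor(
        api.search, q='bitcoin', lang='en', tweet_mode='extended'
    ).items(200)
]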
#Now create a pandas DataFrame from the tweet text as follows:
- df = pd.DataFrame(data=[tweet.full_text for tweet in public_tweets], columns=['original_Tweets'])
- df.head(10)
Other relevant information related to the scraped tweets, such as:
Length of each tweet
ID of each individual tweet
Date when a particular tweet was posted
Source of the tweet
Number of likes on a particular tweet
Number of retweets an individual tweet received
can be extracted with the following commands:
- df['len'] = np.array([len(tweet.full_text) for tweet in public_tweets])
- df['ID'] = np.array([tweet.id for tweet in public_tweets])
- df['Date'] = np.array([tweet.created_at for tweet in public_tweets])
- df['Source'] = np.array([tweet.source for tweet in public_tweets])
- df['Likes'] = np.array([tweet.favorite_count for tweet in public_tweets])
- df['RTs'] = np.array([tweet.retweet_count for tweet in public_tweets])
- display(df.head(10))
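With these columns in place, a quick sanity check is to pull out the most-liked tweet in the sample, for example:
# Locate the row with the highest like count in our sample
most_liked = df.loc[df['Likes'].idxmax()]
print(most_liked['original_Tweets'], '-', most_liked['Likes'], 'likes')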
Congratulations!! We have now scraped 200 real-time "bitcoin"-related tweets. Now it's time to store the DataFrame in a CSV file for further use.
- df.to_csv('bitcointweetsscraping.csv')
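Whenever you want to pick the analysis back up, the saved file can be reloaded into a DataFrame (index_col=0 restores the saved index):
# Reload the scraped tweets for further analysis
df = pd.read_csv('bitcointweetsscraping.csv', index_col=0)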
With this, we have come to the end of web scraping Twitter data with Python. I hope you have enjoyed learning this skill and will build on it further. #happyupskilling