Scraping Twitter Data with the Tweepy API
We will use a Python library called Tweepy (https://www.tweepy.org/) to access tweets in real time.
First, create a Twitter account if you don't already have one, then apply for a developer account. The application is a straightforward process that asks a few basic questions, such as the reason for applying. Approval of the developer account generally takes 2-3 days.
Step 1: Go to https://developer.twitter.com/
Step 2: Click “Apply”
Step 3: Click “Apply for a developer account”
Step 4: Login with your Twitter Account
Step 5: Fill out the form
Note: Don't forget to check your inbox, as you will receive an approval email.
Once approved, create an app and access its credentials:
Congrats!! You have now created a Twitter app and hold its credentials.
Let's move to the next step now...
In your Python notebook, install the Tweepy library:
- pip install tweepy
Use the following imports to get started with scraping Twitter data:
import tweepy
import pandas as pd
import numpy as np
You will now need all four authentication keys to access the Twitter API. Connect as follows:
#OAuth handler, acting as a jump server / reverse proxy server
- consumer_key = "70**************wH"
- consumer_key_secret = "Mk**********************************************hr"
#Access tokens used to connect through the proxy server
- access_token = "13******************************5l"
- access_token_secret = "xR*************************************Ke"
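Tip: rather than pasting keys directly into the notebook, you can load them from environment variables. A minimal sketch, assuming you have already exported the keys under the (hypothetical) variable names below:
import os
# Load credentials from the environment instead of hard-coding them
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_key_secret = os.environ['TWITTER_CONSUMER_KEY_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']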
The next step is to connect to Twitter's jump server, which can be done with the following command:
- auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
With the above command the jump server is connected; the next step is to link Twitter's jump server to its web server using the access tokens, as follows:
- auth.set_access_token(access_token, access_token_secret)
Kudos!! You are doing great. Now we can easily connect to the Twitter API server:
- api = tweepy.API(auth)
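Before scraping, it is worth confirming that authentication actually worked. A minimal check using Tweepy's verify_credentials, which returns the authenticated user when the keys are valid:
# Confirm the connection by fetching the authenticated user's profile
me = api.verify_credentials()
print('Authenticated as:', me.screen_name)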
In the next step, we will take the example of scraping tweets about cryptocurrency (Bitcoin), with the language set to English and a total tweet count of 200. Note: you can change these inputs as per your preference.
- public_tweets = api.search('bitcoin', count=200, lang='en', tweet_mode='extended')
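Note: the standard search endpoint returns at most 100 tweets per request, so a single call with count=200 may be capped. A sketch of an alternative using tweepy.Cursor to paginate up to 200 results (be aware that Tweepy 4.x renamed api.search to api.search_tweets):
# Paginate through search results until 200 tweets are collected
public_tweets = [
    tweet for tweet in tweepy.Cursor(
        api.search, q='bitcoin', lang='en', tweet_mode='extended'
    ).items(200)
]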
#Now create a pandas DataFrame from the tweet text as follows:
- df = pd.DataFrame(data=[tweet.full_text for tweet in public_tweets], columns=['original_Tweets'])
- df.head(10)
Other relevant information related to the scraped tweets, such as:
Length of each tweet
ID of each individual tweet
Date when a particular tweet was posted
Source of the tweet
Number of likes on a particular tweet
Number of retweets an individual tweet received
can be extracted with the following commands:
- df['len'] = np.array([len(tweet.full_text) for tweet in public_tweets])
- df['ID'] = np.array([tweet.id for tweet in public_tweets])
- df['Date'] = np.array([tweet.created_at for tweet in public_tweets])
- df['Source'] = np.array([tweet.source for tweet in public_tweets])
- df['Likes'] = np.array([tweet.favorite_count for tweet in public_tweets])
- df['RTs'] = np.array([tweet.retweet_count for tweet in public_tweets])
- display(df.head(10))
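With these columns in place, a quick sanity check is to pull out the most-liked tweet in the sample, for example:
# Locate the row with the highest like count in our sample
most_liked = df.loc[df['Likes'].idxmax()]
print(most_liked['original_Tweets'], '-', most_liked['Likes'], 'likes')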
Congratulations!! We have now scraped 200 real-time "bitcoin"-related tweets. Now it's time to store the DataFrame in a CSV file for further use.
- df.to_csv('bitcointweetsscraping.csv')
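Whenever you want to pick the analysis back up, the saved file can be reloaded into a DataFrame (index_col=0 restores the saved index):
# Reload the scraped tweets for further analysis
df = pd.read_csv('bitcointweetsscraping.csv', index_col=0)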
With this, we have come to the end of web scraping Twitter data with Python. I hope you have enjoyed learning this skill and will build on it further. #happyupskilling