Scraping Twitter Data with the Tweepy API

We will use the Python library Tweepy (https://www.tweepy.org/) to access tweets in real time.

First, create a Twitter account if you don't already have one, then apply for a developer account. The application is straightforward and asks a few basic questions, such as your reason for applying. Approval of the developer account generally takes 2-3 days.

Step 1: Click here ( https://developer.twitter.com/ )

Step 2: Click “Apply”

Step 3: Click “Apply for a developer account”


Step 4: Login with your Twitter Account

Step 5: Fill in the form

Note: Don't forget to check your inbox, as you will receive an approval email.


After creating the app, access the credentials:


Congrats! You have now created a Twitter app and have access to its credentials.


Let's move on to the next step.

In your Python notebook, install the Tweepy library: pip install tweepy

Use the following imports to get started with scraping Twitter data:

import tweepy
import pandas as pd
import numpy as np

You will need all four authentication credentials to access the Twitter API. The consumer key pair identifies your app (the OAuth handler, which acts like a jump server / reverse proxy), and the access token pair authorizes your account:

#OAuth handler (jump server / reverse proxy) credentials

  • consumer_key = "70**************wH"
  • consumer_key_secret = "Mk**********************************************hr"

#Account credentials used through the proxy

  • access_token = "13******************************5l"
  • access_token_secret = "xR*************************************Ke"
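Rather than hard-coding these secrets in your notebook, a common practice is to keep them in environment variables and read them at runtime (the variable names below are just an illustration, not anything Twitter prescribes):

```shell
# Keep credentials outside the source code (names are illustrative)
export TWITTER_CONSUMER_KEY="70**************wH"
export TWITTER_CONSUMER_KEY_SECRET="Mk**********************************************hr"
export TWITTER_ACCESS_TOKEN="13******************************5l"
export TWITTER_ACCESS_TOKEN_SECRET="xR*************************************Ke"
```

In Python you would then read them with, e.g., os.environ["TWITTER_CONSUMER_KEY"], which keeps the keys out of any notebook you share.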

The next step is to connect to Twitter's jump server by creating an OAuth handler:

  • auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)

With the jump server connected, the next step is to link it to Twitter's web server by setting the access token:

  • auth.set_access_token(access_token, access_token_secret)

Kudos! With authentication in place, you can now connect to the Twitter API server:

  • api = tweepy.API(auth)

Next, as an example, we will scrape English-language tweets about the cryptocurrency Bitcoin, requesting a count of 200 tweets. Note: you can change these inputs to suit your preference. Also note that a single standard-search request is capped at 100 results, so tweepy.Cursor is the usual way to page through larger counts, and in Tweepy v4 and later this method is named api.search_tweets.

  • public_tweets = api.search('bitcoin', count=200, lang='en', tweet_mode='extended')

#Now create a pandas DataFrame from the full text of each tweet:

  • df = pd.DataFrame(data=[tweet.full_text for tweet in public_tweets], columns=['original_Tweets'])
  • df.head(10)

Other relevant information about the scraped tweets, such as:

Length of each tweet
ID of each tweet
Date when the tweet was posted
Source of the tweet
Number of likes the tweet received
Number of retweets the tweet received

can be extracted with the following commands:

  • df['len'] = np.array([len(tweet.full_text) for tweet in public_tweets])
  • df['ID']  = np.array([tweet.id for tweet in public_tweets])
  • df['Date'] = np.array([tweet.created_at for tweet in public_tweets])
  • df['Source'] = np.array([tweet.source for tweet in public_tweets])
  • df['Likes'] = np.array([tweet.favorite_count for tweet in public_tweets])
  • df['RTs']  = np.array([tweet.retweet_count for tweet in public_tweets])
  • display(df.head(10))
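The comprehension pattern above does not depend on the live API, so you can see how it works offline. Here is a small sketch using mock tweet objects that expose the same attributes as Tweepy's Status objects (full_text, id, created_at, source, favorite_count, retweet_count); the sample tweets and values are made up:

```python
from collections import namedtuple
from datetime import datetime

import numpy as np
import pandas as pd

# Mock objects with the same attributes Tweepy's Status objects expose
Tweet = namedtuple(
    "Tweet", "full_text id created_at source favorite_count retweet_count"
)

public_tweets = [
    Tweet("bitcoin hits a new high", 101, datetime(2021, 1, 1),
          "Twitter Web App", 12, 3),
    Tweet("thinking about bitcoin", 102, datetime(2021, 1, 2),
          "Twitter for iPhone", 5, 1),
]

# Exactly the same extraction pattern as with real scraped tweets
df = pd.DataFrame(data=[t.full_text for t in public_tweets],
                  columns=["original_Tweets"])
df["len"] = np.array([len(t.full_text) for t in public_tweets])
df["ID"] = np.array([t.id for t in public_tweets])
df["Date"] = np.array([t.created_at for t in public_tweets])
df["Source"] = np.array([t.source for t in public_tweets])
df["Likes"] = np.array([t.favorite_count for t in public_tweets])
df["RTs"] = np.array([t.retweet_count for t in public_tweets])

print(df[["original_Tweets", "len", "Likes", "RTs"]])
```

Because each column is built from the same list of tweet objects, the rows stay aligned; with the real API you simply swap the mock list for the result of your search call.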

Congratulations! We have now scraped up to 200 real-time Bitcoin-related tweets. Now it's time to store the DataFrame in a CSV file for later use.

  • df.to_csv('bitcointweetsscraping.csv')
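As a quick sanity check, you can read the file back with pandas. Passing index=False (a small tweak on the line above) avoids writing the row index as an extra unnamed column. The tiny DataFrame below stands in for the real one, which would hold the scraped tweets:

```python
import pandas as pd

# A tiny stand-in DataFrame (the real one holds the scraped tweets)
df = pd.DataFrame({"original_Tweets": ["hello bitcoin", "bye bitcoin"],
                   "Likes": [3, 7]})

# Write without the index column, then read it back to verify the round trip
df.to_csv("bitcointweetsscraping.csv", index=False)
df2 = pd.read_csv("bitcointweetsscraping.csv")
print(df2.equals(df))
```

If you omit index=False, the read-back DataFrame gains an extra "Unnamed: 0" column, which is a common source of confusion when reloading saved scrapes.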

With this, we have come to the end of web scraping Twitter data with Python. I hope you enjoyed learning this skill and will build on it further. #happyupskilling
