Crypto Tweets Fetch using Flume & Hadoop (PRACTICAL)

Simran: Hey! I am new to investing in cryptocurrency.

Me: Nice! At least you have started investing. That's good!

Simran: But I feel the crypto market depends heavily on news. Since we both come from technical backgrounds, can we do something with that?

Me: Yeah, sure, I think we can do something. I have heard of Apache Flume, an awesome tool for collecting and moving large volumes of streaming data such as logs. We can collect tweets about Elon Musk, analyze them, and get something out of it.

Simran: That sounds interesting. Can you tell me briefly how I can analyze them too?

Me: Sure! So let's start!

Me: So, basically, we will stream data from Twitter. To get tweets, we need to set up a Twitter application, pick keywords related to the cryptocurrency Doge, and then run Hadoop and Flume.

Simran: As far as I remember from your last Medium article on Hadoop (WordCounter in Hadoop! (Windows PRACTICAL) | by Shubham Kumar Gupta | Jan, 2022 | Medium), Hadoop is for handling big data. But what is this Flume now?

Me:

Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.

Flume can stream live data from different sources, including social platforms such as Facebook and Twitter. The streamed data can then be passed to Hive and Hadoop for further analysis.

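A Flume agent has three parts: a source, a channel, and a sink. The source accepts incoming data and stores it in the channel. Data is generally read faster than it can be written, so the channel acts as a buffer that evens out the read and write pace. The sink then takes the data from the channel and stores it in HDFS.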
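To make the source, channel, and sink idea concrete, here is a minimal sketch in Python. It is not Flume code: the function name `run_pipeline` and the `capacity` parameter are made up for illustration, a bounded queue stands in for the channel, and a plain list stands in for HDFS. Note that this sketch simply blocks when the buffer is full, which may differ from how a real Flume channel behaves.

```python
import queue
import threading


def run_pipeline(events, capacity=3):
    """Move events from a 'source' to a 'sink' through a bounded 'channel'."""
    channel = queue.Queue(maxsize=capacity)  # plays the role of the channel
    stored = []                              # stands in for files in HDFS
    done = object()                          # sentinel marking end of stream

    def source():
        for event in events:
            channel.put(event)   # blocks while the channel is full
        channel.put(done)

    def sink():
        while True:
            event = channel.get()  # blocks while the channel is empty
            if event is done:
                break
            stored.append(event)

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stored


if __name__ == "__main__":
    print(run_pipeline([f"tweet-{i}" for i in range(10)]))
```

Even with a tiny buffer, every event reaches the sink in order. The buffer only decides how far the source can run ahead of the sink.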

Simran: Can you show me how to do this right away, practically?

Me: Sure! Let's start! First, let's create a Twitter application.

Twitter Application

i) First, visit https://apps.twitter.com/

ii) Give the application a name and click Get Keys

iii) You will now get API_KEY, API_KEY_SECRET, and BEARER_ACCESS_TOKEN

iv) You need two more values, so click Set up OAuth. You can choose v2 and provide your description, T&C URL, and privacy policy URL. Then click Generate to get ACCESS_TOKEN and ACCESS_TOKEN_SECRET.

v) The number of tweets you fetch may exceed the default limits set by Twitter, so apply for Elevated access on your Twitter developer account

Me: Cool! Now, let's see how to set up Apache Flume.

Flume Setup

i) Download Apache Flume: [DOWNLOAD LINK]

ii) Extract the tar file

tar -xvf flume.tar.gz

or use WinRAR

iii) Inside the conf folder, rename flume-conf.properties.template to flume-conf.properties

iv) Set the environment variable FLUME_HOME to D:\apache\flume and append D:\apache\flume\bin to the PATH

v) Write this inside the flume-conf.properties file


	# Name the source, channel, and sink of the agent called TwitterAgent
	TwitterAgent.sources = Twitter
	TwitterAgent.channels = MemChannel
	TwitterAgent.sinks = HDFS

	# Source: the Twitter source class, connected to the memory channel
	TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
	TwitterAgent.sources.Twitter.channels = MemChannel

	# Credentials from the Twitter application (replace the placeholders)
	TwitterAgent.sources.Twitter.consumerKey = API_KEY
	TwitterAgent.sources.Twitter.consumerSecret = API_KEY_SECRET
	TwitterAgent.sources.Twitter.accessToken = ACCESS_TOKEN
	TwitterAgent.sources.Twitter.accessTokenSecret = ACCESS_TOKEN_SECRET

	TwitterAgent.sources.Twitter.keywords = elon musk, doge, doge coin, bitcoin, crypto, forex, tesla, coin, rocket, ether, mining

	# Sink: write events as plain text under /flume_tweets in HDFS
	TwitterAgent.sinks.HDFS.channel = MemChannel
	TwitterAgent.sinks.HDFS.type = hdfs
	TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/flume_tweets
	TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
	TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
	TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
	TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
	TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
	TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

	# Channel: in-memory buffer between the source and the sink
	TwitterAgent.channels.MemChannel.type = memory
	TwitterAgent.channels.MemChannel.capacity = 10000
	TwitterAgent.channels.MemChannel.transactionCapacity = 1000
        

vi) Here, you can see that we declared a source named Twitter and a channel named MemChannel, specified the class that implements the source, and put all the tokens in.

vii) Next, we named our sink, gave it the HDFS path to write to, and set the output to be a plain text data stream.

viii) We set the batch size (the number of events written to a file before it is flushed to HDFS), the capacity (the maximum number of events stored in the channel), and the transaction capacity (the maximum number of events the channel accepts from the source or hands to the sink in one transaction).
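A small typo in a component name is easy to miss in a file like this, so it can help to sanity-check the config before starting the agent. The following is only a sketch under a few assumptions: `parse_properties` and `check_agent` are hypothetical helper names (not part of Flume), the parser handles only simple `key = value` lines, and the fallback value of 100 for a missing capacity setting is assumed rather than taken from this article.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of readable problems found in one agent's config."""
    problems = []
    for kind in ("sources", "channels", "sinks"):
        # Names listed on the 'agent.sources = ...' style line
        declared = set(props.get(f"{agent}.{kind}", "").split())
        # Names that actually appear in 'agent.sources.<name>.<setting>' keys
        prefix = f"{agent}.{kind}."
        used = {key[len(prefix):].split(".")[0]
                for key in props if key.startswith(prefix)}
        for name in sorted(used - declared):
            problems.append(f"{kind[:-1]} '{name}' is configured but not declared")
        for name in sorted(declared - used):
            problems.append(f"{kind[:-1]} '{name}' is declared but never configured")
    for name in sorted(set(props.get(f"{agent}.channels", "").split())):
        base = f"{agent}.channels.{name}."
        capacity = int(props.get(base + "capacity", 100))  # assumed default
        tx_capacity = int(props.get(base + "transactionCapacity", 100))  # assumed default
        if tx_capacity > capacity:
            problems.append(
                f"channel '{name}': transactionCapacity {tx_capacity} "
                f"exceeds capacity {capacity}"
            )
    return problems


if __name__ == "__main__":
    # Sample with a deliberate typo in the declared source name
    sample = """
    TwitterAgent.sources = Twitt
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
    """
    for problem in check_agent(parse_properties(sample), "TwitterAgent"):
        print(problem)
```

To check your own file, read flume-conf.properties into a string and pass it through the same two functions.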

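Now, to fetch tweets related to the cryptocurrency Doge, we use keywords like

TwitterAgent.sources.Twitter.keywords= elon musk, doge, doge coin, bitcoin, crypto, forex, tesla, coin, rocket, ether, mining

Whether this property is honored depends on the source class. Flume's documentation describes its bundled Twitter source as reading Twitter's 1% sample stream, so check that the tweets you receive are actually filtered by these keywords.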
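To see what keyword filtering amounts to, or to filter collected tweets yourself afterwards, here is a rough sketch in Python. The helper names `parse_keywords` and `build_matcher` are made up for this example, and the whole-word, case-insensitive matching is my own simplification. It is not a description of how Twitter or Flume match keywords.

```python
import re


def parse_keywords(value):
    """Split a comma-separated keywords value into a list of phrases."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_matcher(keywords):
    """Return a function that tells whether a text mentions any keyword."""
    if not keywords:
        return lambda text: False
    # Whole-word, case-insensitive match on any of the phrases
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.search(text) is not None


if __name__ == "__main__":
    keywords = parse_keywords("elon musk, doge, doge coin, bitcoin, crypto")
    mentions_crypto = build_matcher(keywords)
    for tweet in ["DOGE to the moon!", "Nice weather today"]:
        print(tweet, "->", mentions_crypto(tweet))
```

Short keywords such as coin or rocket will match plenty of unrelated tweets, so it may be worth trimming the list once you have looked at real data.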

Simran: Blah blah! When will we get results?

Me: Don't worry, we just have to run it now.

Steps to run

i) Start Hadoop by running

start-all.cmd        

ii) From the terminal, run this command (I'm in D:\apache\flume)

 bin\flume-ng agent --conf conf --conf-file conf/flume-conf.properties -property "flume.root.logger=INFO,console" -n TwitterAgent        

iii) Now we can list the path we set in flume-conf.properties, i.e. flume_tweets, using the command

hdfs dfs -ls /flume_tweets

to see which files are in this directory

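iv) Now we can read the files using the cat command. By default the sink names its files FlumeData followed by a timestamp, so use a name from the listing or a wildcard

hdfs dfs -cat /flume_tweets/FlumeData.*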
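How many files appear under /flume_tweets depends on the roll settings in flume-conf.properties (rollInterval = 600, rollSize = 0, rollCount = 10000). As a rough sketch of that logic in Python, assuming a value of 0 switches a trigger off: the function `should_roll` is hypothetical and only approximates what the sink does, including the exact boundary conditions.

```python
def should_roll(age_seconds, size_bytes, event_count,
                roll_interval=600, roll_size=0, roll_count=10000):
    """Return True when the current file should be closed and a new one started.

    The defaults mirror the values used in this article's config.
    A limit of 0 disables that trigger.
    """
    if roll_interval > 0 and age_seconds >= roll_interval:
        return True   # file has been open long enough
    if roll_size > 0 and size_bytes >= roll_size:
        return True   # file has grown large enough
    if roll_count > 0 and event_count >= roll_count:
        return True   # file holds enough events
    return False


if __name__ == "__main__":
    # Ten minutes old, few events: rolled because of the time limit
    print(should_roll(age_seconds=600, size_bytes=2048, event_count=40))
    # One minute old, huge file: not rolled, since the size limit is off
    print(should_roll(age_seconds=60, size_bytes=10**9, event_count=40))
```

With these settings a new file starts every 10 minutes or every 10,000 events, whichever comes first, so a quiet keyword list produces a few small files and a busy one produces many.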


Me: Tada! We got our tweets! Now, let's move on to analyzing them properly!

Now that we have tweets related to crypto, we can analyze them using Google NLP to get further information! So it's time for another Medium blog. Till then,

Thank you for reading this!

Simran: Thank you for this! I will wait for your next blog.
