Create Spark Streaming Receiver

In this example, I collect streaming data from a source that Spark does not support out of the box: Wikimedia EventStreams, which is based on the Server-Sent Events (SSE) protocol. For more details, see https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams
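SSE is a plain-text protocol. The server keeps an HTTP response open and sends each event as a group of `field: value` lines terminated by a blank line. Lines starting with `:` are comments. The sketch below is not the code from this project; it only illustrates that framing. The hypothetical `SseParser` class collects `data:` lines and emits one payload per blank line. It ignores the `event`, `id` and `retry` fields and other details of the full specification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SseParser {

    /**
     * Reads a Server-Sent Events stream and returns the data payload of each complete event.
     * Simplified: only "data:" lines and blank-line dispatch are handled; comment lines
     * (starting with ':') and other fields (event, id, retry) are skipped.
     */
    public static List<String> readEvents(BufferedReader reader) throws IOException {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                // A blank line ends the current event: dispatch it if it carried data.
                if (data.length() > 0) {
                    events.add(data.toString());
                    data.setLength(0);
                }
            } else if (line.startsWith("data:")) {
                // The value follows the colon; a single leading space is not part of it.
                String value = line.substring("data:".length());
                if (value.startsWith(" ")) {
                    value = value.substring(1);
                }
                // Several data lines in one event are joined with '\n'.
                if (data.length() > 0) {
                    data.append('\n');
                }
                data.append(value);
            }
        }
        // An event not terminated by a blank line before the end of input is dropped.
        return events;
    }

    public static void main(String[] args) throws IOException {
        String sample = ": keep-alive\n\n"
                + "event: message\nid: 1\ndata: {\"wiki\":\"enwiki\"}\n\n";
        for (String payload : readEvents(new BufferedReader(new StringReader(sample)))) {
            System.out.println(payload); // prints {"wiki":"enwiki"}
        }
    }
}
```

`BufferedReader.readLine()` accepts `\n`, `\r\n` and `\r` as line terminators, which are the same three line endings SSE allows.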

According to the documentation, we can implement this with a class that extends Receiver: https://spark.apache.org/docs/2.4.2/api/java/org/apache/spark/streaming/receiver/Receiver.html

A basic receiver for our project looks like this:

[Image: code of the basic receiver]

In our implementation, we need to provide:

  • A constructor that defines the storage level
  • onStart, which contains all the logic required to fetch and publish the data. Publishing is done with the store method. Since onStart must not block, the receiving loop should run in its own thread.
  • onStop, for anything that must be cleaned up or stopped (JDBC connections, network connections, ...).

Our receiver is then used as follows:

[Image: code using the receiver]
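The following minimal sketch shows the three parts listed above and how the receiver can be registered with a `JavaStreamingContext`. It follows the pattern of the custom receiver guide in the Spark documentation. It is not the author's code, which is in the repository linked at the end of this article. The class names `WikiStreamingApp` and `WikiEventReceiver`, the `recentchange` stream URL, the `MEMORY_AND_DISK_2` storage level and the User-Agent value are all assumptions. The sketch also assumes that each event's JSON payload arrives on a single `data:` line.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.receiver.Receiver;

public class WikiStreamingApp {

    /** Sketch of a custom receiver reading an SSE stream over HTTP. */
    public static class WikiEventReceiver extends Receiver<String> {
        private final String streamUrl;

        public WikiEventReceiver(String streamUrl) {
            // The constructor sets the storage level (assumed: replicated, spilling to disk).
            super(StorageLevel.MEMORY_AND_DISK_2());
            this.streamUrl = streamUrl;
        }

        @Override
        public void onStart() {
            // onStart must not block, so the reading loop runs in its own thread.
            new Thread(this::receive, "wiki-sse-receiver").start();
        }

        @Override
        public void onStop() {
            // Nothing to do here: receive() checks isStopped() and closes its own connection.
        }

        private void receive() {
            HttpURLConnection conn = null;
            try {
                conn = (HttpURLConnection) new URL(streamUrl).openConnection();
                conn.setRequestProperty("Accept", "text/event-stream");
                // Assumed value; Wikimedia's User-Agent policy asks clients to identify themselves.
                conn.setRequestProperty("User-Agent", "spark-wiki-receiver-example");
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while (!isStopped() && (line = reader.readLine()) != null) {
                        // Assumption: one JSON payload per "data:" line.
                        if (line.startsWith("data:")) {
                            store(line.substring("data:".length()).trim());
                        }
                    }
                }
                if (!isStopped()) {
                    // The server closed the stream: let Spark restart the receiver to reconnect.
                    restart("Stream closed, reconnecting");
                }
            } catch (Exception e) {
                restart("Error reading " + streamUrl, e);
            } finally {
                if (conn != null) {
                    conn.disconnect();
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // local[2]: one core for the receiver, at least one for processing the batches.
        SparkConf conf = new SparkConf().setAppName("WikiEventStream").setMaster("local[2]");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

        JavaReceiverInputDStream<String> events = ssc.receiverStream(
                new WikiEventReceiver("https://stream.wikimedia.org/v2/stream/recentchange"));
        events.count().print(); // number of events received in each 1-second batch

        ssc.start();
        ssc.awaitTermination();
    }
}
```

When running locally, the master needs more cores than there are receivers. With `local[1]`, the receiver would occupy the only core and no batch would ever be processed.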

Here is an example that processes each RDD to compute statistics on the wikis that have been updated:

[Image: code processing the RDDs]
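The per-wiki statistics can also be prototyped outside Spark as a plain function over the JSON payloads of one batch. In this sketch, the class `WikiStats` is hypothetical. The top-level `wiki` field (for example `enwiki`) is read with a regular expression instead of a JSON parser. This simplification assumes that the field appears once per event.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WikiStats {

    // Simplification: matches "wiki": "<name>" with a regex instead of parsing the JSON.
    private static final Pattern WIKI_FIELD =
            Pattern.compile("\"wiki\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the value of the "wiki" field, or "unknown" when it is absent. */
    public static String wikiOf(String json) {
        Matcher m = WIKI_FIELD.matcher(json);
        return m.find() ? m.group(1) : "unknown";
    }

    /** Counts events per wiki for one batch, most updated wikis first (ties sorted by name). */
    public static Map<String, Long> countPerWiki(List<String> batch) {
        Map<String, Long> counts = batch.stream()
                .collect(Collectors.groupingBy(WikiStats::wikiOf, Collectors.counting()));
        Map<String, Long> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .forEachOrdered(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "{\"wiki\":\"enwiki\",\"type\":\"edit\"}",
                "{\"type\":\"edit\",\"wiki\":\"frwiki\"}",
                "{\"wiki\":\"enwiki\",\"type\":\"new\"}");
        System.out.println(countPerWiki(batch)); // prints {enwiki=2, frwiki=1}
    }
}
```

On a `JavaDStream<String>` of payloads, the same extraction could be applied to each batch with `map(WikiStats::wikiOf).countByValue()`. That call yields a `JavaPairDStream<String, Long>` of counts per wiki.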

For a 1-second window, the result looks something like this:

[Image: sample output for a 1-second window]

The full source code is available at https://github.com/nihed/SparkStreamingWiki/blob/master/src/main/java/com/nihed/App.java


Asma Dhaouadi, PHD

Post-doctoral researcher at the Litis laboratory - INSA de Rouen

4 years ago

Thank you, I'm interested. I'll look at it in detail.

Nihed MBAREK

Senior Solutions Architect specializing in big data implementation and profitability

4 years ago

Yamen Bousrih, this one concerns you :D


More articles by Nihed MBAREK
