Create Spark Streaming Receiver
Nihed MBAREK
Senior Solutions Architect specializing in big data implementation and profitability
This example shows how to collect streaming data from a source that Spark does not support out of the box. It uses Wikimedia EventStreams, which is based on the Server-Sent Events (SSE) protocol. For more details, see https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams
According to the documentation, this can be done with a class that extends Receiver: https://spark.apache.org/docs/2.4.2/api/java/org/apache/spark/streaming/receiver/Receiver.html
For our project, a basic receiver needs to implement:
- A constructor that defines the storage level.
- onStart, which holds all the logic required to extract and publish the information. Data is published with the "store" method.
- onStop, in case something needs to be cleaned up or stopped (JDBC connection, network connection, ...).
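The extraction logic that onStart runs can be sketched in plain Java, without Spark on the classpath. This is a minimal sketch, not the code from the repository: the class name SseReader, the method readEvents and the sample input are hypothetical, and the Consumer named sink stands in for the receiver's "store" method. It handles only the "data:" field of the SSE format and ignores the other fields and comment lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SseReader {

    /**
     * Reads a Server-Sent Events stream and passes the payload of each
     * event to sink. Inside a Spark Receiver, sink would be this::store
     * (assumption: the receiver is declared as Receiver<String>).
     */
    public static void readEvents(BufferedReader in, Consumer<String> sink) throws IOException {
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                // A blank line terminates the current event.
                if (hasData) {
                    sink.accept(data.toString());
                }
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                // SSE strips a single leading space after the colon.
                if (value.startsWith(" ")) {
                    value = value.substring(1);
                }
                // Several data lines in one event are joined with newlines.
                if (hasData) {
                    data.append('\n');
                }
                data.append(value);
                hasData = true;
            }
            // Other fields (event:, id:, retry:) and comment lines are ignored here.
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample shaped like two SSE events.
        String sample = "event: message\nid: 1\ndata: {\"wiki\":\"enwiki\"}\n\n"
                + "data: {\"wiki\":\"frwiki\"}\n\n";
        List<String> received = new ArrayList<>();
        readEvents(new BufferedReader(new StringReader(sample)), received::add);
        received.forEach(System.out::println);
    }
}
```

In a real receiver, onStart must return quickly, so a loop like this would typically run in a thread started by onStart, with the reader wrapping the HTTP connection to the stream, and onStop would close that connection.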
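To use our receiver, we pass an instance of it to the streaming context's receiverStream method, which returns a stream of the items published by the receiver.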
The example then processes each RDD to compute statistics on the wikis that have been updated, using a 1-second window.
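The per-batch statistic can also be sketched in plain Java. This is only an illustration under stated assumptions: the class WikiStats and its methods are hypothetical, the events are assumed to carry a "wiki" field as in the recentchange stream, and a regular expression replaces a real JSON parser to keep the sketch free of dependencies. In Spark, the same grouping would typically be expressed as a mapToPair followed by reduceByKey on each batch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiStats {

    // Simplification: a regex instead of a JSON parser (assumption for this sketch).
    private static final Pattern WIKI = Pattern.compile("\"wiki\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the value of the "wiki" field, or "unknown" if it is absent. */
    public static String extractWiki(String json) {
        Matcher m = WIKI.matcher(json);
        return m.find() ? m.group(1) : "unknown";
    }

    /** Counts the events of one batch per wiki, sorted by wiki name. */
    public static Map<String, Long> countByWiki(List<String> batch) {
        Map<String, Long> counts = new TreeMap<>();
        for (String event : batch) {
            counts.merge(extractWiki(event), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical batch, standing in for the events received in one window.
        List<String> batch = Arrays.asList(
                "{\"wiki\":\"enwiki\",\"type\":\"edit\"}",
                "{\"wiki\":\"frwiki\",\"type\":\"edit\"}",
                "{\"wiki\":\"enwiki\",\"type\":\"new\"}");
        countByWiki(batch).forEach((wiki, n) -> System.out.println(wiki + " " + n));
    }
}
```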
All source code is available at https://github.com/nihed/SparkStreamingWiki/blob/master/src/main/java/com/nihed/App.java