Spark RDDs Vs DataFrames vs SparkSQL - Part 3: Web Server Log Analysis
This is the third tutorial in the Spark RDDs Vs DataFrames vs SparkSQL blog post series. The first one is available here and the second one is here. In the first part, we saw how to retrieve, sort and filter data. In the second part, we saw how to work with multiple tables. In this tutorial, we will see how to analyze web server logs. If you like this tutorial series, also check out my other recent Spark blog posts, Analyzing the Bible and the Quran using Spark and Spark DataFrames: Exploring Chicago Crimes. The data and the notebooks can be downloaded from my GitHub repository.
The article for this blog post is available here.
All five parts, more than 100 pages in total, are available in PDF format here.