Getting Started with Apache Zeppelin with Airbnb
Link to the full blog post here.
I've been playing around with Apache Zeppelin for a few months now (less playing than frustration at first, just trying to get it working). After using it consistently for a while, I find it incredibly useful for data visualization and business intelligence.
Apache Zeppelin describes itself as "a web-based notebook that enables interactive data analytics". Think of it as an IPython notebook for interactive visualizations, but one that supports more languages than just Python for munging your data. Once I got PySpark working on it, it became my go-to for displaying business data and analytics. Right now it only has a handful of chart options: bar graphs, line graphs, pie charts, and scatter plots. It's also open source and currently in incubation at Apache!
On a business and company level, I've found it is probably the best way to introduce a new visualization tool, because the interpreter language can be Spark SQL. SQL comes naturally to all analysts and most product managers, so as long as the data is loaded and formatted accordingly, anyone who knows SQL can start creating their own visualizations with a lot more ease.
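To make that concrete, here is a minimal sketch of the kind of aggregate query an analyst might write. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs anywhere; the `listings` table, its columns, and the sample rows are all made up for illustration and are not from any real dataset.

```python
import sqlite3

# Hypothetical sample data: (neighborhood, nightly price).
# These names and values are illustrative assumptions only.
rows = [
    ("Mission", 120.0),
    ("Mission", 180.0),
    ("SoMa", 200.0),
    ("SoMa", 260.0),
    ("Sunset", 90.0),
]


def average_price_by_neighborhood(rows):
    """Load rows into an in-memory table and aggregate them with plain SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE listings (neighborhood TEXT, price REAL)")
        conn.executemany("INSERT INTO listings VALUES (?, ?)", rows)
        # One row per neighborhood: the shape a bar chart needs
        # (a category column plus one or more numeric columns).
        query = """
            SELECT neighborhood,
                   AVG(price) AS avg_price,
                   COUNT(*)   AS num_listings
            FROM listings
            GROUP BY neighborhood
            ORDER BY neighborhood
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


result = average_price_by_neighborhood(rows)
for neighborhood, avg_price, num_listings in result:
    print(neighborhood, avg_price, num_listings)
```

In Zeppelin, the same SELECT would go in a `%sql` paragraph against a table registered in Spark, and the rows it returns (one category column plus numeric columns) are what a bar or pie chart would plot.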
Ultimately it looks and functions a bit like Tableau, minus the thousands of dollars for a Tableau license. And as the data grows, the hope is that Zeppelin's speed and functionality scale roughly linearly when it runs on a Spark cluster. I believe the end goal is to run huge amounts of data through it and potentially visualize billions of data points, with Spark doing most of the heavy lifting.
Well, how does it work?! I'll show a quick demo of the install and then some initial code. Here's the link to the current Zeppelin GitHub repo.