Experimenting with Tableau

A big part of the data science process is visualizing data. You create visualizations when exploring a new dataset, when reviewing experiments and results, and again later when explaining results to stakeholders. There are many tools for this, and I've recently been experimenting with Tableau.

Why visualize your data?

My favorite example is Anscombe's quartet, which visualizes four datasets that have nearly identical descriptive stats (mean, variance, linear regression, etc.) but are actually quite different.
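To make that concrete, here is a minimal Python sketch that computes the summary statistics for the four datasets using only the standard library. The data values are the ones published in Anscombe's 1973 paper; the `summarize` helper and the `ANSCOMBE` dictionary are names I made up for this illustration.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, and the least-squares line y = slope*x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(xs),
        "mean_y": my,
        "var_y": variance(ys),
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in ANSCOMBE.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows print almost the same numbers (mean of x is 9, mean of y is about 7.5, slope about 0.5, intercept about 3), yet a scatter plot of each dataset looks nothing like the others. That gap is exactly what the summary table hides.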

My second favorite example is Minard's map of Napoleon's 1812 Russia campaign. The two-dimensional visualization shows 6 dimensions of data. Using this, you can very quickly see the devastating effect the cold weather had on Napoleon's army during its retreat from Moscow. Compare this to looking at an Excel sheet of 1000+ data points.

Here's the same visualization made 150 years later in D3.js.

Which visualization software should I use?

There are many software options, and the right answer depends on your situation. I think these three questions can help narrow down your search:

  • How flexible is the software? (or, can it do everything you need it to?)
  • How productive will you be using it? (or, how quickly will you get to insight?)
  • What's your budget? (or, which offering returns the highest value after subtracting the total cost of ownership, or TCO?)

Here's a pyramid summarizing the spectrum of data visualization tools:

My personal experience is limited to Excel and a project each using Tableau, R (ggplot2 library), and Python (Matplotlib and Seaborn).

Tell me about Tableau

I learned about Tableau as part of Udacity's Data Analyst Nanodegree program. Tableau is often considered the 'cream of the crop' of data visualization tools. It started with three guys from Stanford and is now a 3500-person company on a mission to help people see and understand data.

After 3 days of using the software, here are some of my observations:

Pros:

  • One can quickly explore data with a drag-and-drop interface with almost no training.
  • Visualizations are elegant, interactive, and easy to build on.
  • Basic data transformations can be done easily (joins, unions, string-splits, etc.; basically anything there's a Google Sheets function for).

Cons:

  • When I got stuck trying to do something very specific, it wasn't easy to find the answer via Google. By contrast, when I was using ggplot2 (an R library) I could find most answers I needed in under 10 seconds of Googling. That being said, there's extensive video documentation in the Tableau knowledge base, so maybe most answers are already there.
  • Tableau's not free. Depending on the organization's needs, the total cost of ownership can add up. However, the benefits might far outweigh the cost if using Tableau enables you to glean insights that might otherwise have taken longer.
  • It was a bit harder to keep track of my exploration as I went along compared to using an R-markdown document or a Jupyter notebook. This also means it might be harder for someone else to reproduce my work or walk through my thinking.

Show me what Tableau can do

A month ago I set about exploring peer-to-peer loan data from a company named Prosper using R and the ggplot2 library. I used the same dataset to play around in Tableau because I was already familiar with the variables and could quickly get to gleaning insights.

R + ggplot2: a baseline

Here's my favorite plot from my exploration in R:

The code for that looks like this:

library(ggplot2)

# Borrower rate vs. average credit score, one panel per listing year after 2005
ggplot(subset(loanData, ListingYear > 2005),
       aes(x = CreditScoreAverage,
           y = BorrowerRate / 10)) +
  # Semi-transparent, jittered points colored by loan amount
  geom_jitter(alpha = 0.1, aes(color = LoanOriginalAmount)) +
  scale_colour_gradientn(colours = rainbow(4),
                         name = 'Loan Amount ($)') +
  # Smoothed trend line within each panel
  geom_smooth() +
  # Zoom the x-axis without dropping rows outside the range
  coord_cartesian(xlim = c(500, 850)) +
  facet_wrap(~ListingYear) +
  ggtitle("Borrower Rate by Credit Score,\nshowing loan amount, faceted by year") +
  labs(x = "Average Credit Score", y = "Borrower Interest Rate") +
  scale_y_continuous(labels = scales::percent)

Tableau: first attempt

With 3 hours of Tableau experience, I was able to create the following visualization, similar to the above, in 15 seconds of drag-and-dropping:

My favorite Tableau feature is showcasing data in interactive "stories". The idea is that any reader can interact with the data and dig deeper without the creator needing to be present. My first shot at a Tableau story took me about an hour to make and includes 13 "sheets" of data inside 3 "dashboards". I hosted it on Tableau Public and you can interact with it here. Be warned, it's pretty ugly.

Tableau: second attempt

Afterward, I decided to polish up the story and focus on features highly correlated with lower borrower interest rates.
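One simple way to shortlist such features is to rank them by their Pearson correlation with the borrower rate. The sketch below is only an illustration of that idea, not the exact process behind the story: the column names come from the R snippet earlier in this post, but the five rows are made-up toy values, and `pearson` and `rank_features` are hypothetical helper names.

```python
from math import sqrt

# Toy rows for illustration only; these are NOT real Prosper records.
TOY_LOANS = [
    {"CreditScoreAverage": 620, "LoanOriginalAmount": 4000, "BorrowerRate": 0.29},
    {"CreditScoreAverage": 660, "LoanOriginalAmount": 3000, "BorrowerRate": 0.24},
    {"CreditScoreAverage": 700, "LoanOriginalAmount": 10000, "BorrowerRate": 0.18},
    {"CreditScoreAverage": 740, "LoanOriginalAmount": 8000, "BorrowerRate": 0.13},
    {"CreditScoreAverage": 780, "LoanOriginalAmount": 15000, "BorrowerRate": 0.09},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_features(rows, target, features):
    """Return (feature, correlation) pairs, most negative correlation first."""
    target_values = [row[target] for row in rows]
    scored = [
        (feature, pearson([row[feature] for row in rows], target_values))
        for feature in features
    ]
    return sorted(scored, key=lambda pair: pair[1])


ranking = rank_features(
    TOY_LOANS, "BorrowerRate", ["CreditScoreAverage", "LoanOriginalAmount"]
)
for feature, r in ranking:
    print(f"{feature}: r = {r:.2f}")
```

Features at the top of the list move most strongly in the opposite direction to the interest rate. Correlation only captures linear relationships, so it is a starting point for deciding what to plot rather than a replacement for looking at the charts.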

This attempt is far more visually appealing than the first. You can interact with the second story on Tableau Public, here.

Tableau's full potential

I've only been using Tableau a short time and haven't used it to the full extent of its abilities. Here's an example of what one could expect to create with a few more days of practice using Tableau:

You can interact with that story on ourworldindata.org. Try clicking a country you want to learn more about and clicking the 'play' button. Scroll through the publication for more visualizations sans interactivity.

Conclusion

Data visualization is an important part of understanding your data and gleaning insights from it. There are many ways to visualize data, including software like Tableau. No matter what you use, remember that data by itself is far less valuable than the actionable insights you pull out of it, and visualizing data is one of the easiest ways to find those insights.
