To plot and what to plot?
Manu Nellutla
Manager, Digital Integration Architect @ KPMG US | Leading Digital Transformation efforts
As a green data science enthusiast, I am trying to depict my interpretations of data visually more often. There are a lot of plots that can support a story, so how do you know which one to use, and when?
What sort of plot tells what story? I am still trying to find my footing here. While doing a Kaggle mini-course on Data Visualization, I came across a nice little blurb that made a good start for my notes on when to use which plot. The plots used here are from the "seaborn" library, but the emphasis is on what each kind of chart is for.
This is by no means a complete or definitive list, but it is a good starting point.
From: https://www.kaggle.com/alexisbcook/choosing-plot-types-and-custom-styles
Since it's not always easy to decide how to best tell the story behind your data, we've broken the chart types into three broad categories to help with this.
Trends - A trend is defined as a pattern of change.
- sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.
Relationship - There are many different chart types that you can use to understand relationships between variables in your data.
- sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
- sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
- sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
- sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
- sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
- sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
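The relationship plots listed above can be sketched side by side on one made-up dataset. Everything about the data here is an assumption for illustration: the columns "x", "y" and "group", the three groups and their slopes are invented, not taken from the course.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: three hypothetical groups, each with its own linear x-y slope.
rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], n // 3),
    "x": rng.uniform(0, 10, n),
})
slopes = df["group"].map({"A": 0.5, "B": 1.0, "C": 1.5})
df["y"] = slopes * df["x"] + rng.normal(0, 1, n)

# A small table of numbers for the heatmap: mean y per group and per x bin.
df["x_bin"] = (df["x"] // 2.5).astype(int)
table = df.pivot_table(index="group", columns="x_bin", values="y", aggfunc="mean")

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Bar chart: compare a quantity (mean of y) across groups.
sns.barplot(data=df, x="group", y="y", ax=axes[0, 0])
# Heatmap: color-coded pattern in a table of numbers.
sns.heatmap(table, annot=True, ax=axes[0, 1])
# Scatter plot: two continuous variables, colored by a categorical third.
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=axes[0, 2])
# Scatter plot with a single regression line over all points.
sns.regplot(data=df, x="x", y="y", ax=axes[1, 0])
# Categorical scatter plot: continuous y against categorical group.
sns.swarmplot(data=df, x="group", y="y", ax=axes[1, 1])
axes[1, 2].set_visible(False)  # unused panel

# lmplot creates its own figure, so it cannot be placed on an existing axis.
# With hue, it fits one regression line per group.
grid = sns.lmplot(data=df, x="x", y="y", hue="group")
```

One practical difference worth noting: regplot draws onto an axis you give it, while lmplot builds a whole figure of its own, which is why it is called separately at the end.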
Distribution - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
- sns.distplot - Histograms show the distribution of a single numerical variable. (Note: distplot is deprecated as of seaborn 0.11; sns.histplot is its replacement for histograms.)
- sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
- sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.
I would greatly appreciate any good suggestions and constructive feedback.