What is Matplotlib?
Tushar Mittal
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. One of its greatest benefits is that it turns large amounts of data into easily digestible visuals. Matplotlib supports many plot types, including line, bar, scatter, and histogram plots.
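The object-oriented API mentioned above can be sketched as follows. This is a minimal illustration, not the only way to use the library: the data values are made up, and selecting the Agg backend is an assumption made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no GUI toolkit is needed
import matplotlib.pyplot as plt

# Create a Figure and a single Axes; both are ordinary Python objects.
fig, ax = plt.subplots()

# Illustrative values, not taken from the article.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Plotting and labelling are methods on the Axes object.
(line,) = ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Object-oriented interface")

# Render the figure in memory; in an interactive session,
# plt.show() would open a window instead.
fig.canvas.draw()
```

Working with explicit Figure and Axes objects is what makes it practical to embed a plot in a GUI application, because the plot is not tied to a global "current figure".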
- Histogram: Matplotlib can be used to create histograms. A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. The data range is divided into bins, where every bin has a minimum and maximum value. Each bin also has a frequency: the number of observations that fall within its range.
    #!/usr/bin/env python
    import numpy as np
    import matplotlib.pyplot as plt

    # Example data
    mu = 100     # mean of distribution
    sigma = 15   # standard deviation of distribution
    x = mu + sigma * np.random.randn(10000)

    num_bins = 20

    # Histogram of the data; density=True scales the bars so their total area is 1
    n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='blue', alpha=0.5)

    # Add a 'best fit' line: the normal probability density at the bin edges
    y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    plt.plot(bins, y, 'r--')
    plt.xlabel('Smarts')
    plt.ylabel('Probability')
    plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

    # Tweak spacing to prevent clipping of ylabel
    plt.subplots_adjust(left=0.15)
    plt.show()
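The bins and frequencies described above can also be inspected without drawing anything. The sketch below uses NumPy's np.histogram to compute the bin edges and counts for data like that in the example; the fixed seed is an assumption added only to make the run repeatable.

```python
import numpy as np

# Assumption: a fixed seed, chosen only so that repeated runs give the same numbers.
rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(10000)

# 20 equal-width bins spanning the range of the data.
counts, edges = np.histogram(x, bins=20)

# Each bin runs from edges[i] to edges[i + 1] and holds counts[i] observations.
# The last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.2f} to {right:7.2f}: {count}")
```

There is always one more edge than there are bins, and the counts add up to the number of observations.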
- Bar Chart: Matplotlib can be used to create a bar chart. A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value. Some bar graphs present bars clustered in groups of more than one, showing the values of more than one measured variable.
    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcdefaults()

    objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
    y_pos = np.arange(len(objects))
    performance = [10, 8, 6, 4, 2, 1]

    plt.bar(y_pos, performance, align='center', alpha=0.5)
    plt.xticks(y_pos, objects)
    plt.ylabel('Usage')
    plt.title('Programming language usage')
    plt.show()
Bar charts can also be drawn horizontally. To create a horizontal bar chart, use barh() and label the y-axis ticks instead of the x-axis ticks:

    import numpy as np
    import matplotlib.pyplot as plt

    plt.rcdefaults()

    objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
    y_pos = np.arange(len(objects))
    performance = [10, 8, 6, 4, 2, 1]

    plt.barh(y_pos, performance, align='center', alpha=0.5)
    plt.yticks(y_pos, objects)
    plt.xlabel('Usage')
    plt.title('Programming language usage')
    plt.show()
You can compare two data series using this Matplotlib code:
    import numpy as np
    import matplotlib.pyplot as plt

    # Data to plot
    n_groups = 4
    means_frank = (90, 55, 40, 65)
    means_guido = (85, 62, 54, 20)

    # Create plot
    fig, ax = plt.subplots()
    index = np.arange(n_groups)
    bar_width = 0.35
    opacity = 0.8

    # Draw the two series side by side, offset by one bar width
    rects1 = plt.bar(index, means_frank, bar_width, alpha=opacity, color='b', label='Frank')
    rects2 = plt.bar(index + bar_width, means_guido, bar_width, alpha=opacity, color='g', label='Guido')

    plt.xlabel('Person')
    plt.ylabel('Scores')
    plt.title('Scores by person')
    # Center each tick label under its pair of bars
    plt.xticks(index + bar_width / 2, ('A', 'B', 'C', 'D'))
    plt.legend()

    plt.tight_layout()
    plt.show()
- Scatter Plot: A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram that uses Cartesian coordinates to display values for, typically, two variables of a dataset. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, where the value of one variable determines the position on the horizontal axis and the value of the other variable determines the position on the vertical axis.
Matplotlib has a built-in function to create scatterplots called scatter(). The position of a point depends on its two-dimensional value, where each value is a position on either the horizontal or vertical dimension.
    import numpy as np
    import matplotlib.pyplot as plt

    # Create data
    N = 500
    x = np.random.rand(N)
    y = np.random.rand(N)
    colors = 'black'    # one color for all points
    area = np.pi * 3    # marker area in points squared

    # Plot
    plt.scatter(x, y, s=area, c=colors, alpha=0.5)
    plt.title('Scatter plot pythonspot.com')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
Points can also be split into several groups, each drawn in its own color and listed in the legend:

    import numpy as np
    import matplotlib.pyplot as plt

    # Create data: one (x, y) pair of arrays per group
    N = 60
    g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
    g2 = (0.4 + 0.3 * np.random.rand(N), 0.5 * np.random.rand(N))
    g3 = (0.3 * np.random.rand(N), 0.3 * np.random.rand(N))

    data = (g1, g2, g3)
    colors = ("red", "green", "blue")
    groups = ("coffee", "tea", "water")

    # Create plot with a white axes background
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, facecolor="1.0")

    # Draw each group with its own color and legend label
    for points, color, group in zip(data, colors, groups):
        x, y = points
        ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)

    plt.title('Matplot scatter plot')
    plt.legend(loc=2)
    plt.show()
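Besides discrete groups, a third numeric variable can be encoded in the color or size of each point, as noted at the start of this section. The sketch below is one way to do that; the variable z, the colormap, the size formula, and the Agg backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a fixed seed, used only for repeatability.
rng = np.random.default_rng(1)
N = 100
x = rng.random(N)
y = rng.random(N)
z = x + y  # hypothetical third variable to encode

fig, ax = plt.subplots()

# c maps z onto the colormap; s scales the marker area with z.
points = ax.scatter(x, y, c=z, s=20 + 80 * z, cmap="viridis", alpha=0.7)

# The colorbar acts as the legend for the color encoding.
fig.colorbar(points, ax=ax, label="z = x + y")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot with a third variable")

fig.canvas.draw()  # in an interactive session, use plt.show() instead
```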
- Line Plot: A line chart or line plot or line graph or curve chart is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments. It is a basic type of chart common in many fields. It is similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments. A line chart is often used to visualize a trend in data over intervals of time – a time series – thus the line is often drawn chronologically.
A line chart can be created using the Matplotlib plot() function. While we can just plot a line, we are not limited to that. We can explicitly define the grid, the x-axis and y-axis scale and labels, title and display options.
    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(0.0, 2.0, 0.01)
    s = np.sin(2.5 * np.pi * t)
    plt.plot(t, s)

    plt.xlabel('time (s)')
    plt.ylabel('voltage (mV)')
    plt.title('Sine Wave')
    plt.grid(True)
    plt.show()
If you want to plot using an array (list), you can execute this script:
    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(0.0, 20.0, 1)
    s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    plt.plot(t, s)

    plt.xlabel('Item (s)')
    plt.ylabel('Value')
    plt.title('Python Line Chart: Plotting numbers')
    plt.grid(True)
    plt.show()
The call arange(0.0, 20.0, 1) generates the x values: it starts at 0 and goes up to, but does not include, 20 in steps of 1. That gives 20 values, which matches the length of our list.
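A quick check of what arange produces, written as a small sketch with NumPy imported explicitly:

```python
import numpy as np

t = np.arange(0.0, 20.0, 1)   # start, stop (excluded), step
s = list(range(1, 21))        # the 20 y values used in the example

print(t[0], t[-1], len(t))    # prints: 0.0 19.0 20

# plot() needs x and y sequences of the same length.
print(len(t) == len(s))       # prints: True
```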
If you want to plot multiple lines in one chart, simply call the plot() function multiple times.
    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(0.0, 20.0, 1)
    s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    s2 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
    plt.plot(t, s)
    plt.plot(t, s2)

    plt.xlabel('Item (s)')
    plt.ylabel('Value')
    plt.title('Python Line Chart: Plotting numbers')
    plt.grid(True)
    plt.show()
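When several lines share one chart, it usually helps to say which is which. The sketch below is one possible way to do so, giving each plot() call a label and adding a legend; the label names and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 20.0, 1)
s = np.arange(1, 21)    # 1, 2, ..., 20
s2 = np.arange(4, 24)   # 4, 5, ..., 23

fig, ax = plt.subplots()

# Each call adds one line; label is the text shown in the legend.
ax.plot(t, s, label="s")
ax.plot(t, s2, label="s2")
legend = ax.legend(loc="upper left")

ax.set_xlabel("Item (s)")
ax.set_ylabel("Value")
ax.grid(True)

fig.canvas.draw()  # in an interactive session, use plt.show() instead
```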
If you want to plot them in separate subplots within the same window, you can use the following:

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(0.0, 20.0, 1)
    s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    s2 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]

    # Top subplot
    plt.subplot(2, 1, 1)
    plt.plot(t, s)
    plt.ylabel('Value')
    plt.title('First chart')
    plt.grid(True)

    # Bottom subplot
    plt.subplot(2, 1, 2)
    plt.plot(t, s2)
    plt.xlabel('Item (s)')
    plt.ylabel('Value')
    plt.title('Second chart')
    plt.grid(True)

    plt.show()
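The plt.subplot() statement is key here. The subplot() command takes the number of rows, the number of columns, and the index of the subplot to activate, counted from 1 across the rows. So subplot(2, 1, 1) selects the top panel of a grid with two rows and one column, and subplot(2, 1, 2) selects the bottom one.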
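The same two-row layout can also be created in a single call with plt.subplots(), which returns the figure together with one Axes object per panel. This is a sketch of that alternative, not a replacement for the approach above; sharex=True and the Agg backend are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 20.0, 1)
s = np.arange(1, 21)    # 1, 2, ..., 20
s2 = np.arange(4, 24)   # 4, 5, ..., 23

# Two rows, one column; sharex=True makes both panels use the same x-axis range.
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)

top.plot(t, s)
top.set_ylabel("Value")
top.set_title("First chart")
top.grid(True)

bottom.plot(t, s2)
bottom.set_xlabel("Item (s)")
bottom.set_ylabel("Value")
bottom.set_title("Second chart")
bottom.grid(True)

fig.tight_layout()  # keep titles and labels from overlapping
fig.canvas.draw()   # in an interactive session, use plt.show() instead
```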
If you want thicker lines or a specific color, pass keyword arguments to plot():

    plt.plot(t, s, color="red", linewidth=2.5, linestyle="-")
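As a small sketch of these styling options, the example below draws two lines with different settings and reads the properties back from the returned line objects. The specific colors, the dashed style, and the marker are illustrative choices, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 20.0, 1)
s = np.arange(1, 21)    # 1, 2, ..., 20
s2 = np.arange(4, 24)   # 4, 5, ..., 23

fig, ax = plt.subplots()

# A thick, dashed red line.
(thick,) = ax.plot(t, s, color="red", linewidth=2.5, linestyle="--")

# A thin, solid blue line with a circle marker at each data point.
(thin,) = ax.plot(t, s2, color="blue", linewidth=1.0, linestyle="-", marker="o")

# plot() returns Line2D objects, so the settings can be inspected or changed later.
print(thick.get_linewidth(), thick.get_linestyle(), thick.get_color())

fig.canvas.draw()  # in an interactive session, use plt.show() instead
```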