Plotting and Visualisation: Matplotlib Basics
Are your charts feeling a little tired?

I don’t know about you, but I have been using the same types of plots for the past few months. I thought this would be a great opportunity to level up my own skills and share some knowledge.

This is an interactive tutorial, so I advise you to follow along. You can read it to the very end without opening your favorite IDE, albeit at the cost of brain gains.

If you would like more information about graphing and plotting, all the information here can be found in ‘Python for Data Analysis’ by Wes McKinney.

Initialise your environment and make sure you import random, NumPy and Matplotlib.

You will create the most basic graph ever by using the following code:
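A minimal sketch of that first plot, assuming the array is built with np.arange(10) (which matches the 0 to 9 range described here) and that Matplotlib's pyplot module is imported under the usual alias plt:

```python
import random  # imported because the text asks for it; not used in this first plot

import numpy as np
import matplotlib.pyplot as plt

# Assumption: the integers 0..9, matching the range described in the text.
data = np.arange(10)

# With a single array, the values go on the y-axis and their indices on the x-axis.
# plt.plot returns a list of Line2D objects, one per plotted line.
lines = plt.plot(data)

# In a plain script (rather than a notebook), call plt.show() to open the window.
```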

The ‘data’ variable now contains an array of the numbers 0 to 9.
Plotting the array inside the ‘data’ variable yields the plot above.

This is a great start but you can do so much more with Matplotlib.

Initialising the following will create a figure object with the specified size.
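A minimal sketch of the figure and its subplots. The figsize values, the 2x2 grid and the names fig, ax1, ax2 and ax3 are assumptions for illustration:

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches.
fig = plt.figure(figsize=(10, 6))

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
```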

It is important to note that the figure object needs to be initialised in the same code block as your actual plots, otherwise nothing is displayed.

Before looking below, can you infer what the arguments being passed to .add_subplot() might do?

The first two numbers set the number of rows and columns in the grid, and the last number sets the subplot's position. Positions are numbered from 1 and run along each row from left to right before moving down to the next row.
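To make the numbering visible, here is a small sketch that fills a 2x3 grid and writes each position number inside its own subplot. The grid size and the axes_by_position dictionary are assumptions chosen for illustration:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 5))

# Fill a 2-row, 3-column grid and label each subplot with its position number.
axes_by_position = {}
for position in range(1, 7):
    ax = fig.add_subplot(2, 3, position)
    ax.text(0.5, 0.5, str(position), ha="center", va="center", fontsize=20)
    axes_by_position[position] = ax
```

Positions 1, 2 and 3 land on the top row, and 4, 5 and 6 on the bottom row.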

Seeing this figure made up of 3 subplots might make the code a lot easier to understand.

Let’s add some actual plots now and see how this affects the figure object.

Remember what I said previously about figure objects NEEDING to be initialised in each block where subplots exist.
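A minimal sketch of a figure with one subplot and one plot, all created in the same block. The sample data (a cumulative sum of 50 random draws) and the styling are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Create the figure and its subplot in the same block as the plotting call.
fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot(1, 1, 1)  # a 1x1 grid, so exactly one subplot

# Assumed sample data: a cumulative sum of 50 random draws.
rng = np.random.default_rng(0)
walk = rng.standard_normal(50).cumsum()
ax.plot(walk, color="black", linestyle="--")
```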

Since we only specify one subplot here, we actually only get a single subplot.

If you execute the third line without the figure object or subplot object, an array is returned.

Another thing to consider here is that when we add several subplots inside a single figure object, the axes can get quite messy. This is where the ‘sharey’ and ‘sharex’ arguments can help us.
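A minimal sketch using plt.subplots(), which creates the figure and a grid of subplots in one call. The 2x2 grid and the variable names are assumptions:

```python
import matplotlib.pyplot as plt

# plt.subplots returns the figure and a 2x2 NumPy array of Axes objects.
# sharex / sharey make every subplot use the same x and y scales.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

top_left = axes[0, 0]      # row 0, column 0
bottom_right = axes[1, 1]  # row 1, column 1

# Because the x-axis is shared, changing the limits on one subplot changes them all.
top_left.set_xlim(0, 5)
```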

You are now able to access each individual subplot using indexing AND the figure object is much tidier.

There are more things we can do to these subplots to increase readability and customise the way they are rendered.
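One such adjustment is the spacing between subplots. A minimal sketch, assuming the same 2x2 shared grid as before:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# wspace and hspace are the horizontal and vertical gaps between subplots,
# expressed as a fraction of the average subplot width and height.
fig.subplots_adjust(wspace=0, hspace=0)
```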

This completely eliminates the vertical and horizontal spacing between the subplots, since we set hspace and wspace to 0.

This presents us with another issue: overlapping xticklabels. You will deal with this shortly, but let’s add some data first.

We now have 4 subplots that look very ugly.

[Figure: 4 ugly plots]

If you look carefully, the axes are not overlapping again even though we added whitespace.

Why is this happening?

You have actually introduced some redundancy, since we have asked Python to create two sets of 2x2 grids, which equates to 8 graphs in total.

Only 4 of those graphs have any actual data, though.
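A sketch of how this kind of duplication can arise, assuming the grid is created once with plt.subplots() and then again with add_subplot(). Counting the Axes attached to the figure shows the problem:

```python
import matplotlib.pyplot as plt

# First 2x2 grid: plt.subplots already creates four Axes.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# Second 2x2 grid: each add_subplot call stacks a new Axes on top of an existing one.
extra_axes = [fig.add_subplot(2, 2, position) for position in range(1, 5)]

# Every Axes attached to the figure is listed in fig.axes.
total_axes = len(fig.axes)
```

Here total_axes comes out as 8, even though only four panels are visible.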

Here is how you overcome that issue.
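A minimal sketch of the fix: plot directly on the Axes that plt.subplots() returned instead of creating new ones. The sample data, the hex colours and the alpha values are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.standard_normal(30).cumsum()  # assumed sample data: 30 points

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 6))
fig.subplots_adjust(wspace=0, hspace=0)

# Draw on the existing Axes via array indexing; no extra subplots are created.
axes[0, 0].plot(data, color="#1f77b4", alpha=0.9)
axes[0, 1].plot(data, color="#ff7f0e", alpha=0.7)
axes[1, 0].plot(data, color="#2ca02c", alpha=0.5)
axes[1, 1].plot(data, color="#d62728", alpha=0.3)
```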

Notice that we access the axes variable using array indexing, and that we have declared the colors as hex codes.

Alpha sets the opacity on a scale from 0 to 1. The lower the number, the more transparent the plot.

You have also somehow fixed the overlapping axes. How did that happen when we did not specifically address it?

Using a smaller figsize will generally result in more overlapping labels.

This is not always the case but something to consider when using 0 hspace or wspace and sharing axes.

To further demonstrate array indexing and limit the data being displayed in our subplots, we will add titles and set a limit on our x-axis using ‘set_xlim()’.
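A minimal sketch, assuming a 2x2 grid where each title is simply the subplot's own index and only one subplot has its x-axis limited. The limits 0 to 10 are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(30)

fig, axes = plt.subplots(2, 2, figsize=(10, 6))

# Title each subplot with its own index so the mapping is visible.
for row in range(2):
    for col in range(2):
        axes[row, col].plot(data)
        axes[row, col].set_title(f"axes[{row}, {col}]")

# Show only x values from 0 to 10 on one subplot; the data itself is untouched.
axes[0, 1].set_xlim(0, 10)

fig.tight_layout()  # keep the titles from colliding with neighbouring subplots
```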

You will now be able to see more clearly which array index belongs to which subplot, and hopefully notice that we can use ‘.set_xlim()’ to truncate our axes.

Extremely useful when you have trailing distributions and do not want to remove data, but only highlight a segment. (I wish I knew about this earlier)

Those of you with keen eyes will notice that axes[1, 1] has a number of new shiny declarations that change how the graph looks.

Before this, I declared the following:
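A minimal sketch of such a declaration, assuming NumPy's global random seed is used. The seed value 42 is arbitrary:

```python
import numpy as np

# Fixing the seed makes the "random" numbers identical on every run.
np.random.seed(42)
first = np.random.randn(30)

# Re-seeding with the same value reproduces exactly the same 30 numbers.
np.random.seed(42)
second = np.random.randn(30)
```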

This allows us to have the same ‘random’ each time we run. Not absolutely necessary but I did not like my graphs jumping everywhere.

Technically, you need an array of 30 numbers and even something like this will work:

data = np.arange(30)

Here are the rendered subplots with the fancy linestyle and markers.

The 4th subplot has the following three things I want to draw attention to.

linestyle, which tells Python how to generate the line plot itself.

marker, which tells Python how to mark each individual data point so we can see where they sit on the line.

drawstyle, which tells Python how to render the line between each data point.
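A minimal sketch showing all three arguments on a single line plot. The data, colour and the specific values chosen for each argument are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # arbitrary seed, for repeatable output
data = np.random.randn(30).cumsum()

fig, ax = plt.subplots(figsize=(8, 4))

# linestyle: how the line itself is drawn (dashed here)
# marker:    the symbol placed on each data point (circles here)
# drawstyle: how consecutive points are joined (a step after each point here)
(line,) = ax.plot(
    data,
    color="#d62728",
    linestyle="--",
    marker="o",
    drawstyle="steps-post",
)
```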

We can get some goofy things like this when we play with it:

It accepts the following:

'default', 'steps-mid', 'steps-pre', 'steps-post', 'steps'
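To compare the options side by side, here is a small sketch that draws the same data once per drawstyle. The data and layout are assumptions; 'steps' is left out because Matplotlib treats it as an alias for 'steps-pre':

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(10) ** 2  # a simple curve so the steps are easy to see

drawstyles = ["default", "steps-pre", "steps-mid", "steps-post"]
fig, axes = plt.subplots(1, len(drawstyles), figsize=(14, 3), sharey=True)

# One subplot per drawstyle, with markers showing where the real data points sit.
for ax, style in zip(axes, drawstyles):
    ax.plot(data, drawstyle=style, marker="o")
    ax.set_title(style)
```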

I don't know why this exists besides some applied math use cases. If you work in Data and have used this before, I would love to hear from you about this.

The next thing I want to show you is that we can completely change the values on the xticklabels to anything we want as long as we map them to real values.

Here is an example:
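A minimal sketch, assuming the tick positions and their replacement labels are kept in two lists of equal length. The positions, label text and rotation are made up for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(30)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(data)

# One label per tick position: both lists must have the same length.
tick_positions = [0, 10, 20, 29]
tick_labels = ["start", "ten", "twenty", "end"]

ax.set_xticks(tick_positions)               # where the ticks go (real x values)
ax.set_xticklabels(tick_labels, rotation=30)  # what is written at each tick
```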

And the corresponding graph:

If we did decide to take out an item from either list, Python will generate an error because the two lists no longer match in length.

Penultimately, I want to show you how to plot multiple graphs on a single graph.

It is actually very easy compared to what we just did.
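A minimal sketch: calling plot() several times on the same Axes draws every line on one graph. The data, labels, legend and annotation text are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # arbitrary seed, for repeatable output

fig, ax = plt.subplots(figsize=(8, 4))

# Each call to ax.plot adds another line to the same Axes.
ax.plot(np.random.randn(30).cumsum(), color="#1f77b4", label="one")
ax.plot(np.random.randn(30).cumsum(), color="#ff7f0e", linestyle="--", label="two")
ax.plot(np.random.randn(30).cumsum(), color="#2ca02c", linestyle=":", label="three")
ax.legend(loc="best")  # optional: uses the label of each line

# ax.text(x, y, string): x and y are in data coordinates by default.
annotation = ax.text(5, 0, "x=5, y=0", fontsize=10)
```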

The first two values passed to ax.text() are the x and y positions of the text, given in the graph's data coordinates.

Hopefully you should be able to figure out how that works when looking at this.

I also know that you can add shapes to your graph.

Why you would want to do this is beyond me but alas, I am sure someone has a very legit reason for doing so.

Take a look online for some very creative applications. Someone even made the Firefox logo.

I digress. The actual code is easy to implement:
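A minimal sketch using three patch classes from matplotlib.patches. Shapes are attached to a subplot with the Axes method add_patch(); the positions, sizes and colours here are arbitrary:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Polygon, Rectangle

fig, ax = plt.subplots(figsize=(6, 6))

# Build each shape first; positions and sizes are in data coordinates.
rect = Rectangle((0.2, 0.75), 0.4, 0.15, color="#1f77b4", alpha=0.3)
circ = Circle((0.7, 0.2), 0.15, color="#ff7f0e", alpha=0.3)
pgon = Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]], color="#2ca02c", alpha=0.5)

# Then attach each one to the Axes.
for shape in (rect, circ, pgon):
    ax.add_patch(shape)
```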

We determine the shape and then pass it to add_patch().

This now renders the following:

I hope you found this useful. Next week we will be looking at Seaborn and Pandas.

