Machine Learning From Scratch [Part 2]

This is part two of Machine Learning from Scratch. It is a short, straightforward tutorial on plotting a bar chart of bucketed data using Python, Pyplot, and a descriptive-statistics concept called the decile.

This article, along with many others, is also available on my personal blog:

https://brunocamps.com/2019/11/machine-learning-from-scratch-part-2/

In this lesson, you'll learn how to:

  • Work with the collections library and its Counter class
  • Work with bucketed lists and deciles
  • Plot bar charts at an advanced level with histograms
  • Generate a line chart (X and Y axis) from the lists
  • Generate a bar chart

We'll keep studying data visualization with Pyplot. Visualizing data is a large part of a data scientist's or machine learning engineer's work. Raw data is not that valuable on its own: we have to analyze it and display it in an understandable way.

As we saw in Part 1, Pyplot makes it quick and easy to plot your data, but it certainly has its limitations.

Now, let's jump straight into our next task.

First, let's import the Counter class from the collections library and declare a list of grades, which will be our data this time.

from collections import Counter
grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

We also need to import Pyplot. If you're using the Jupyter notebook from the previous lesson, just run the cell where you imported the module. The code below assumes it is available under the name plt (for example, via from matplotlib import pyplot as plt).

Now, let's build our histogram using Counter. We'll bucket all grades by decile, putting 100 in with the 90s, and then print the histogram variable to check its contents.

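A decile is a concept from descriptive statistics: it "is any of the nine values that divide the sorted data into ten equal parts so that each part represents 1/10 of the sample or population". In this tutorial we use the term loosely to mean a 10-point grade band (0-9, 10-19, and so on) rather than a statistical decile.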
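As a point of comparison, here is a minimal sketch of computing the nine decile cut points in the strict statistical sense, using the standard library's statistics.quantiles (available from Python 3.8). The variable name cut_points is only illustrative, and this is separate from the 10-point bucketing used in the rest of the article.

```python
from statistics import quantiles

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

# n=10 asks for the cut points that split the data into ten groups
cut_points = quantiles(grades, n=10)

print(len(cut_points))  # 9, since nine cut points separate ten groups
print(cut_points)
```

The exact cut points depend on the interpolation method that quantiles uses, so treat the printed values as one reasonable convention.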

To count how many grades fall into each bucket, we'll use Counter, which is a dict subclass for counting hashable items. It stores the items as dictionary keys and their counts as dictionary values.
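If Counter is new to you, this small sketch shows how it behaves. The letters list is made-up sample data, used only for illustration.

```python
from collections import Counter

# Hypothetical sample data, used only for illustration
letters = ["a", "b", "a", "c", "a", "b"]

counts = Counter(letters)

print(counts)                 # Counter({'a': 3, 'b': 2, 'c': 1})
print(counts["a"])            # 3
print(counts["z"])            # 0, missing keys count as zero instead of raising KeyError
print(counts.most_common(1))  # [('a', 3)]
```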

#Bucket grades by decile, but put 100 in with the 90s
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)
print(histogram)

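The expression grade // 10 * 10 rounds each grade down to the nearest multiple of 10: the // operator is floor division, which keeps only the integer part of the quotient. Wrapping it in min(..., 90) caps the result at 90, so a grade of 100 lands in the 90s bucket.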
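To check the bucketing step by step, you can try it on a few individual grades. The helper name bucket below is hypothetical; it just wraps the same expression used in the Counter call.

```python
def bucket(grade):
    """Return the lower bound of the 10-point bucket, capped at 90."""
    return min(grade // 10 * 10, 90)

print(87 // 10)       # 8, floor division drops the remainder
print(87 // 10 * 10)  # 80
print(bucket(87))     # 80
print(bucket(100))    # 90, because min(100, 90) is 90
print(bucket(0))      # 0
```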

Running the cell prints the contents of our histogram:

Counter({80: 4, 90: 3, 70: 3, 0: 2, 60: 1})

Each key is the lower bound of a bucket, and each value is the number of grades that fall into it.
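Counter prints its entries from most common to least common, which can make the buckets harder to read. As a small sketch, sorting the items lists them in grade order instead. The range labels are an illustrative choice, not part of the original code.

```python
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)

# Sorting the (bucket, count) pairs orders them by bucket
for lower, count in sorted(histogram.items()):
    # The top bucket also holds 100 because of the cap at 90
    upper = 100 if lower == 90 else lower + 9
    print(f"{lower}-{upper}: {count}")
```

Buckets with no grades (10 through 50 here) do not appear at all, because Counter only stores keys it has actually seen.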

Now, let's plot our histogram and see what it looks like.

plt.bar([x + 5 for x in histogram.keys()],  # shift bars right by 5
        histogram.values(),                 # give each bar its correct height
        10,                                 # give each bar a width of 10
        edgecolor=(0, 0, 0))                # black edges for each bar

# x-axis from -5 to 105, y-axis from 0 to 5
plt.axis([-5, 105, 0, 5])

# x-axis labels at 0, 10, ..., 100
plt.xticks([10 * i for i in range(11)])
plt.xlabel("Decile")
plt.ylabel("# of Students")
plt.title("Distribution of Exam 1 Grades")
plt.show()

This is what the distribution of grades looks like:

[Image: bar chart titled "Distribution of Exam 1 Grades", with "Decile" on the x-axis and "# of Students" on the y-axis]
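To see why each bar is shifted right by 5, here is a minimal sketch that works out where the bars start and end. It assumes Pyplot's default behavior of centering each bar on its x position; the names width and spans are illustrative only.

```python
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)

width = 10
spans = {}
for lower in sorted(histogram):
    center = lower + 5          # the x value passed to plt.bar
    left = center - width / 2   # where a centered bar starts
    right = center + width / 2  # where a centered bar ends
    spans[lower] = (left, right)
    print(f"bucket {lower}: bar spans {left:g} to {right:g}")
```

With the shift, the bar for the 80s bucket covers 80 to 90 on the x-axis, so it lines up with the tick labels. Without it, the same bar would be centered on 80 and cover 75 to 85.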

Statistics plays a significant role in machine learning. Sometimes, pure statistics will be enough to meet your project's objective. There is a long-running debate about whether statistical tools count as machine learning, but that is merely a debate.

We should focus on concrete goals for our machine learning projects, whatever you call the field (AI, Data Science, Statistics…). Whether you're running a basic linear regression or a hardcore deep learning framework, you must deliver practical results.

By now, you've had more practice handling data in Python and visualizing it with Pyplot. In the next article (Part 3), we'll jump into NumPy, which is widely used for numerical computing.

If you missed Part 1, here it is:

https://www.dhirubhai.net/pulse/machine-learning-from-scratch-part-1-bruno-campos/

