
Machine Learning (ML) - How to Find Mean, Mode and Median Without Using a Python Library

Patrick Olumba

Software Engineer at Turing | Android Developer | Full Stack Developer | ML Engineer | IT | Financial Crime | AML

Published: March 25, 2024


Hey yeah… Welcome to my space.

Today I want to show you how to find the mean, mode and median without using any external Python library.

In Machine Learning (and in mathematics) there are often three values that interest us:

Mean — The average value
Median — The mid point value
Mode — The most common value

Thank goodness! Python libraries help us perform certain operations faster than if we wrote them from scratch.

But hey yeah… Have you ever thought about what is really going on behind the scenes of these libraries?

What is going on behind the scenes of numpy.mean()?

Think about libraries like numpy, which can calculate the mean and median, and scipy, which can calculate the mode, with just a few lines of code.

Let's begin…

Example: Below are the scores of students who participated in a volleyball game last week.

studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

Find the average value of the students' scores, the most common score and the middle score.

MEAN

The mean value is the average value.

To calculate the mean, find the sum of all values, and divide the sum by the number of values:

Remember, we are doing this without an external library…

def printMean(n):
    totalScore = 0

    # Add every score in the list to the running total
    for score in n:
        totalScore += score

    # Divide the total by the number of scores
    lenOfScores = len(n)
    mean = totalScore / lenOfScores

    return mean


def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    averageValue = printMean(studentScores)
    print('The average value of the student score is {}'.format(averageValue))

    # The average value of the student score is 35.72727272727273


main()

# The mean variable holds the average score.

# The totalScore variable holds the sum of all the scores.

# Inside the for loop, each score from the list is added to totalScore.

# lenOfScores holds the total number of student scores (in this case, eleven).

# Finally, we divide totalScore by lenOfScores and save the result in the mean variable.
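The same calculation can also be sketched with Python's built-in sum() and len(), which are part of the language itself rather than an external library. The function name mean_of and the decision to raise ValueError on an empty list are my own assumptions for illustration, not part of the walkthrough above.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        # Assumption: an empty list has no mean, so fail with a clear message
        # instead of letting the division raise ZeroDivisionError.
        raise ValueError("mean_of() needs at least one value")
    # sum() adds all the values; len() counts them.
    return sum(values) / len(values)


student_scores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]
print(mean_of(student_scores))
```

Writing the loop by hand shows what is happening; sum() and len() simply do the same work in one line each.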

Median

The median value is the value in the middle. The list or array must be sorted to get an accurate result.

It is easy to spot the median in a small dataset when the number of values is odd.

Example:

millage = [49, 45, 89]

# After sorting the list.
[45, 49, 89]

# We can easily spot the median
[49]

What if the number of values in the dataset is even?

Let's take a look at another example.

Example:

millage = [49, 45, 89, 100, 200, 99]

# After sorting the list.
[45, 49, 89, 99, 100, 200]

# To get the median here, we take the two numbers in the middle (89 and 99),
# add them and divide the sum by 2.
# Median will be (89 + 99) / 2 = 94.0
[94.0]
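Worked through in code, the even-length case looks like the short sketch below. The variable names ordered and mid are my own, chosen only for illustration. For this list the two middle values after sorting are 89 and 99.

```python
millage = [49, 45, 89, 100, 200, 99]

# Sort a copy so the original list is left untouched.
ordered = sorted(millage)

# For six values, mid is 3, so the two middle values sit at indexes 2 and 3.
mid = len(ordered) // 2
median = (ordered[mid - 1] + ordered[mid]) / 2

print(median)  # 94.0
```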

Now let's create a function to do this calculation for us.

def printMedian(n):
    # The median is only correct on sorted data, so sort a copy first
    n = sorted(n)
    lenOfScores = len(n)

    if lenOfScores % 2 == 1:
        # Odd length: return the single middle value
        return n[lenOfScores//2]
    else:
        # Even length: average the two middle values
        m1 = n[lenOfScores//2]
        m2 = n[lenOfScores//2 - 1]
        mid = (m1 + m2) / 2

        return mid


def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    medianValue = printMedian(studentScores)
    print('The median value of the student score is {}'.format(medianValue))

    # The median value of the student score is 25


main()

lenOfScores holds the total number of student scores (in this case, eleven).

Before anything else, we sort the list, because the median is only correct on sorted data.

Next, we use a conditional statement to check whether the length of the given dataset is odd or even…

If odd, return the number at the middle of the dataset, found by integer-dividing the length by 2.

If even, get the two numbers in the middle of the dataset and save them to the variables m1 and m2 respectively.

Sum m1 and m2 and then divide the total by 2, saving the result to the variable mid.

That’s it…

Finally…

Mode

The mode is the number that occurs most often within a set of numbers.

This one is a little different, as we will import a built-in helper to assist us a little bit… LOL

We will import Counter from the collections module, which is part of the standard library in Python 2.7+ and Python 3. It will help us count duplicate elements in a list.

from collections import Counter


def printMode(n):
    # Count how many times each score occurs
    data = Counter(n)

    lenOfScores = len(n)
    get_mode = dict(data)

    # Highest number of times any single score occurs
    maxCount = max(data.values())
    mode = [k for k, v in get_mode.items() if v == maxCount]

    if len(mode) == lenOfScores:
        get_mode = "No reoccurring scores found"
    else:
        get_mode = "The most occurring score is " + ', '.join(map(str, mode))

    return get_mode


def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    print(printMode(studentScores))

    # Result
    # The most occurring score is 13


main()

lenOfScores holds the total number of student scores (in this case, eleven).

Using dict, we convert the counted data to a dictionary and save it in get_mode.

We then build a list called mode with a list comprehension that compares each dict value (the count for a score) to the max of all dict values (the count of the most frequent score) and keeps every score whose count equals that max.

If the number of scores returned equals the total number of elements in the list, every score occurred exactly once, so we return ‘No reoccurring scores found’; otherwise we return the modes.
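If you would rather not import anything at all, the counting step can be sketched with a plain dictionary. The helper names count_scores and modes_of are hypothetical, and returning an empty list for empty input is my own assumption.

```python
def count_scores(values):
    """Return a dict mapping each value to the number of times it occurs."""
    counts = {}
    for value in values:
        # dict.get returns 0 the first time a value is seen.
        counts[value] = counts.get(value, 0) + 1
    return counts


def modes_of(values):
    """Return a list of the most frequent value(s), in first-seen order."""
    counts = count_scores(values)
    if not counts:
        # Assumption: an empty input has no mode.
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


student_scores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]
print(modes_of(student_scores))  # [13]
```

This returns a list rather than a message string, so the caller can decide how to report ties or the case where no score repeats.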

Oh yes!

We are done performing some ML statistics without using an external library such as numpy or scipy.
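As an optional sanity check, the hand-written results can be compared against Python's statistics module, which ships with the standard library, so nothing extra needs installing. This is a sketch I am adding for verification only; statistics.multimode assumes Python 3.8 or newer.

```python
import statistics

student_scores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

# Each call should agree with the from-scratch functions in this article.
print(statistics.mean(student_scores))
print(statistics.median(student_scores))
print(statistics.multimode(student_scores))
```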

I hope you learnt something from this.

I will leave y’all with one piece of advice…

[Learn what goes on behind the scenes sometimes… it builds your proficiency and sharpens your programming and algorithm skills.]

Don’t forget to follow, like and subscribe!

See ya next time
