Machine Learning (ML) - How to Find Mean, Mode and Median Without Using a Python Library

Hey yeah… Welcome to my space.

Today I want to show you how to find the mean, mode and median without using any external Python library.


In Machine Learning (and in mathematics) there are three values that often interest us:

  • Mean — The average value
  • Median — The midpoint value
  • Mode — The most common value

Thank goodness Python libraries help us perform certain operations much faster than if we wrote everything from scratch!

But hey yeah… Have you ever thought about what is really going on behind the scenes of these libraries?


What is going on behind the scenes of numpy.mean()?

Think about libraries like numpy, which can calculate the mean and median, and scipy, which can calculate the mode, each with just a few lines of code.



Let's begin…


Example: Below are the scores of students who took part in a volleyball game last week.

studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

Find the average of the students' scores, the most common score and the middle score.

Mean

The mean value is the average value.

To calculate the mean, find the sum of all values, and divide the sum by the number of values:
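In symbols, for n values x_1, …, x_n (notation introduced here for illustration), and applied to the eleven student scores above, the calculation works out as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{44 + 55 + 22 + 25 + 13 + 13 + 23 + 13 + 56 + 90 + 39}{11}
        = \frac{393}{11} \approx 35.73
```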

Remember, we are doing this without any external library…

def printMean(n):
    totalScore = 0

    # Add every score in the list to the running total
    for score in n:
        totalScore += score

    # Divide once, after the loop has summed all the scores
    lenOfScores = len(n)
    mean = totalScore / lenOfScores

    return mean

def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    averageValue = printMean(studentScores)
    print('The average value of the student score is {}'.format(averageValue))
    # The average value of the student score is 35.72727272727273


if __name__ == '__main__':
    main()

# The mean variable holds the average score.

# The totalScore variable holds the running sum of the scores.

# Inside the for loop, each score from the list is added to totalScore.

# lenOfScores holds the total number of student scores (in this case, eleven).

# Finally, we divide totalScore by lenOfScores and save the result to the mean variable.
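You can get the same result more compactly with Python's built-in sum() and len() functions, which need no import at all. The minimal sketch below also cross-checks the answer against statistics.mean. That function lives in Python's standard library, so it is not an external package. The name compute_mean is hypothetical and chosen only for this example.

```python
import statistics


def compute_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean requires at least one value")
    # sum() adds every value; len() counts how many values there are
    return sum(values) / len(values)


studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

mean = compute_mean(studentScores)
print(mean)  # 35.72727272727273

# Cross-check against the standard library implementation
assert abs(mean - statistics.mean(studentScores)) < 1e-9
```

The explicit empty-list check raises a clear error. Without it, the function would fail with a less helpful ZeroDivisionError.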


Median

The median value is the value in the middle. The list or array must be sorted first to get an accurate result.

It is easy to spot the median in a small dataset when the number of values is odd.

Example:

millage = [49, 45, 89]

# After sorting the list.
[45, 49, 89]

# We can easily spot the median
[49]        


What if the number of values in the dataset is even?

Let's take a look at another example.

Example:

millage = [49, 45, 89, 100, 200, 99]

# After sorting the list.
[45, 49, 89, 99, 100, 200]

# To get the median here, we take the two numbers in the middle (89 and 99).
# Add them and divide the sum by 2: (89 + 99) / 2
# The median will be 94.0
[94.0]
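As a formula, sort the n values so that x_(1) ≤ x_(2) ≤ … ≤ x_(n). This order-statistic notation is introduced here for illustration, and its positions are 1-based. The median is then:

```latex
\text{median} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
\qquad
\text{e.g. } n = 6:\ \frac{x_{(3)} + x_{(4)}}{2} = \frac{89 + 99}{2} = 94
```

Python lists are 0-based, so in code the two middle positions of an even-length list become the indices n // 2 - 1 and n // 2.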


Now let's create a function to do this calculation for us.

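def printMedian(n):
    # The list must be sorted before we can find the middle value
    sortedScores = sorted(n)
    lenOfScores = len(sortedScores)

    if lenOfScores % 2 == 1:
        # Odd: return the single value in the middle
        return sortedScores[lenOfScores // 2]
    else:
        # Even: average the two values in the middle
        m1 = sortedScores[lenOfScores // 2 - 1]
        m2 = sortedScores[lenOfScores // 2]
        mid = (m1 + m2) / 2

        return mid


def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    medianValue = printMedian(studentScores)
    print('The median value of the student score is {}'.format(medianValue))
    # The median value of the student score is 25


if __name__ == '__main__':
    main()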

lenOfScores holds the total number of student scores (in this case, eleven).

Remember that the list has to be sorted before we look for the middle value.

Next, we use a conditional statement to check whether the length of the dataset is even or odd…

If odd, return the single number in the middle of the sorted list, at index lenOfScores // 2.

If even, take the two numbers in the middle of the sorted list and save them to the variables m1 and m2.

Sum m1 and m2, then divide the total by 2, saving the result to the variable mid.
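Putting those steps together, here is a minimal sketch that handles both the odd and the even case. It uses sorted() rather than list.sort(), so it works on a sorted copy and leaves the caller's list untouched. It also cross-checks its answers against statistics.median from the standard library. The name compute_median is hypothetical.

```python
import statistics


def compute_median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # sorted() returns a new list; the input is unchanged
    length = len(ordered)
    middle = length // 2
    if length % 2 == 1:
        # Odd count: exactly one value sits in the middle
        return ordered[middle]
    # Even count: average the two values on either side of the middle
    return (ordered[middle - 1] + ordered[middle]) / 2


studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]
millage = [49, 45, 89, 100, 200, 99]

print(compute_median(studentScores))  # 25
print(compute_median(millage))        # 94.0

assert compute_median(studentScores) == statistics.median(studentScores)
assert compute_median(millage) == statistics.median(millage)
```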


That’s it!

Finally…


Mode

The mode is the number that occurs most often within a set of numbers.

This one is a little different, as we will import a built-in helper to give us a hand… LOL

We will import Counter from the collections module, which is part of Python's standard library (available in Python 2.7 and Python 3). Counter counts how many times each element appears in a list, which is exactly what we need to find duplicates.
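To see what Counter is doing for us, here is a minimal sketch that builds the same score-to-count mapping with a plain dictionary and no import at all. The helper name count_occurrences is hypothetical. The call dict.get(key, 0) supplies a count of 0 the first time a score is seen.

```python
def count_occurrences(values):
    """Map each distinct value to the number of times it appears."""
    counts = {}
    for value in values:
        # dict.get returns 0 when the value has not been seen yet
        counts[value] = counts.get(value, 0) + 1
    return counts


studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]
counts = count_occurrences(studentScores)
print(counts[13])  # 3
print(counts[44])  # 1
```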

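from collections import Counter

def printMode(n):
    # Counter maps each score to the number of times it appears
    data = Counter(n)

    lenOfScores = len(n)
    get_mode = dict(data)

    # Highest count of any single score (computed once, not per item)
    maxCount = max(data.values())
    mode = [k for k, v in get_mode.items() if v == maxCount]

    # If every score counts as a "mode", no score actually repeats
    if len(mode) == lenOfScores:
        get_mode = "No reoccurring scores found"
    else:
        get_mode = "The most occurring score is " + ', '.join(map(str, mode))

    return get_mode


def main():
    studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]

    print(printMode(studentScores))
    # Result: The most occurring score is 13


if __name__ == '__main__':
    main()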

lenOfScores holds the total number of student scores (in this case, eleven).

Using dict(), we convert the Counter into a regular dictionary and save it to get_mode.

We then build a list called mode with a list comprehension. It compares every dictionary value (how many times each score appears) with the maximum of all the values (the count of the most frequent score), and it keeps every score whose count equals that maximum.

If the number of scores returned equals the total number of elements in the list, then every score appears exactly once, so we return ‘No reoccurring scores found’. Otherwise, we return the mode(s) we found, and main() prints the result.
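On Python 3.8 or newer, the standard library's statistics.multimode returns every most-common value, which makes it a handy cross-check for the mode logic above. The sketch below reproduces the same idea in a small function that returns a list instead of a sentence. The name find_modes is hypothetical, and the function assumes a non-empty list.

```python
import statistics
from collections import Counter


def find_modes(values):
    """Return every value that shares the highest count (non-empty input)."""
    counts = Counter(values)
    highest = max(counts.values())  # frequency of the most common value(s)
    # Counter keeps first-seen order, so ties come back in that order
    return [value for value, count in counts.items() if count == highest]


studentScores = [44, 55, 22, 25, 13, 13, 23, 13, 56, 90, 39]
print(find_modes(studentScores))  # [13]

# statistics.multimode (Python 3.8+) should agree
assert find_modes(studentScores) == statistics.multimode(studentScores)
```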


Oh yes!


Finally!!!


We are done computing some basic ML statistics without using external libraries like numpy and scipy.


I hope you learnt something from this.


I will leave y’all with one piece of advice…

[Learn what goes on behind the scenes sometimes… it builds your proficiency and sharpens your algorithmic programming skills.]


Don’t forget to follow, like and subscribe!


See ya next time!




More articles by Patrick Olumba
