How to code math equations?

Bunyamin Ergen

Artificial Intelligence Engineer @eTa??n | Multi-Agent AI Systems / Agentic AI, LLM, State-of-the-Art Technologies, Multi-Modal Learning, Speech-to-Text, Computer Vision and Adversarial Machine Learning

Published: November 11, 2022

Turning math equations and formulas into Python code

I will explain step by step how to code a math equation, in this order: Basic, Intermediate, Advanced, Quantum :) (Maybe)

The topics, in order: simple math operations, equations with one unknown, variance and standard deviation, the confidence interval, and the Pearson correlation coefficient. Last but not least, the simple linear regression equation.

CONTENTS

1. Start with Basics

2. Intermediate

2.a. Variance

2.b. Standard Deviation (Sample)

2.c. Confidence Interval

2.c.a Confidence Level Value (Z-score)

3. Simple Linear Regression

3.a. β1 (slope)

3.a.a. Pearson correlation coefficient

3.b. β0 (intercept)

3.c R-Squared (Coefficient of determination)

3.c.a RSS (Residual Sum of Squares)

3.c.b TSS (Total Sum of Squares)

3.d Adjusted R-Squared

Resources

1. Start with Basics

I’m getting straight to the point, assuming everyone knows the Python math operators.

You can refer to the 1st link in the Resources for more details.

x + 3 = 2

x plus 3 equals 2

The solution is simple, right?

x = 2 - 3

Let’s try another equation. The following equation says that twice x plus 1 is equal to 5.

2x + 1 = 5

So let’s leave x alone. It looks like someone who wants to be alone. :)

So:

2x + 1 - 1 = 5 - 1

2x = 5 - 1

2x = 4

x = 4/2

x = 2
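The same isolation steps can be written once as a small Python function. This is only a minimal sketch: the name `solve_linear` and the general form a*x + b = c are assumptions made here for illustration, not something from the original walkthrough.

```python
def solve_linear(a, b, c):
    """Solve a*x + b = c for x (hypothetical helper, assumes a != 0)."""
    if a == 0:
        raise ValueError("a must be non-zero to isolate x")
    # Subtract b from both sides, then divide both sides by a.
    return (c - b) / a


print(solve_linear(1, 3, 2))  # x + 3 = 2   -> -1.0
print(solve_linear(2, 1, 5))  # 2x + 1 = 5  -> 2.0
```

Both calls follow exactly the two steps used above: move the constant to the right-hand side, then divide by the coefficient of x.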

2. Intermediate

Let’s jump from boring simple operations to some more fun stuff. :)

Variance, Standard Deviation, Confidence Interval, Pearson Correlation Coefficient

2.a. Variance

Variance is a measure of dispersion: it measures how far a set of numbers is spread out from their average value. In more mathematical terms: subtract the mean from each value, square each of the resulting values, sum them all, and divide the sum by the number of observations minus 1.

S² = sample variance (S to the power of 2)

x_i = the value of one observation (x sub i)

x̄ = the mean value of all observations (x bar)

n = the number of observations

Σ = the summation symbol, i.e. the sum over all observations (the upper case letter sigma)

Below, the mean is subtracted from each value with a for loop.

Each difference is squared, and the squares are added up.

Finally, the sum is divided by the number of observations minus 1.


from numpy import random

# draw 11 random values from a standard normal distribution
var_sample = random.normal(size = 11)

# number of observations
var_n = len(var_sample)

# mean of the observations
var_mean = sum(var_sample) / len(var_sample)

# sample variance: sum of squared deviations divided by n - 1
var = sum((var_i - var_mean)**2 for var_i in var_sample) / (var_n - 1)

"""
I am adding the stages of the above code below so that it can be followed easily.

var =                           for var_i in var_sample
var =      var_i - var_mean     for var_i in var_sample
var =     (var_i - var_mean)**2 for var_i in var_sample
var = sum((var_i - var_mean)**2 for var_i in var_sample)
var = sum((var_i - var_mean)**2 for var_i in var_sample) / (var_n-1)
"""

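To check the variance steps by hand, it can help to run them on a small fixed list instead of random numbers. The sketch below assumes a hypothetical helper name, `sample_variance`, and compares its result with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Illustrative data: the mean is 5 and the squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))      # 32 / 7, roughly 4.571
print(statistics.variance(data))  # the standard library agrees
```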
2.b. Standard Deviation (Sample)

Standard deviation is a measure of the amount of variation or dispersion of a set of values.

Standard Deviation (Sample) Equation / Formula

s = sample standard deviation

N = the number of observations

x_i = the observed values of a sample item

Σ = the summation symbol, i.e. the sum over all observations (the upper case letter sigma)

If you look closely at the equation above, you can see that the standard deviation is actually the square root of the variance.


std = (sum((var_i - var_mean)**2 for var_i in var_sample) / (var_n - 1))**0.5
	

# or
	

var**0.5 # exponent 0.5 means square root
	

# or
	

import math
	

math.sqrt(var)
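As a quick sanity check, the sample standard deviation can be wrapped in a function and compared with `statistics.stdev` from the standard library. The name `sample_std` and the data are assumptions for illustration only.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)


# Illustrative data: mean 3, squared deviations sum to 10, variance 2.5.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_std(data))        # square root of 2.5, roughly 1.581
print(statistics.stdev(data))  # the standard library agrees
```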

2.c. Confidence Interval

Confidence interval (CI) is a range of estimates for an unknown parameter.

A confidence interval is computed at a designated confidence level; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used.

The confidence level represents the long-run proportion of corresponding CIs that contain the true value of the parameter.

For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter’s true value.

In more mathematical terms:

The confidence interval is the sample mean plus or minus the z-score multiplied by the standard deviation divided by the square root of the sample size.

Confidence Interval Equation / Formula

CI = x̄ ± z * (s / √n)

CI = confidence interval

x̄ = sample mean

z = confidence level value (z-score)

s = sample standard deviation

n = sample size


from numpy import random

# sample x
x = random.normal(size = 29)

# mean of x
x_mean = sum(x)/len(x)

# size of sample
n = len(x)

Standard Deviation of x (s = sample standard deviation)

I explained how to find the standard deviation above.

But let’s go over it again.

The mean is subtracted from each value and the resulting differences are squared.

Then add them all up and divide by the number of elements minus 1.

Then take the square root of the resulting number.


x_std = (sum((x_i - x_mean)**2 for x_i in x) / (n - 1))**0.5

2.c.a Confidence Level Value (Z-score)

The confidence level value, also called the standard score or z-score, is the number of standard deviations by which a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured.

Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

How is the z-score calculated? Here we take the confidence level as 95%; it could also be 99% or 90%, the choice is up to you.

Then let’s look at the red area in the image above. That’s our safe zone. :)

That region represents 95%, that is, 0.95.

So what is the area to the left and right of it?

If the whole area is 1, subtracting 0.95 from 1 gives the sum of these two areas,

so 1 - 0.95 = 0.05.

And if we divide that by two, we get the area on the left and the area on the right.

That’s 0.05/2 = 0.025.

Let’s do this with the formula.


(1-0.95)/2

Let’s find the z-score.

Focus on the blue area in the image below.

If the whole area under the curve is 1 and the area on the right is 0.025,

then the blue area on the left is 1 - 0.025.


1 - 0.025
# 0.975

If we look up this number in the z-table and add the row and column values on its axes, we get the following number.


1.9 + 0.06
# 1.96
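Instead of reading the z-table by hand, the same lookup can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function name `z_for_confidence` is an assumption made here for illustration.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z-score for a confidence level such as 0.95."""
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    tail = (1 - level) / 2  # area in each tail, e.g. 0.025 for 95%
    # inv_cdf returns the point with the given area to its left.
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
# 0.9 1.64
# 0.95 1.96
# 0.99 2.58
```

The 95% case reproduces the 1.96 read from the table above; the other two levels show how the value changes if you pick 90% or 99% instead.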

Now that we have found all the unknowns, let’s put them in the equation and implement it.

Here we apply the plus/minus expression in the equation to both sides of the distribution.

We use addition for the positive side and subtraction for the negative side.

Look at the equation again.


CI_pos = x_mean + (1.96 * (x_std / (n**0.5)))

CI_neg = x_mean - (1.96 * (x_std / (n**0.5)))
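Putting the pieces together, the whole confidence interval calculation could be sketched as one function. The name `confidence_interval`, the `level` parameter and the sample data are assumptions for illustration; like the steps above, this uses the normal z-score rather than a t-value.

```python
import math
from statistics import NormalDist


def confidence_interval(values, level=0.95):
    """Return (lower, upper) as mean -/+ z * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # z-score that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * std / math.sqrt(n)
    return mean - margin, mean + margin


ci_low, ci_high = confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(ci_low, ci_high)  # an interval centred on the mean, 3.0
```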

3. Simple Linear Regression

Simple linear regression is a method used to predict the dependent variable with the help of the independent variable when there is a linear relationship between a single independent variable and the dependent variable.

Simple Linear Regression Equation / Formula

Y = dependent variable

β0 = intercept

β1 = slope

X = independent variable

ε = random error

Of course, since we want to find the values to be predicted, we should write the equation as follows.

The only difference, as you know, is the hat, which means a predicted value, and the letter “i”, which means each individual value.

For more information, you can refer to the projection matrix.

Create a dataset with a linear relation. We can do this with the datasets module in the sklearn package.


from sklearn import datasets

x , y  = datasets.make_regression(random_state = 17)

Now we have a data set of independent variables and a dependent variable.

Since we will implement simple linear regression, let’s take one of the independent variables and assign it as x.


# x has shape (n_samples, n_features); take the first feature column
x = x[:, 0]

3.a. β1 (slope)

Let’s dive into solving the equation!

We have x and y, and we know that they are in a linear relation.

Then let’s find the other unknowns,

first β1, i.e. the slope.

β1 = r * (Sy / Sx)

Let’s explain the unknowns above.

r = Pearson correlation coefficient

Sy = standard deviation of y

Sx = standard deviation of x

Let’s start with the simple part and find the standard deviations of x and y.

Since I explained how to find the standard deviation above, I’m going directly to the solution.


y_mean = sum(y)/len(y)

x_mean = sum(x)/len(x)

# sample standard deviations of y and x
Sy = (sum((y_i - y_mean)**2 for y_i in y) / (len(y) - 1))**0.5

Sx = (sum((x_i - x_mean)**2 for x_i in x) / (len(x) - 1))**0.5

3.a.a. Pearson correlation coefficient

Now that we have found the standard deviations of x and y, we can move on to r, i.e. Pearson correlation coefficient.

Pearson Correlation Coefficient Equation / Formula

r = correlation coefficient

x_i = values of the x-variable in a sample (x sub i)

x̄ = mean of the values of the x-variable

y_i = values of the y-variable in a sample (y sub i)

ȳ = mean of the values of the y-variable

Let’s explain the equation. There is a vector operation in the numerator of this fraction. The first Σ (upper case sigma) is the summation symbol, that is, we will sum the results. Take the first parenthesis in the numerator and subtract the mean of x from each x value. Then move on to the next parenthesis and subtract the mean of y from each y value. Now we have two new arrays. Multiply these two arrays element by element and sum the products, and we have the numerator of the fraction. Let’s implement the code for the numerator first to avoid confusion.


import numpy as np
	

r_up = sum(np.array([(x_i - x_mean) for x_i in x]) * np.array([(y_i - y_mean) for y_i in y]))

Now let’s do the denominator part of the equation:

For the denominator, we subtract the mean of x from each x value, square the resulting values, and sum them all. We do the same for y. Then we multiply the two sums with each other and take the square root of the result.


r_down = (sum([(x_i - x_mean) ** 2 for x_i in x]) * sum([(y_i - y_mean) ** 2 for y_i in y])) ** 0.5

Now that we have the numerator and the denominator of the equation, let’s solve it.


r = r_up / r_down

If you want, you can code directly without creating a variable as follows. Personally, I like the following process more. I don’t like creating variables for even the slightest thing.


r = sum(np.array([(x_i - x_mean) for x_i in x]) * np.array([(y_i - y_mean) for y_i in y])) / (sum([(x_i - x_mean)**2 for x_i in x]) * sum([(y_i - y_mean)**2 for y_i in y]))**0.5
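The numerator and denominator steps can also be checked on tiny hand-made lists where the answer is known in advance. The sketch below uses only the standard library; the name `pearson_r` and the data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Numerator: sum of the products of the paired deviations.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    denominator = math.sqrt(
        sum((x - x_mean) ** 2 for x in xs) * sum((y - y_mean) ** 2 for y in ys)
    )
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing: -1.0
```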

Yes, now that we’ve found all the variables, let’s carry out our main operation and find β1, i.e. the slope. Check the equation again.


B1 = r * (Sy/Sx)

3.b. β0 (intercept)

Now solve for β0


B0 = y_mean - (B1 * x_mean)
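To see the slope and intercept formulas produce a known line, they can be run on points that lie exactly on y = 2x + 1. The helper name `slope_and_intercept` and the data are assumptions for illustration; the function follows the same r * (Sy / Sx) route as above.

```python
import math


def slope_and_intercept(xs, ys):
    """Return (b0, b1) using b1 = r * (Sy / Sx) and b0 = y_mean - b1 * x_mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sample standard deviations of x and y.
    s_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    # Pearson r written with the standard deviations.
    r = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b1 = r * (s_y / s_x)
    b0 = y_mean - b1 * x_mean
    return b0, b1


b0, b1 = slope_and_intercept([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # approximately 1.0 and 2.0 for points on y = 2x + 1
```

Algebraically, r * (Sy / Sx) simplifies to the sum of the products of the deviations divided by the sum of the squared x deviations, so either form can be coded.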

3.c R-Squared (Coefficient of determination)

Now let’s find R-Squared, i.e. Coefficient of determination.

In statistics, the coefficient of determination, denoted R² or r² and pronounced “R squared”, is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

R-Squared (Coefficient of determination) Equation / Formula

R² = 1 - RSS / TSS

R² = coefficient of determination

RSS = sum of squares of residuals

TSS = total sum of squares

3.c.a RSS (Residual Sum of Squares)

Let’s go step by step and find the RSS first.

RSS = residual sum of squares

y_i = each value of the dependent variable

f(x_i) = predicted value of the dependent variable for each x_i


# find the predicted values first
y_pred = [B0 + (B1 * x_i) for x_i in x]
	

RSS = sum((np.array(y) - np.array(y_pred)) ** 2)

3.c.b TSS (Total Sum of Squares)

Now TSS

TSS = total sum of squares

n = number of observations

y_i = each value in a sample ( y sub i )

ȳ = mean value of a sample ( y bar )


TSS = sum((y_i - y_mean)**2 for y_i in y)

Now that all the unknowns have been revealed, let’s find the R-Squared, i.e. the coefficient of determination.


R2 = 1 - RSS/TSS
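A small check with made-up numbers shows how RSS and TSS combine. The function name `r_squared` and the data are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_mean = sum(y_true) / len(y_true)
    # RSS: squared differences between observed and predicted values.
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # TSS: squared differences between observed values and their mean.
    tss = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1 - rss / tss


observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.5, 2.0, 3.0, 3.5]))  # RSS = 0.5, TSS = 5, so about 0.9
print(r_squared(observed, observed))              # perfect predictions: 1.0
print(r_squared(observed, [2.5] * 4))             # always predicting the mean: 0.0
```

The last two calls show the two reference points: a perfect fit gives 1, and a model that only ever predicts the mean gives 0.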

3.d Adjusted R-Squared

Let’s also find the adjusted R² value, which corrects R² for the number of independent variables in the model.

Adjusted R² = 1 - ((1 - R²) * (n - 1)) / (n - p - 1)

R² = sample R-squared

n = sample size

p = number of independent variables

Since we are implementing the simple linear regression equation we naturally have one independent variable so p = 1.


n = len(x)

p = 1

R2_adj = 1 - ((1 - R2) * (n - 1)) / (n - p - 1)
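The adjustment can be tried with plain numbers to see its effect. The function name `adjusted_r_squared` and the example values are assumptions for illustration.

```python
def adjusted_r_squared(r2, n, p=1):
    """Adjusted R-squared for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1 - ((1 - r2) * (n - 1)) / (n - p - 1)


# With R2 = 0.9, n = 11 and p = 1: 1 - (0.1 * 10) / 9, roughly 0.889.
print(adjusted_r_squared(0.9, n=11, p=1))
# Counting more independent variables lowers the adjusted value further.
print(adjusted_r_squared(0.9, n=11, p=3))
```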

As you can see, we actually need two things to code an equation.

First, understand the equation: the meaning of the unknowns and how to solve for them.

Second, encode the equation according to Python syntax.

Of course, all of this simple linear regression code should be wrapped in a function or class. But maybe next time. :)
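As a rough idea of what such a class might look like, here is a minimal sketch. The class name `SimpleLinearRegression`, its method names and the sample data are all hypothetical and chosen only for illustration; it reuses the same formulas as the sections above and relies on the standard library alone.

```python
import math


class SimpleLinearRegression:
    """Hypothetical wrapper around the formulas used in this article."""

    def fit(self, xs, ys):
        n = len(xs)
        x_mean = sum(xs) / n
        y_mean = sum(ys) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        syy = sum((y - y_mean) ** 2 for y in ys)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        r = sxy / math.sqrt(sxx * syy)
        # Slope = r * (Sy / Sx); the (n - 1) factors cancel in the ratio.
        self.b1 = r * math.sqrt(syy / sxx)
        self.b0 = y_mean - self.b1 * x_mean
        return self

    def predict(self, xs):
        return [self.b0 + self.b1 * x for x in xs]

    def score(self, xs, ys):
        """R-squared of the fitted line on the given data."""
        y_mean = sum(ys) / len(ys)
        rss = sum((y - p) ** 2 for y, p in zip(ys, self.predict(xs)))
        tss = sum((y - y_mean) ** 2 for y in ys)
        return 1 - rss / tss


xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]
model = SimpleLinearRegression().fit(xs, ys)
print(model.b0, model.b1)    # roughly 1.15 and 1.94
print(model.score(xs, ys))   # close to 1 for this nearly linear data
```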

The next step can be multiple linear regression and other algorithms.

Let’s stay up until morning and code.

Then we’ll have a good sleep and maybe solve the quantum equations in our dreams. :)

Please note that some equations can be solved in more than one way; just pick one and go for it.

Note: You can access all of the above code in a single file from the link below.

https://github.com/bunyaminergen/how_to_code_math_equations/blob/main/math_equations.py

Thank you very much.

Bunyamin Ergen

— — — — — — — — — — — — — — — — — —

www.bunyaminergen.com

linkedin.com/bunyaminergen

github.com/bunyaminergen

kaggle.com/bunyaminergen

instagram.com/bunyaminergen

facebook.com/bunyaminergenoffical

twitter.com/bergenoffical

youtube.com/bunyaminergen

— — — — — — — — — — — — — — — — — —

Resources

https://en.wikibooks.org/wiki/Python_Programming/Basic_Math#Order_of_Operations

https://en.wikibooks.org/wiki/Python_Programming/Math

https://en.wikipedia.org/wiki/Equation

https://en.wikipedia.org/wiki/Standard_deviation

https://en.wikipedia.org/wiki/Algebra

https://en.wikipedia.org/wiki/Linear_algebra

https://en.wikipedia.org/wiki/Algorithm

https://tr.wikipedia.org/wiki/G%C3%BCven_aral%C4%B1%C4%9F%C4%B1

https://en.wikipedia.org/wiki/Confidence_interval

https://www.hec.ca/en/cams/help/topics/The_summation_symbol.pdf

Hikmet Burak Özcan

Research Assistant at IZTECH

2 years ago

Without giving it a second thought, I archived it. I appreciate your work.

Enes Öztürk

Data Scientist @Colendi | Data Scientist Mentor @Miuul

2 years ago

You have made a very good content, I liked it very much
