Kolmogorov-Smirnov Statistic Program Coded in Python 3
Ian Martin Ajzenszmidt
Information Technology and Services Professional and Political Scientist (Retired - Not Working since 1989).
ian@ian-Latitude-E7440:~$ python3 KolmogorovSmirnovStatistic.py
Kolmogorov-Smirnov Statistic Visualization
Value CDF1 CDF2 Difference
0.005 0.010 0.000 0.010
0.015 0.020 0.000 0.020
0.025 0.030 0.000 0.030
0.035 0.040 0.000 0.040
0.045 0.050 0.000 0.050
0.055 0.060 0.000 0.060
0.065 0.070 0.000 0.070
0.075 0.080 0.000 0.080
0.085 0.090 0.000 0.090
0.095 0.100 0.000 0.100
0.105 0.110 0.000 0.110
0.115 0.120 0.000 0.120
0.125 0.130 0.000 0.130
0.135 0.140 0.000 0.140
0.145 0.150 0.000 0.150
0.155 0.160 0.000 0.160
0.165 0.170 0.000 0.170
0.175 0.180 0.000 0.180
0.185 0.190 0.000 0.190
0.195 0.200 0.000 0.200
0.205 0.210 0.000 0.210
0.215 0.220 0.000 0.220
0.225 0.230 0.000 0.230
0.235 0.240 0.000 0.240
0.245 0.250 0.000 0.250
0.255 0.260 0.000 0.260
0.265 0.270 0.000 0.270
0.275 0.280 0.000 0.280
0.285 0.290 0.000 0.290
0.295 0.300 0.000 0.300
0.305 0.310 0.000 0.310
0.315 0.320 0.000 0.320
0.325 0.330 0.000 0.330
0.335 0.340 0.000 0.340
0.345 0.350 0.000 0.350
0.350 0.350 0.500 0.150
0.355 0.360 0.500 0.140
0.365 0.370 0.500 0.130
0.375 0.380 0.500 0.120
0.385 0.390 0.500 0.110
0.395 0.400 0.500 0.100
0.405 0.410 0.500 0.090
0.415 0.420 0.500 0.080
0.425 0.430 0.500 0.070
0.435 0.440 0.500 0.060
0.445 0.450 0.500 0.050
0.455 0.460 0.500 0.040
0.465 0.470 0.500 0.030
0.475 0.480 0.500 0.020
0.485 0.490 0.500 0.010
0.495 0.500 0.500 0.000
0.505 0.510 0.500 0.010
0.515 0.520 0.500 0.020
0.525 0.530 0.500 0.030
0.535 0.540 0.500 0.040
0.545 0.550 0.500 0.050
0.555 0.560 0.500 0.060
0.565 0.570 0.500 0.070
0.575 0.580 0.500 0.080
0.585 0.590 0.500 0.090
0.595 0.600 0.500 0.100
0.605 0.610 0.500 0.110
0.615 0.620 0.500 0.120
0.625 0.630 0.500 0.130
0.635 0.640 0.500 0.140
0.645 0.650 0.500 0.150
0.650 0.650 1.000 0.350
0.655 0.660 1.000 0.340
0.665 0.670 1.000 0.330
0.675 0.680 1.000 0.320
0.685 0.690 1.000 0.310
0.695 0.700 1.000 0.300
0.705 0.710 1.000 0.290
0.715 0.720 1.000 0.280
0.725 0.730 1.000 0.270
0.735 0.740 1.000 0.260
0.745 0.750 1.000 0.250
0.755 0.760 1.000 0.240
0.765 0.770 1.000 0.230
0.775 0.780 1.000 0.220
0.785 0.790 1.000 0.210
0.795 0.800 1.000 0.200
0.805 0.810 1.000 0.190
0.815 0.820 1.000 0.180
0.825 0.830 1.000 0.170
0.835 0.840 1.000 0.160
0.845 0.850 1.000 0.150
0.855 0.860 1.000 0.140
0.865 0.870 1.000 0.130
0.875 0.880 1.000 0.120
0.885 0.890 1.000 0.110
0.895 0.900 1.000 0.100
0.905 0.910 1.000 0.090
0.915 0.920 1.000 0.080
0.925 0.930 1.000 0.070
0.935 0.940 1.000 0.060
0.945 0.950 1.000 0.050
0.955 0.960 1.000 0.040
0.965 0.970 1.000 0.030
0.975 0.980 1.000 0.020
0.985 0.990 1.000 0.010
0.995 1.000 1.000 0.000
Kolmogorov-Smirnov Statistic: 0.350
ian@ian-Latitude-E7440:~$ cat KolmogorovSmirnovStatistic.py
# Function to calculate the empirical cumulative distribution function (ECDF)
def ecdf(data):
    n = len(data)
    sorted_data = sorted(data)
    x = sorted_data
    y = [i / n for i in range(1, n + 1)]
    return x, y
# Generate two sample datasets
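data1 = [(i + 0.5) / 100 for i in range(100)]  # 100 evenly spaced points on (0, 1), standing in for Uniform(0, 1)
data2 = [0.5 + 0.15 * (2 * (i % 2) - 1) for i in range(200)]  # Alternates between 0.35 and 0.65 (mean 0.5, std 0.15); not a Normal sample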
# Calculate the ECDFs for both datasets
x1, y1 = ecdf(data1)
x2, y2 = ecdf(data2)
# Precompute cumulative counts for efficiency
def precompute_cdf(data, all_x):
    sorted_data = sorted(data)
    cdf = []
    count = 0
    for val in all_x:
        while count < len(sorted_data) and sorted_data[count] <= val:
            count += 1
        cdf.append(count / len(sorted_data))
    return cdf
# Merge and deduplicate all x-values
all_x = sorted(set(x1 + x2)) # Union of x-values from both datasets
cdf1 = precompute_cdf(x1, all_x)
cdf2 = precompute_cdf(x2, all_x)
ks_statistic = max(abs(c1 - c2) for c1, c2 in zip(cdf1, cdf2))
# Visualization (simple ASCII-based output for no dependencies)
print("\nKolmogorov-Smirnov Statistic Visualization")
print(f"{'Value':<10}{'CDF1':<10}{'CDF2':<10}{'Difference':<10}")
for val, c1, c2 in zip(all_x, cdf1, cdf2):
    diff = abs(c1 - c2)
    print(f"{val:<10.3f}{c1:<10.3f}{c2:<10.3f}{diff:<10.3f}")
# Print KS statistic
print(f"\nKolmogorov-Smirnov Statistic: {ks_statistic:.3f}")
ian@ian-Latitude-E7440:~$
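
The same calculation can be packaged as a single reusable function. What follows is only a minimal sketch, not a replacement for the listing above: the name ks_two_sample is hypothetical, and it swaps the hand-written counting loop for the standard library's bisect_right, which counts how many sorted values are less than or equal to a given point. It assumes both samples are non-empty.

```python
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b):
    """Return the two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)|.

    Hypothetical helper for illustration; assumes both samples are non-empty.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only change at observed values,
    # so it is enough to compare them at the union of those values.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Same two datasets as in the listing above.
data1 = [(i + 0.5) / 100 for i in range(100)]
data2 = [0.5 + 0.15 * (2 * (i % 2) - 1) for i in range(200)]

ks = ks_two_sample(data1, data2)
print(f"Kolmogorov-Smirnov Statistic: {ks:.3f}")  # prints 0.350
```

On the two datasets from the listing this reproduces the 0.350 shown in the run above. The largest gap is reached just below the first jump of the second sample (at 0.345) and again at its second jump (at 0.650).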