Power BI Histograms with Python Visuals: Dynamic KPI Selection! ????
Discover a refined approach in Power BI, where 100+ dynamic KPIs redefine histograms without revolutionizing. We've found a smarter way, focusing on refining rather than reinventing. Dive into a realm where histograms meet finesse, overcoming Power BI constraints with ease. Our innovative method, guided by Six Sigma principles, dynamically adjusts bins and filters data precisely with IQR. Embrace a solution that transforms limitations into opportunities, elevating your insights seamlessly. It's a nuanced blend of dynamic KPIs and histogram mastery, offering a quick and effective solution for your data narratives. ??? #PowerBI #DataViz #Histograms #Innovation??? #PowerBI #Python #DataViz #DynamicKPIs #HistogramMastery #SixSigma #ScottsRule #DataInsights
Get acquainted with the key players in our dataset:
This meticulous setup ensures precise calculations tailored to each KPI's unique essence. Ready for an enlightening dive into data exploration? ?? #DataInsights #TableColumns #DataExploration
领英推荐
Explorers welcome! Share your insights and suggestions. Find the enchanted code below for your data adventures.
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(Employee, KPI, KPIFormat, Sum_Numerator, Sum_denominator,Period,Division)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
_kpi = list(dataset['KPI'].unique())[0]
dataset=dataset.rename(columns={dataset.columns[5]: 'Date'})
dataset['Sum_Numerator'] = dataset['Sum_Numerator'].apply(lambda x: x * 100 if 'KPIFormat' in dataset.columns and dataset['KPIFormat'].eq('Percent').all() else x)
condition = (dataset['Division'] == 1)
dataset[_kpi] = np.where(condition, dataset['Sum_Numerator']/dataset['Sum_denominator'], dataset['Sum_Numerator'])
dataset = dataset[dataset[_kpi].isna()==False]
# Filter the data based on IQR
Q1 = dataset[_kpi].quantile(0.25)
Q3 = dataset[_kpi].quantile(0.75)
IQR = Q3 - Q1
iqr_range = 1.5 # Adjust this value as needed
filtered_data = dataset[(dataset[_kpi] >= (Q1 - iqr_range * IQR)) & (dataset[_kpi] <= (Q3 + iqr_range * IQR))]
# Calculate the number of bins using Scott's rule
n = len(filtered_data)
bin_width = 3.5 * filtered_data[_kpi].std() / (n ** (1/3))
bins_ = max(int((filtered_data[_kpi].max() - filtered_data[_kpi].min()) / bin_width), 5)
# Calculate the histogram with bin edges
hist, bin_edges = np.histogram(filtered_data[_kpi], bins=bins_)
# Create the histogram with a KDE
sns.histplot(data=filtered_data
, x=_kpi
, kde=True
, bins=bin_edges
)
# Set the x-ticks to match the bin edges
plt.xticks(bin_edges)
# Calculate the mean and median of the filtered data
mean_val = filtered_data['Sum_Numerator'].sum()/filtered_data['Sum_denominator'].sum()
avg_val = filtered_data[_kpi].mean()
median_val = filtered_data[_kpi].median()
min_val = filtered_data[_kpi].min()
# Calculate the standard deviation
std_val = filtered_data[_kpi].std()
vsf_val = (std_val*6)/mean_val
# Calculate the sample size
# Calculate the sample size
sample_size = len(filtered_data)
Employees = len(filtered_data['Employee'].unique())
# Add mean and median lines to the plot
plt.axvline(mean_val, color='red', linestyle='dashed', linewidth=2, label=f"Mean: {mean_val:.2f}")
plt.axvline(median_val, color='blue', linestyle='dashed', linewidth=2, label=f"Median: {median_val:.2f}")
# Add -1 sigma line (LSS) and +1 sigma line (USS)
sigma_minus_1 = mean_val - std_val
sigma_minus_1_N = filtered_data[filtered_data[_kpi] <= sigma_minus_1].shape[0]
sigma_plus_1 = mean_val + std_val
sigma_plus_1_N = filtered_data[filtered_data[_kpi] >= sigma_plus_1].shape[0]
plt.axvline(sigma_minus_1, color='purple', linestyle='dashed', linewidth=2, label=f"LSS: {sigma_minus_1:.2f}")
plt.axvline(sigma_plus_1, color='purple', linestyle='dashed', linewidth=2, label=f"USS: {sigma_plus_1:.2f}")
# Fill the area below the -1 sigma line (LSS) with color
plt.fill_betweenx([0, plt.ylim()[1]], plt.xlim()[0], sigma_minus_1, color='red', alpha=0.2, label='Parts below -1 Sigma')
# Add an annotation for parts below -1 sigma
annotation_x_lss = sigma_minus_1 - ((sigma_minus_1 - min_val) / 2) # Adjust the position of the annotation
annotation_y_lss = 5 # Adjust the vertical position of the annotation
plt.text(annotation_x_lss, annotation_y_lss, f"Below -1 Sigma \n{sigma_minus_1_N} ({sigma_minus_1_N / sample_size:.2%})", fontsize=12, color='red')
# Fill the area above the +1 sigma line (USS) with color
plt.fill_betweenx([0, plt.ylim()[1]], sigma_plus_1, plt.xlim()[1], color='orange', alpha=0.2, label='Parts below -1 Sigma')
# Add an annotation for parts above +1 sigma
annotation_x_uss = sigma_plus_1 - ((sigma_plus_1 - mean_val) / 2) # Adjust the position of the annotation
annotation_y_uss = 5 # Adjust the vertical position of the annotation
plt.text(annotation_x_uss, annotation_y_uss, f"Above +1 Sigma \n{sigma_plus_1_N} ({sigma_plus_1_N / sample_size:.2%})", fontsize=12, color='orange')
# Set Title
plt.title(_kpi + ' Histogram', size=20, y=1.05)
# Add data labels to histogram bars
for p in plt.gca().patches:
plt.gca().annotate(f'{int(p.get_height())}', (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='center', fontsize=10, color='black', xytext=(0, 10), textcoords='offset points')
# Manually set the legend labels to include Sample Size (N) and VSF
plt.legend(handles=[
plt.Line2D([0], [0], color='red', linestyle='dashed', linewidth=2, label=f"Mean: {mean_val:.2f}"),
plt.Line2D([0], [0], color='red', linestyle='dashed', linewidth=2, label=f"AVG: {avg_val:.2f}"),
plt.Line2D([0], [0], color='blue', linestyle='dashed', linewidth=2, label=f"Median: {median_val:.2f}"),
plt.Line2D([0], [0], color='purple', linestyle='dashed', linewidth=2, label=f"LSS: {sigma_minus_1:.2f}"),
plt.Line2D([0], [0], color='purple', linestyle='dashed', linewidth=2, label=f"USS: {sigma_plus_1:.2f}"),
plt.Line2D([0], [0], color='red', alpha=0.3, label='Parts below -1 Sigma', linewidth=2),
plt.Line2D([0], [0], color='orange', alpha=0.3, label='Parts above +1 Sigma', linewidth=2),
plt.Line2D([0], [0], color='red', label=f"Sample Size: {sample_size}"),
plt.Line2D([0], [0], color='blue', label=f"#Employee: {Employees}"),
plt.Line2D([0], [0], color='red', label=f"VSF: {vsf_val:.2f}"),
plt.Line2D([0], [0], color='red', label=f"STD: {std_val:.2f}")
])
plt.show()
Senior Power BI Developer | Data Visualization Expert | Six Sigma Yellow Belt | Greece
1 年Very good, I remember starting this visual for a QA report, but you have improved it a lot. It now shows all important information to get the full insight from the data. Good job!