Power BI Histograms with Python Visuals: Dynamic KPI Selection!

Discover a refined approach in Power BI where 100+ dynamic KPIs feed a single histogram visual. We are not reinventing histograms, just refining them: guided by Six Sigma thinking, the method filters outliers with the interquartile range (IQR) and sizes the bins dynamically with Scott's rule, working around Power BI's lack of a flexible native histogram. The result is a quick, reusable pattern that turns a limitation into an opportunity for richer data narratives. #PowerBI #Python #DataViz #DynamicKPIs #Histograms #SixSigma #ScottsRule #DataInsights
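
At its core the approach is just two formulas: keep values within 1.5 x IQR of the quartiles, then derive the bin count from Scott's rule (bin width = 3.5 * sigma / n^(1/3)). Here is a minimal sketch of those two steps on their own; the helper name iqr_filter_and_bins is hypothetical and the input is assumed to be a pandas Series of KPI values:

import pandas as pd

def iqr_filter_and_bins(values: pd.Series, iqr_range: float = 1.5):
    # Keep only values within iqr_range * IQR of the quartiles
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    kept = values[(values >= q1 - iqr_range * iqr) & (values <= q3 + iqr_range * iqr)]
    # Scott's rule bin width, with a floor of 5 bins for very narrow distributions
    bin_width = 3.5 * kept.std() / (len(kept) ** (1 / 3))
    bins = max(int((kept.max() - kept.min()) / bin_width), 5)
    return kept, bins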

Table Columns Snapshot

Get acquainted with the key players in our dataset:

  • Date: Timestamps of each record.
  • Employee: Names of the workforce.
  • KPI: Names of Key Performance Indicators (KPIs).
  • KPI Format: How the final KPI value is displayed (e.g. Percent).
  • Numerator: The measured result for quantitative KPIs, or the numerator for quality (ratio) KPIs.
  • Denominator: 1 for quantitative KPIs, or the divisor for quality KPIs.
  • Division: Guides the KPI calculation; set to 1 to compute Sum(Numerator) / Sum(Denominator).

This setup keeps the calculation tailored to each KPI's definition. Ready to dive into the data? #DataInsights #TableColumns #DataExploration
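
To make the schema concrete, here is a small, purely illustrative dataframe in the shape the script below expects. The names and figures are made up, and the column names follow the script's conventions ('KPIFormat', 'Sum_Numerator', 'Sum_denominator'):

import pandas as pd

# Hypothetical sample rows matching the columns described above
dataset = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'Employee': ['Alice', 'Bob', 'Alice'],
    'KPI': ['First Call Resolution'] * 3,
    'KPIFormat': ['Percent'] * 3,
    'Sum_Numerator': [41, 30, 45],     # numerator for quality KPIs (or the result itself for quantitative ones)
    'Sum_denominator': [50, 40, 50],   # divisor for quality KPIs (1 for quantitative ones)
    'Division': [1, 1, 1],             # 1 => KPI = Sum(Numerator) / Sum(Denominator)
})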



Explorers welcome! Share your insights and suggestions. Find the enchanted code below for your data adventures.

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(Employee, KPI, KPIFormat, Sum_Numerator, Sum_denominator,Period,Division)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
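# The visual is meant to be sliced to a single KPI; take that KPI's name to label the computed column and the chart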
_kpi = list(dataset['KPI'].unique())[0]

# The sixth column arrives from Power BI as 'Period' (see the preamble above); rename it to 'Date'
dataset = dataset.rename(columns={dataset.columns[5]: 'Date'})

# For Percent-formatted KPIs, scale the numerator by 100 so the computed ratio reads on a 0-100 scale
if 'KPIFormat' in dataset.columns:
    dataset['Sum_Numerator'] = np.where(dataset['KPIFormat'].eq('Percent'), dataset['Sum_Numerator'] * 100, dataset['Sum_Numerator'])

condition = (dataset['Division'] == 1)
dataset[_kpi] = np.where(condition, dataset['Sum_Numerator']/dataset['Sum_denominator'], dataset['Sum_Numerator'])

# Drop rows where the KPI value could not be computed
dataset = dataset[dataset[_kpi].notna()]

# Filter the data based on IQR
Q1 = dataset[_kpi].quantile(0.25)
Q3 = dataset[_kpi].quantile(0.75)
IQR = Q3 - Q1

iqr_range = 1.5  # Adjust this value as needed

filtered_data = dataset[(dataset[_kpi] >= (Q1 - iqr_range * IQR)) & (dataset[_kpi] <= (Q3 + iqr_range * IQR))]

# Calculate the number of bins using Scott's rule
n = len(filtered_data)
bin_width = 3.5 * filtered_data[_kpi].std() / (n ** (1/3))
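# Floor the bin count at 5 so narrow or sparse distributions still render a readable histogram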
bins_ = max(int((filtered_data[_kpi].max() - filtered_data[_kpi].min()) / bin_width), 5)

# Calculate the histogram with bin edges
hist, bin_edges = np.histogram(filtered_data[_kpi], bins=bins_)
# Create the histogram with a KDE
sns.histplot(data=filtered_data
             , x=_kpi
             , kde=True
             , bins=bin_edges
            )

# Set the x-ticks to match the bin edges
plt.xticks(bin_edges)
# Calculate the mean and median of the filtered data
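# mean_val is the ratio-of-sums (weighted) mean used for the sigma lines; avg_val is the simple per-row average shown in the legend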
mean_val =  filtered_data['Sum_Numerator'].sum()/filtered_data['Sum_denominator'].sum()
avg_val = filtered_data[_kpi].mean()
median_val = filtered_data[_kpi].median()
min_val = filtered_data[_kpi].min()

# Calculate the standard deviation
std_val = filtered_data[_kpi].std()
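# VSF: six standard deviations expressed relative to the weighted mean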
vsf_val = (std_val*6)/mean_val
# Calculate the sample size and the number of distinct employees
sample_size = len(filtered_data)
Employees = len(filtered_data['Employee'].unique())
# Add mean and median lines to the plot
plt.axvline(mean_val, color='red', linestyle='dashed', linewidth=2, label=f"Mean: {mean_val:.2f}")
plt.axvline(median_val, color='blue', linestyle='dashed', linewidth=2, label=f"Median: {median_val:.2f}")

# Add -1 sigma line (LSS) and +1 sigma line (USS)
sigma_minus_1 = mean_val - std_val
sigma_minus_1_N = filtered_data[filtered_data[_kpi] <= sigma_minus_1].shape[0]
sigma_plus_1 = mean_val + std_val
sigma_plus_1_N = filtered_data[filtered_data[_kpi] >= sigma_plus_1].shape[0]
plt.axvline(sigma_minus_1, color='purple', linestyle='dashed', linewidth=2, label=f"LSS: {sigma_minus_1:.2f}")
plt.axvline(sigma_plus_1, color='purple', linestyle='dashed', linewidth=2, label=f"USS: {sigma_plus_1:.2f}")

# Fill the area below the -1 sigma line (LSS) with color
plt.fill_betweenx([0, plt.ylim()[1]], plt.xlim()[0], sigma_minus_1, color='red', alpha=0.2, label='Parts below -1 Sigma')

# Add an annotation for parts below -1 sigma
annotation_x_lss = sigma_minus_1 - ((sigma_minus_1 - min_val) / 2)  # Adjust the position of the annotation
annotation_y_lss = 5  # Adjust the vertical position of the annotation
plt.text(annotation_x_lss, annotation_y_lss, f"Below -1 Sigma \n{sigma_minus_1_N} ({sigma_minus_1_N / sample_size:.2%})", fontsize=12, color='red')

# Fill the area above the +1 sigma line (USS) with color
plt.fill_betweenx([0, plt.ylim()[1]], sigma_plus_1, plt.xlim()[1], color='orange', alpha=0.2, label='Parts above +1 Sigma')

# Add an annotation for parts above +1 sigma
max_val = filtered_data[_kpi].max()
annotation_x_uss = sigma_plus_1 + ((max_val - sigma_plus_1) / 2)  # Centre the annotation inside the shaded region above +1 sigma
annotation_y_uss = 5  # Adjust the vertical position of the annotation
plt.text(annotation_x_uss, annotation_y_uss, f"Above +1 Sigma \n{sigma_plus_1_N} ({sigma_plus_1_N / sample_size:.2%})", fontsize=12, color='orange')

# Set Title
plt.title(_kpi + ' Histogram', size=20, y=1.05)

# Add data labels to histogram bars
for p in plt.gca().patches:
    plt.gca().annotate(f'{int(p.get_height())}',
                       (p.get_x() + p.get_width() / 2., p.get_height()),
                       ha='center', va='center', fontsize=10, color='black',
                       xytext=(0, 10), textcoords='offset points')

# Manually set the legend labels to include Sample Size (N) and VSF
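# These Line2D handles are proxy artists: they exist only to surface extra read-outs (sample size, #Employee, VSF, STD) in the legend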
plt.legend(handles=[
    plt.Line2D([0], [0], color='red', linestyle='dashed', linewidth=2, label=f"Mean: {mean_val:.2f}"),
    plt.Line2D([0], [0], color='red', linestyle='dashed', linewidth=2, label=f"AVG: {avg_val:.2f}"),
    plt.Line2D([0], [0], color='blue', linestyle='dashed', linewidth=2, label=f"Median: {median_val:.2f}"),
    plt.Line2D([0], [0], color='purple', linestyle='dashed', linewidth=2, label=f"LSS: {sigma_minus_1:.2f}"),
    plt.Line2D([0], [0], color='purple', linestyle='dashed', linewidth=2, label=f"USS: {sigma_plus_1:.2f}"),
    plt.Line2D([0], [0], color='red', alpha=0.3, label='Parts below -1 Sigma', linewidth=2),
    plt.Line2D([0], [0], color='orange', alpha=0.3, label='Parts above +1 Sigma', linewidth=2),
    plt.Line2D([0], [0], color='red', label=f"Sample Size: {sample_size}"),
    plt.Line2D([0], [0], color='blue', label=f"#Employee: {Employees}"),
    plt.Line2D([0], [0], color='red', label=f"VSF: {vsf_val:.2f}"),
    plt.Line2D([0], [0], color='red', label=f"STD: {std_val:.2f}")   
])

plt.show()
        


Nikolaos Christoforidis

Senior Power BI Developer | Data Visualization Expert | Six Sigma Yellow Belt | Greece

Very good, I remember starting this visual for a QA report, but you have improved it a lot. It now shows all important information to get the full insight from the data. Good job!
