Central Limit Theorem (CLT) & Control Charts Formulae and Constants
Central Limit Theorem (CLT)
Several years ago, I used a simulation to show that the Central Limit Theorem (CLT) could be confirmed without recourse to mathematics. The CLT enables us to estimate the population parameters from the sample statistics.
The CLT states that as the sample (subgroup) size increases, the distribution of sample means (Xbar) can be approximated by a normal distribution with mean μ and standard deviation σ ÷ √n:
average of sample means: Xbarbar = population mean μ
standard deviation of the sample means: σXbar = σ ÷ √n
Observation: the above two equations represent the CLT
As a first step, I had Minitab generate 1,000,000 (k) sets of sample size (n) of 5 from a normal distribution with mean μ = 0 and standard deviation σ = 1.
The “sample values” were stored consecutively in five columns, one subgroup per row.
Next, I had Minitab calculate the mean (average) value of each row and store it in a new column.
The graphical analysis is shown below:
On the left-hand side is the data from a normal distribution with a population of 5 x 1,000,000 data values: Histogram of individual data points (xi)
The right-hand side displays the Histogram of the subgroup Means (Xbar) of sample size n = 5
Mean of subgroup means ≈ 0 (zero)
StDev of subgroup means = 0.4469 ≈ 0.447
Remember that the Central Limit Theorem (CLT) states that Xbarbar = μ and σXbar = σ ÷ √n.
My simulation (1,000,000 sets of sample size n = 5) showed that:
Xbarbar = μ (i.e. -0.00004763 ≈ 0; the CLT predicts 0)
σXbar = σ ÷ √n (i.e. 0.4469 ≈ 0.447; the CLT predicts 1 ÷ √5 ≈ 0.447)
QED: we’ve (empirically) ‘proved’ the CLT for n = 5
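For readers without Minitab, the same experiment can be approximated in plain Python. This is only a sketch and not the Minitab procedure itself: the function name simulate_subgroup_means, the seed, and the reduced number of subgroups (50,000 rather than 1,000,000, to keep the run time short) are assumptions made for illustration.

```python
import random
import statistics


def simulate_subgroup_means(k, n, mu=0.0, sigma=1.0, seed=1):
    """Draw k subgroups of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(k):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)  # subgroup average (Xbar)
    return means


n = 5
means = simulate_subgroup_means(k=50_000, n=n)

xbarbar = statistics.fmean(means)      # grand average of the subgroup means
sigma_xbar = statistics.pstdev(means)  # StDev of the subgroup means

print(f"Xbarbar    = {xbarbar:.4f}  (CLT predicts 0)")
print(f"sigma_Xbar = {sigma_xbar:.4f}  (CLT predicts {1 / n ** 0.5:.4f})")
```

With fewer subgroups than the original simulation, the two statistics should still land close to 0 and 0.447, only with slightly more scatter from run to run.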
Control Charts Formulae and Constants
I’m sure we’re all aware that it’s easier (and certainly more economical) to work with 'averages' and 'ranges'! The following basic formulae relate to samples and ranges:
Sample (subgroup) average Xbar
Xbar = (x1 + x2 + … + xn) ÷ n (reminder: n – sample size)
Grand average Xbarbar
Xbarbar = (Xbar-1 + Xbar-2 + … + Xbar-k) ÷ k (k – number of subgroups)
Range (subgroup range) R
R = xMax – xMin
Average Range Rbar
Rbar = (R1 + R2 + … + Rk) ÷ k (k – number of subgroups)
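The four formulae above can be checked by hand on a tiny data set. The sketch below uses three made-up subgroups of size n = 5; the measurement values and the helper names subgroup_average and subgroup_range are hypothetical and only there to show the arithmetic.

```python
# Three made-up subgroups (k = 3) of sample size n = 5; values are illustrative only.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.1, 9.8, 10.2],
]


def subgroup_average(values):
    """Xbar = (x1 + x2 + ... + xn) / n"""
    return sum(values) / len(values)


def subgroup_range(values):
    """R = xMax - xMin"""
    return max(values) - min(values)


xbars = [subgroup_average(s) for s in subgroups]
ranges = [subgroup_range(s) for s in subgroups]

xbarbar = sum(xbars) / len(xbars)  # grand average across the k subgroups
rbar = sum(ranges) / len(ranges)   # average range across the k subgroups

print("Xbar values:", [round(x, 3) for x in xbars])
print("R values:   ", [round(r, 3) for r in ranges])
print(f"Xbarbar = {xbarbar:.4f}, Rbar = {rbar:.4f}")
```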
It should be obvious that the population standard deviation (σ) and mean range (Rbar) are both measures of the variation (or spread) of a data set.
There is a ‘mathematical’ relationship between Rbar (the mean range for data from ‘samples’ of size n) and σ (the standard deviation of the ‘population’).
For normally distributed data, this relationship depends only on the sample size, n.
The mean range (Rbar) is d2 x σ, where the value of d2 (which is a ‘constant’) is a function of n.
An ‘estimator’ of σ = Rbar / d2
Also, the population standard deviation (σ) and the standard deviation of range values (σR) are again both measures of the variation (or spread) of a data set.
There is ‘another mathematical’ relationship between σR (the StDev of range values for data from ‘samples’ of size n) and σ (the standard deviation of the ‘population’).
This relationship again depends only on the sample size, n.
The StDev of range values, σR, is d3 x σ, where the value of d3 (which is another ‘constant’) is also a function of n; therefore, σR = d3 x σ
An ‘estimator’ of σ = σR / d3
Okay, that is two equations defining σ, but we need to determine both d2 and d3.
In demonstrating the CLT we used Minitab to create 1,000,000 sets of sample values for n = 5 and their averages (Xbar).
Now let’s calculate the range of each of the sets of sample values (i.e. xMax – xMin) and store it in a further column labeled Range.
The resulting histogram of sample (subgroup) Ranges emerges:
Let’s analyze the histogram (very roughly 'bell-shaped') and note the statistics …
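The range step can also be reproduced outside Minitab. The following is a rough sketch under the same assumptions as the simulation (normal data with μ = 0 and σ = 1, n = 5); because σ = 1, the mean of the simulated ranges estimates d2 directly and their standard deviation estimates d3. The function name simulate_ranges, the seed and the 50,000 subgroups are illustrative choices, not part of the original analysis.

```python
import random
import statistics


def simulate_ranges(k, n, sigma=1.0, seed=2):
    """Draw k subgroups of size n from Normal(0, sigma); return each subgroup's range."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ranges = []
    for _ in range(k):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ranges.append(max(sample) - min(sample))  # R = xMax - xMin
    return ranges


ranges = simulate_ranges(k=50_000, n=5)

# With sigma = 1: Rbar = d2 x sigma = d2, and sigma_R = d3 x sigma = d3.
d2_est = statistics.fmean(ranges)
d3_est = statistics.pstdev(ranges)

print(f"mean of ranges  (estimate of d2) = {d2_est:.3f}")
print(f"StDev of ranges (estimate of d3) = {d3_est:.3f}")
```

The estimates should sit near the tabulated values for n = 5 (d2 = 2.326, d3 = 0.864), with small differences caused by the smaller number of subgroups.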
We now have everything needed to manually create an Xbar-R Chart from scratch (for sample size n = 5).
Calculating the σXbar statistic from Rbar data
Estimate of the ‘population’ standard deviation from the ranges:
σ = Rbar / d2 ... (1)
Remember: we do not know σ (the StDev of the population), so we estimate it from Rbar
Estimate of the standard deviation (StDev) of Xbar using the CLT:
σXbar = σ / √n ... (2)
Reminder: The CLT applies regardless of the shape of the population’s distribution
Substituting σ from (1) into (2) gives:
σXbar = (Rbar / d2) / √n ... (3)
Simplifying, we have:
σXbar = Rbar / (d2 x √n) ... (4)
Defining the Xbar UCL, Centre Line & LCL
Reminder: Xbarbar is the average across all k subgroup averages (Xbar) and represents the process centre.
UCL, LCL = Xbarbar ± 3 σXbar
… substitute σXbar = Rbar / (d2 x √n) from (4)
UCL = Xbarbar + 3 x [Rbar / (d2 x √n)]
Centre Line = Xbarbar
LCL = Xbarbar – 3 x [Rbar / (d2 x √n)]
Tables of Constants define the ‘factor’ A2, where A2 = 3 ÷ (d2 x √n)
UCL = Xbarbar + A2 Rbar
Centre Line = Xbarbar
LCL = Xbarbar – A2 Rbar
From our simulation –
d2 = 2.326 (mean of ranges)
d3 = 0.864 (StDev of ranges)
d3/d2 = 0.3714531384350817
A2 = 3 ÷ (d2 x √n) = 3 ÷ (2.326 x 2.236) = 0.5768 ≈ 0.577
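The A2 arithmetic and the resulting Xbar chart limits can be sketched in a few lines of Python. The function names a2_factor and xbar_limits are hypothetical, and the Xbarbar = 10.0 and Rbar = 0.5 passed in at the end are made-up process values used purely to show the calculation.

```python
import math


def a2_factor(d2, n):
    """A2 = 3 / (d2 x sqrt(n))"""
    return 3 / (d2 * math.sqrt(n))


def xbar_limits(xbarbar, rbar, d2, n):
    """Return (LCL, Centre Line, UCL) for the Xbar chart: Xbarbar -/+ A2 x Rbar."""
    a2 = a2_factor(d2, n)
    return xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar


a2 = a2_factor(d2=2.326, n=5)
print(f"A2 = {a2:.4f}")

# Made-up process values, for illustration only.
lcl, centre, ucl = xbar_limits(xbarbar=10.0, rbar=0.5, d2=2.326, n=5)
print(f"LCL = {lcl:.3f}, Centre Line = {centre:.3f}, UCL = {ucl:.3f}")
```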
Defining the Rbar UCL, Centre Line & LCL
The standard deviation of the ‘range’ is:
σR = d3 x σ ... (5)
As the population standard deviation (σ) is unknown, we may estimate it using σ = Rbar / d2 from (1) and substitute for σ in (5):
σR = d3 x Rbar / d2 ... (6)
UCL = Rbar + 3 σR = Rbar + 3 d3 x Rbar / d2
Centre Line = Rbar
LCL = Rbar – 3 σR = Rbar – 3 d3 x Rbar / d2
Factoring out Rbar, we have:
UCL = Rbar (1 + 3 d3 / d2)
Centre Line = Rbar
LCL = Rbar (1 – 3 d3 / d2)
Tables define the ‘factors’ D4 and D3 to be:
D4 = (1 + 3 d3 / d2)
D3 = (1 – 3 d3 / d2)
UCL = Rbar D4
Centre Line = Rbar
LCL = Rbar D3
Table of constants for Xbar and R control charts
From our simulation –
d2 = 2.326 (mean of ranges)
d3 = 0.864 (StDev of ranges)
d3/d2 = 0.3714531384350817
D4 = 1 + 3 x d3/d2 = 2.114359415305245 ≈ 2.114 (as in Table)
D3 = 1 – 3 x d3/d2 = -0.1143594153052451, which is negative; a range cannot be negative, so D3 is taken as zero (“-” as in Table)
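The D4 and D3 calculation, including the floor at zero for a negative D3, can be sketched as follows. The function names r_chart_factors and r_chart_limits are hypothetical, and the Rbar = 0.5 used at the end is a made-up value for illustration.

```python
def r_chart_factors(d2, d3):
    """Return (D3, D4) computed from the constants d2 and d3."""
    ratio = d3 / d2
    factor_d4 = 1 + 3 * ratio
    # A range cannot be negative, so a negative D3 is floored at zero.
    factor_d3 = max(0.0, 1 - 3 * ratio)
    return factor_d3, factor_d4


def r_chart_limits(rbar, d2, d3):
    """Return (LCL, Centre Line, UCL) for the R chart: Rbar x D3, Rbar, Rbar x D4."""
    factor_d3, factor_d4 = r_chart_factors(d2, d3)
    return rbar * factor_d3, rbar, rbar * factor_d4


factor_d3, factor_d4 = r_chart_factors(d2=2.326, d3=0.864)
print(f"D3 = {factor_d3:.3f}, D4 = {factor_d4:.3f}")

# Made-up average range, for illustration only.
lcl, centre, ucl = r_chart_limits(rbar=0.5, d2=2.326, d3=0.864)
print(f"LCL = {lcl:.3f}, Centre Line = {centre:.3f}, UCL = {ucl:.3f}")
```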
Conclusion & Notes
You should now understand where the constants d2, d3, A2, D3 and D4 come from, and be able to deploy Xbar & R Charts with confidence.
Xbar Chart
Indicates how the average or mean changes over time. It is used to monitor the process mean when subgroups are sampled from a process at regular intervals.
The Xbar Chart is typically combined with an R Chart to monitor process variables. If the process variation is not under control, then the Xbar control limits (which are calculated from Rbar) may be unreliable, which means that causes of variation that are affecting the process mean can’t be pinpointed.
Each point on the chart represents a subgroup mean (Xbar) value. The process mean (Xbarbar) is the centre line, and if this isn’t specified, then it’s the weighted mean of the subgroup means.
R Chart
Indicates how the range of the subgroups changes over time. It is used to monitor process variability when subgroups of fewer than ten measurements are taken from a process at regular intervals. Each point on the chart represents the subgroup range (R) value (xMax – xMin).
The centre line for each subgroup is the expected value of the range statistic. When subgroup sizes are not equal, the centre line differs from subgroup to subgroup.
Important notes
It must be remembered that process control charts (e.g. Xbar & R) do not consider the actual component nominal and tolerances; they only monitor continuing ‘statistical control’ of the process.