Cpk: a great concept not being used properly
The idea of describing data from a process (Voice of the Process) in relation to customer needs (Voice of the Customer) using a capability index such as Cpk is great. We must evaluate data in the context of what will make customers happy. In the following the index will be called Ppk, since this is the index that should be used; the difference between Cpk and Ppk is explained later.
As usual the article became longer than planned due to the many issues we see when customers use Ppk. I have numbered the issues, so you can jump straight to the issue of interest.
Most of the issues mentioned below can be solved by describing your data with a model and then calculating capability based on predictions from this model.
The main issues are:
1. Target is not being enforced
2. With asymmetric specifications Pp is overestimated
3. USL and LSL are often set tighter than necessary
4. The requirement for Ppk is often set higher than necessary
5. Short-term standard deviation is being used to describe the spread of the process even if it is much smaller than the total standard deviation
6. Wrong estimation of variance when there is more than one variance component
7. Data are assumed normally distributed even if the shape is clearly not bell-shaped
8. No test for outliers to the assumed distribution
9. Ppk is not enforced with confidence
10. In validation Ppk of the past is demonstrated, not Ppk of the future
Summary of solutions to the issues:
Solutions to the above-mentioned points are listed below in bullet form.
Further elaboration of these solutions is provided in the last chapter.
1. Target is not being enforced. Implement the Pk index and enforce it to be lower than 0.25.
2. With asymmetric specifications Pp is overestimated. Change the Pp formula to the Ppk formula, where the mean μ is replaced with Target.
3. USL and LSL are often set tighter than necessary. Do a tolerance stack-up to see if it is possible to widen and/or redistribute tolerances.
4. The requirement for Ppk is often set higher than necessary. If Ppk is enforced with confidence, as it should be, there is no need to have Ppk requirements higher than 1.
5. Short-term standard deviation is being used to describe the spread. Use Ppk instead of Cpk.
6. Wrong estimation of variance when there is more than one variance component. Make a variance component analysis to find the total variance.
7. Data are assumed normally distributed even though the shape is clearly not bell-shaped. Use quantile distances to estimate process widths, or calculate Ppk from control limits and the prediction formula from a model.
8. No test for outliers to the assumed distribution. Use a studentized residual plot in e.g., JMP from SAS to check for both outliers and variance homogeneity.
9. Ppk is not enforced with confidence. Make a model describing your data set and use individual confidence limits (prediction limits) and the prediction formula to calculate Ppk. Then it will be with confidence.
10. In validation Ppk of the past is demonstrated, not Ppk of the future. Enter batch as a random factor, so that future batches are predicted.
Voice of the Process and Voice of the Customer, including the combination of these
The basic properties of a distribution are:
1. Location
2. Spread
3. Shape
Knowing these three, we have the Voice of the Process (VOP).
Voice of the Customer (VOC) can be expressed by:
1. Target
2. Lower Specification Limit LSL (if relevant)
3. Upper Specification Limit USL (if relevant)
There should always be a target and at least one specification limit.
An example where all three are relevant could be the content of Active Pharmaceutical Ingredient (API). The Target ensures the customer gets the right treatment, the LSL ensures that the treatment works, and the USL avoids side effects.
An example where there is only a Target and a USL could be contamination level. Here the Target could be 0 or the lowest practically realizable level, and the USL the level where there is an increased risk of harm when using the product. Seen from the customer's point of view there is no lower specification limit, and thereby no LSL.
An example where there is only a Target and an LSL could be mechanical strength. Here the Target could be the highest practically realizable level, and the LSL the level where there is an increased risk of failure when using the product. Seen from the customer's point of view there is no upper specification limit, and thereby no USL.
If a specification limit is not relevant, do not use it. Otherwise, you may later be punished for being too good.
A capability index is the ratio of VOC and VOP:
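Capability index = width of the tolerance (VOC) / width of the process (VOP)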
You can make an analogy to driving a car into a garage. VOC is the width of the garage and VOP is the width of the car. Obviously, the garage must be wider than the car, and depending on your driving skills, it might have to be substantially wider. In Six Sigma the ambition is that the ratio should be 2, to have a robust process. Having kids who have recently taken their driving licenses, I can confirm this.
An obvious combination of VOP and VOC into a capability index working for any shape of the distribution is as described in e.g., ISO 3534-2 and ISO 22514-2:
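In quantile form the two basic indices can be written as:
Pp = (USL − LSL) / (X99.865% − X0.135%)
Ppk = min[ (USL − μ) / (X99.865% − μ) , (μ − LSL) / (μ − X0.135%) ]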
Where:
· Xy% is the y% quantile
· σ is the standard deviation of the process
· μ is the mean of the process
The Pp index describes the width of the tolerance relative to the width of the process, defined as the interval that contains 99.73% of the observations. For a normally distributed process, this is 6σ. This index depends only on spread.
The Ppk index describes the distance from the mean to the closest specification limit, divided by the width of the process to that side. For a normally distributed process this is 3σ. This index depends on both spread and location.
The Qk index sees bias (distance from target) as variance and then evaluates the total spread relative to target. This index depends on location, spread, and target.
Detailed solutions to the issues
1. Target is not being enforced
The ISO formulas for Ppk do not enforce target. This means that you can have a high Ppk even though you are far away from target, where products work the best. This typically happens when you have a very narrow process with a high Pp that is biased. Even when you are far away from target, the distance to the closest specification limit is still large compared to the width of the process, so Ppk stays high. Ppk is thereby a Yield measure, not a Quality measure.
Figure 1 shows four processes that all have a Ppk of 2. Only the one on the left represents good quality, but all of them will have a high yield.
The Qk index does enforce Target, but it has several issues:
· Does not work when Target = 0 (divide by zero)
· Bias is treated as variance
· Is also dependent on spread
Instead, the Pk index invented for tolerance stack-ups can be used:
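For a symmetric tolerance it can be written as (a form consistent with the numbers in the next paragraph):
Pk = |μ − Target| / ((USL − LSL) / 2)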
In Six Sigma it is typically required that Pp = 2 and that a process has a maximum drift of 1.5σ before it is detected on a control chart and counteracted. If Pp = 2 with a symmetric tolerance, the denominator in the Pk formula is 6σ, and a drift of 1.5σ then corresponds to a Pk of 0.25. A Six Sigma requirement for Pk is therefore Pk smaller than 0.25.
2. With asymmetric specifications Pp is overestimated
With asymmetric tolerances the distance between USL and Target is not the same as between Target and LSL. It therefore does not make sense to use USL − LSL in the formula for Pp. It does not help to have a large tolerance on one side if it is much narrower on the other side; it is the narrowest tolerance that counts. Inspired by the Ppk formula, simply replacing the mean μ with Target, the Pp formula must be:
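Written here for a normally distributed process:
Pp = min[ (USL − Target) , (Target − LSL) ] / (3σ)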
3. USL and LSL are often set tighter than necessary
There is a clear imbalance between the time spent setting specification limits and the effort it takes to live up to them afterwards. At customers we often see a very good rationale for targets (where products perform the best), but not so good rationales for the width of the tolerance. Often default widths have been used, later narrowed in case of non-conformities where causes could not be found.
Many tolerances have a built-in safety margin coming from previous enforcement practices, where just a few samples were tested and the only requirement was that they should be inside specification. This weak enforcement required a safety margin in the specification limits. Moving to Ppk and Pk requirements, where it is enforced that you never get close to the specification limits, there is no need for a safety margin in the specification limits.
If a tolerance stack-up is made, it is often seen that tolerances in general can be widened and redistributed to allow more for the measures where tolerances are hardest to meet.
4. The requirement for Ppk is often set higher than necessary
We often see very high requirements for Ppk. Requirements are often based on the Severity of being outside specification, and if the Severity of being extremely far outside specification is high, the requirement for Ppk is set high, e.g., above 1.5.
However, when you are enforcing a Ppk of 1, you are ensuring that most of the products are inside the specification limits, and those that are not can only be slightly outside. So it is the Severity at the specification limit that should be used to set the requirement for Ppk, not the Severity of being far outside. The Severity at the specification limits must be low, since a product just inside is considered good, and there is no practical difference, seen from a customer point of view, between a product just inside and just outside specification. With a low Severity, Ppk requirements should be modest.
If you at the same time have used very narrow specification limits, as mentioned in the previous bullet, the Ppk requirement will be much higher than necessary, leading to non-conformities for no good reason.
In addition, we recommend enforcing Ppk with confidence (what you have proved given your current sample size), as described under issue 9, and then your true Ppk needs to be somewhat higher than the requirement to leave room for the confidence interval.
For these reasons we do not recommend having Ppk requirements higher than 1. Then you are in practice enforcing that all samples (also the untested) are inside specifications, which should reflect customer needs.
5. Short-term standard deviation is being used to describe the spread
In the formula for Cpk, in most statistical software packages, it is not the total spread that is used but only the short-term spread. The short-term spread typically comes from the within-subgroup spread or, if there are no subgroups, from the average moving difference between neighboring observations. This is not representative if the process is not in statistical control, which is the case for most processes. To calculate capability with the total spread, just use Ppk instead. Fortunately, most people saying Cpk actually mean Ppk, because they have used the total spread in the formula. But if they really mean Cpk, the capability they state is based on the short-term standard deviation and is only valid if the process is in control, where the short-term standard deviation equals the total standard deviation. So Cpk is the capability they would have had if the process had been in control, and Ppk, based on the total standard deviation, is the one they actually had. If the process is in control, Cpk = Ppk. So just use Ppk all the time.
Figure 2 shows an example of a drifting process. The mean neighboring difference is used to calculate the within standard deviation, and it is thereby extremely short term. It is only 0.96, leading to a Cpk of 1.38, which is a fine value. However, there are products outside specification at the end, and they are close to being outside at the beginning. This is due to drift. Taking this into account, the overall standard deviation is 1.49. This standard deviation is used to calculate Ppk, which in this case becomes 0.88 and is much more representative of the dataset in question. However, Ppk assumes that data are normally distributed, which they will not be when a normal distribution moves over time; the accumulated distribution will hence not be normal. In the following bullets we will work on how to handle this.
6. Wrong estimation of variance when there is more than one variance component
When you have more than one variance component in your data, e.g., between and within batch variance, you cannot just estimate the total variance with the classical formula:
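s² = Σ(xi − x̄)² / (n − 1)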
This is due to the n/(n-1) correction factor, which corrects for the fact that you are comparing observations with the sample mean and not the true mean. Observations are obviously closer to the sample mean, than they are to the true mean. For large n the correction factor is close to 1. But when you have more than one variance component, your effective sample size on the total variance is not just n.
If you have B batches with n measurements in each and a variance ratio of between and within batch variance of VR, the effective degrees of freedom DFe can be calculated using Satterthwaite′s approximation:
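One way to write it for this balanced case (my rendering, with σ²B the between-batch variance, σ²W the within-batch variance and VR = σ²B/σ²W) is:
DFe = (VR + 1)² / [ (VR + 1/n)² / (B − 1) + (n − 1) / (B·n²) ]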
For more complicated systems, it is easier to just make a variance component analysis in e.g., JMP from SAS and calculate the effective degrees of freedom from the ratio of the total variance Vtot and its standard error SEVtot:
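DFe = 2 · (Vtot / SEVtot)²
(this follows from the χ²-based approximation Var(Vtot) ≈ 2·Vtot²/DFe)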
In the example shown in Figure 3, there are 3 batches with 100 measurements in each, but due to the high VR the effective DF is only 2.08, not 297. The total standard deviation is estimated at 7.1, which is much higher than the standard deviation of all data of 5.8; a serious underestimation of the total standard deviation can thus easily occur.
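As a rough illustration, here is a minimal Python sketch for a balanced one-way batch structure. The data, column names and seed are hypothetical, and a REML variance component analysis (e.g., in JMP) remains the more general route described above.

```python
import numpy as np
import pandas as pd

def total_variance_and_dfe(df, value="y", batch="batch"):
    """Balanced one-way variance components (between/within batch) and
    Satterthwaite effective degrees of freedom for the total variance.
    A sketch only, assuming equal batch sizes."""
    groups = df.groupby(batch)[value]
    B = groups.ngroups
    n = int(groups.size().iloc[0])                                   # equal batch sizes assumed
    grand_mean = df[value].mean()
    msb = n * ((groups.mean() - grand_mean) ** 2).sum() / (B - 1)    # between-batch mean square
    msw = groups.var(ddof=1).mean()                                  # within-batch mean square
    v_tot = max((msb - msw) / n, 0.0) + msw                          # sigma2_between + sigma2_within
    dfe = v_tot ** 2 / ((msb / n) ** 2 / (B - 1)
                        + ((n - 1) / n * msw) ** 2 / (B * (n - 1)))  # Satterthwaite combination
    return v_tot, dfe

# Hypothetical data: 3 batches of 100 with large between-batch variance
rng = np.random.default_rng(0)
d = pd.DataFrame({"batch": np.repeat(["A", "B", "C"], 100),
                  "y": np.concatenate([rng.normal(m, 1.0, 100) for m in (95, 100, 107)])})
v_tot, dfe = total_variance_and_dfe(d)
print(f"total sd = {v_tot ** 0.5:.1f}, effective DF = {dfe:.2f}")
```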
7. Data are assumed normally distributed even though the shape is clearly not bell-shaped
In the classical formula for Ppk the denominator is 3σ, representing the half-width of the process outside which only 0.135% of observations fall. However, if data are not normally distributed this is not representative of the width. Instead, quantiles should be used:
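Ppk = min[ (USL − X50%) / (X99.865% − X50%) , (X50% − LSL) / (X50% − X0.135%) ]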
If the data can be fitted with a known distribution, quantiles can be taken from this distribution. If you specify a distribution in JMP from SAS, it will evaluate the quality of the fit and give the corresponding Ppk from the formula above. Other statistical software packages might instead transform data and specification limits. We do NOT recommend this, since Ppk is basically a ratio of widths, which can become weird in transformed space.
If a good distribution cannot be found, you can always fit with a nonparametric fit like Fit Smooth Curve in JMP from SAS. Here a normal distribution is put around each observation with a standard deviation that is a function of the standard deviation in the dataset S and the number of observations n:
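A typical kernel width of this kind is Silverman's rule of thumb (the exact bandwidth used by Fit Smooth Curve may differ):
h = 0.9 · min(S, IQR/1.34) · n^(−1/5)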
Then any distribution can be fitted, and you can get a reliable estimated Ppk. This is our recommended fast track method.
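As a rough illustration of the idea, here is a minimal Python sketch that places a normal kernel around each observation, inverts the resulting smooth CDF numerically, and applies the quantile-based Ppk formula. The example data, specification limits and the Silverman bandwidth are assumptions, not the article's exact implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def smooth_ppk(x, lsl, usl):
    """Nonparametric Ppk: a normal kernel around each observation, the smooth
    CDF inverted numerically, then the quantile-based Ppk formula applied.
    Bandwidth: Silverman's rule of thumb (JMP may use a different width)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    h = 0.9 * min(s, iqr / 1.34) * n ** (-0.2)

    def cdf(t):                       # average of the kernel CDFs
        return norm.cdf((t - x) / h).mean()

    def quantile(p):                  # numerical inversion of the smooth CDF
        lo, hi = x.min() - 10 * h, x.max() + 10 * h
        return brentq(lambda t: cdf(t) - p, lo, hi)

    x_lo, x_med, x_hi = (quantile(p) for p in (0.00135, 0.5, 0.99865))
    return min((usl - x_med) / (x_hi - x_med), (x_med - lsl) / (x_med - x_lo))

# Hypothetical multimodal data from parallel tracks
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(m, 1.0, 50) for m in (8, 10, 12)])
print(round(smooth_ppk(data, lsl=2, usl=18), 2))
```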
Figure 4 shows an example with a multimodal distribution. If Ppk is calculated as if the data were normally distributed, the data are fitted with a too-wide distribution with long tails, underestimating quality and leading to a Ppk of 1.00. Doing the non-parametric fit, Ppk is 1.20, which you also get by fitting with 3 normal distributions, knowing that the data come from 4 parallel tracks of which 2 can be assumed to be the same.
Quality can also be overestimated when fitting a normal distribution to data that are not normally distributed. Figure 5 shows an example with bioburden data, which are by nature right-skewed. Assuming a normal distribution, the upward tail is too short, leading to a Ppk above 2 even though there is an observation outside specification. The non-parametric fit shows a Ppk around 1, which you also get by fitting with a lognormal distribution, commonly used for bioburden data.
However, the non-parametric method just describes your tested samples. If you want to predict untested samples and/or have a confidence interval on Ppk, we recommend building a prediction model describing your process. Within a prediction model you can enter systematic and random factors to handle pooling of different systematic and random effects, in order to obtain normally distributed residuals. In the event that normality is still not achieved by adding systematic and random factors, data transformation can be used.
In transformed space the mean and control limits can be calculated from the prediction formula PFT and the total standard deviation sT:
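LCLT = PFT − 3·sT and UCLT = PFT + 3·sT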
A one-sided α of 0.00135 is used corresponding to a normal quantile of 3. To calculate Ppk these limits as well as the mean must be transformed back to the original space:
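One way to write this, with T⁻¹ denoting the back-transformation, is:
Ppk = min[ (USL − T⁻¹(PFT)) / (T⁻¹(UCLT) − T⁻¹(PFT)) , (T⁻¹(PFT) − LSL) / (T⁻¹(PFT) − T⁻¹(LCLT)) ]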
For a normally distributed process this becomes:
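Ppk = min[ (USL − PF) / (3·stot) , (PF − LSL) / (3·stot) ]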
If your data have been transformed with the function T, then in a software package such as JMP from SAS the prediction formula PF will automatically be transformed back.
Figure 6 shows the case from Figure 2, where a model is built with time as predictor. The left plot shows the control limits and prediction limits versus time. The right plot shows the predictions converted into Ppk and its development over time.
Figure 7 shows the case from Figure 4, considering a process with parallel production tracks (cavities). A model is built with track as predictor. As seen, tracks 2 and 4 behave the same, while track 3 is the highest and track 1 lies in the middle. This explains the need to fit with 3 discrete normal distributions, and it leads to separate limits and Ppk for each track.
Figure 8 shows the case from Figure 5 with right-skewed bioburden data. A model is built without predictors and the best Box-Cox transformation is found. This has λ = 0, corresponding to a log transformation. The limits get skewed when transformed back, and as seen we get the same Ppk as in Figure 5.
We have made a small script that automates the calculations and visualizations shown above.
8. No test for outliers to the assumed distribution
Building prediction models describing the dataset to calculate Ppk requires that the observations can be adequately described by the model. We recommend using the studentized residual plot in e.g., JMP from SAS, which checks for both outliers and variance homogeneity. However, be aware that outliers can be caused by a lack of transformation, so we recommend looking at the Box-Cox transformation evaluation at the same time. If there are outliers, they need to be excluded and reported separately. It is not enough to have a good Ppk with the outlier in the model, because the outlier cannot be described by the model. My experience at many different customers is that they have a hard time keeping the outlier rate below 1%, so expect this to be an issue.
9. Ppk is not enforced with confidence
If a process is normally distributed and in control, a confidence interval for Ppk towards the lower (Ppl) and/or upper (Ppu) specification limit can be calculated, since the corresponding estimates follow the non-central t-distribution with the non-centrality parameter:
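For the lower side, √n·(x̄ − LSL)/s follows a non-central t-distribution with n − 1 degrees of freedom and non-centrality parameter δ = 3·√n·Ppl (and correspondingly for the upper side with USL).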
However, a process is rarely both normally distributed and in control at the same time. Fortunately, it can typically still be described by a statistical model, and then individual confidence limits (prediction limits XPL) and the prediction formula (PF) can be calculated. If they are inserted into the formula for Ppk in solution 7 instead of the control limits, you get a capability index based on individual confidence limits:
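plPpk = min[ (USL − PF) / (XPL,upper − PF) , (PF − LSL) / (PF − XPL,lower) ]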
99.73% prediction limits must be used (one-sided α of 0.00135), corresponding to ±3 standard deviations at infinite degrees of freedom.
If data by nature are not normally distributed, data transformation can be used to make the residuals in the model normally distributed. In software like JMP from SAS, prediction limits are automatically transformed back.
If data are not normally distributed due to drift over time, time can be entered as a predictor in the model.
If data are not normally distributed due to pooling of parallel tracks (like cavities in injection moulding), track can be entered as a systematic factor in the model. Then you get a plPpk for each track.
Alternatively, a classical confidence interval can be calculated using the non-central t distribution. Then Ppk is written as:
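Ppk = min(Ppl, Ppu), where Ppl = (PF − LSL) / (3·stot) and Ppu = (USL − PF) / (3·stot)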
If only one-sided specifications apply, then:
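One standard construction (stated here as my reading; the exact expression may differ) is that the lower confidence limit lclPpl is the value satisfying
t(1 − α; DF, δ = 3·√ne·lclPpl) = 3·√ne·P̂pl
where t(q; DF, δ) is the q-quantile of the non-central t-distribution. This is solved numerically; the code sketch further below shows one way to do it.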
For two-sided specifications:
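A natural choice, stated here as an assumption, is lclPpk = min(lclPpl, lclPpu), i.e., the one-sided construction applied to each side and the smaller value taken.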
The effective degrees of freedom are used as DF. They can be obtained from the variance components via the equation in solution 6.
For a model without random factors, DF will be shown in all software packages. In JMP from SAS you find it in the Analysis of Variance window.
The effective sample size ne for each group is used as n. It is calculated from the total standard deviation stot and the standard error of the prediction formula SEpred:
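ne = (stot / SEpred)²

As a rough numerical sketch of the non-central t construction mentioned above (Python; the function name, the root-search bracket and the example numbers are my own assumptions, not taken from the article):

```python
import numpy as np
from scipy.stats import nct
from scipy.optimize import brentq

def lcl_ppl(ppl_hat, n_e, df, alpha=0.05):
    """Lower confidence bound for a one-sided index Ppl, found by inverting
    the non-central t-distribution: the bound is the Ppl value whose
    non-centrality makes the observed statistic the (1 - alpha) quantile."""
    t_obs = 3.0 * np.sqrt(n_e) * ppl_hat          # observed statistic

    def gap(ppl):
        return nct.ppf(1 - alpha, df, 3.0 * np.sqrt(n_e) * ppl) - t_obs

    return brentq(gap, -5.0, ppl_hat + 5.0)       # crude but sufficient bracket

# Hypothetical numbers: estimated Ppl = 1.2, effective n = 10, DF = 9
print(round(lcl_ppl(1.2, n_e=10, df=9), 2))
```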
The two methods give similar, but not identical, results. Figure 9 depicts how lclPpk and plPpk approach the estimated value with increasing sample size. At low sample sizes plPpk is the lower of the two, and at high sample sizes it is the higher.
Confidence intervals are known to have the issue that for all sample sizes the risk of being outside is the same, but at low sample sizes you can be much further out than at higher sample sizes. Therefore, we recommend using plPpk.
10. In validation Ppk of the past is demonstrated, not Ppk of the future
The purpose of validation is not to prove that you have made good batches, but that you are going to make good batches. You must predict the future, not just describe the past. This can be done by building a prediction model with batch as a random factor, i.e., the validation batches are seen as a random sample of all future batches.
Figure 10 shows an example after 3 batches. The 3 batches are fine, which can be seen from the control limits fitting nicely inside the specification limits and a corresponding estimated Ppk of 1.53. However, since we only have 3 batches and the variance ratio is high (the variance between batches is higher than within), the prediction for the future is uncertain. This is supported by the prediction limits, which are far outside the specification limits, and a corresponding plPpk of only 0.27. lclPpk is slightly higher at this low sample size.
If you just want to see if the 3 batches were good, you can enter batch as a systematic factor in your prediction model instead. Then each batch gets its own Ppk, where you are just predicting the samples you did not take in these 3 specific batches. The model estimates within-batch variation using a pooled variance, which limits the requirement for sample size, but to do so one should justify variance equality across batches using e.g., a studentized residual plot.
The result is shown in Figure 11, where it is confirmed that all 3 batches individually are excellent and can be released, as both plPpk and lclPpk are high. At the higher effective sample size obtained by setting batch as a systematic factor, lclPpk is slightly lower than plPpk.
As Figure 10 showed, we need to make more batches to finalize the validation. Figure 12 shows the result after 5 batches, i.e., 2 additional batches compared to Figure 10. The recalculation now shows that there is less than a 0.27% risk of having a future observation outside the specification limits, corresponding to a plPpk higher than 1.03. Validation is passed!