Cpk and Ppk: Process Capability Insights
Md. Abdur Rakib
Statistics, Process improvement, Process control, Stability, Shelf life estimation, Trend analysis, Extrapolation, CAPA effectiveness verification, Root cause analysis, Investigation, Continued process verification, DoE
The prevalent definition of Cpk and Ppk characterizes Cpk as the short-term capability of a process and Ppk as the long-term capability. However, these statistical indices offer more nuanced insights, which require a comprehensive understanding of process and capability statistics. Accurate assessment and interpretation of data are essential to reveal the actual state of your process.
Four conditions must be satisfied for process capability and performance statistics to be meaningful metrics:
1. The sample must be truly representative of the process.
2. The distribution of the quality characteristic must be Gaussian, i.e., the data must follow a normal distribution. If the data does not conform, ask: Can it be normalized? Various transformation methods can potentially normalize the data; otherwise, a non-parametric analysis can be applied.
3. The process must be in statistical control. In other words, is it stable, with variation that is essentially random (common cause)?
4. The sample must be large enough to build the predictive capability model. How big a sample? That is the wrong question. Remember that a statistic, when applied correctly, is just an estimate of the truth. The right question to ask is: How much confidence do I need in the estimate?
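The link between sample size and confidence can be sketched numerically. The snippet below swaps in a widely used normal approximation for the standard error of a capability index (attributed to Bissell), se ≈ √(1/(9n) + Ĉ²/(2(n−1))). It assumes normal, stable data and is not the author's method; the function name `capability_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def capability_ci(index_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Approximate two-sided confidence interval for a capability index.

    Assumption: Bissell's normal approximation to the standard error,
    se = sqrt(1 / (9 n) + index^2 / (2 (n - 1))), valid for normal, stable data.
    """
    if n < 2:
        raise ValueError("need at least two measurements")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    se = sqrt(1 / (9 * n) + index_hat ** 2 / (2 * (n - 1)))
    return index_hat - z * se, index_hat + z * se


# Same point estimate (1.33) evaluated at different sample sizes.
for n in (30, 60, 300):
    low, high = capability_ci(1.33, n)
    print(f"n={n:3d}: approximate 95% CI [{low:.2f}, {high:.2f}]")
```

With the same point estimate of 1.33, the approximate 95% interval runs from about 0.97 to 1.69 at n = 30 but only from about 1.22 to 1.44 at n = 300. A larger sample does not change the answer. It narrows the range you can trust.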
See Figure 1 and Figure 2 to get a sense of what is being described.
What do the two figures show? As the sample size increases, the Cpk and Ppk estimates converge (Cpk ≈ Ppk).
What exactly are Cpk and Ppk?
Cpk is a snapshot, or a series of snapshots, of a process at specific points in time and is used to assess the "local and timely" capability of a process. Think of Cpk as a point of insight into a much larger, future population of process data.
Cpk is calculated from measurements collected as rational subgroups. A subgroup is a series of measurements that represents a process snapshot; its measurements are best taken at the same time, in the same way, in a controlled fashion. For example, tablets are made in a continuous process: 100 tablets are made every hour, and tablet hardness is to be evaluated in a capability study. Ten tablets will be sampled every hour for six hours. This gives six subgroups of 10 tablet measurements, for a total of 60 measurements for the day.
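The sampling plan amounts to splitting time-ordered readings into consecutive subgroups. In this sketch the readings are simulated (hypothetical values in arbitrary units) and `make_subgroups` is a hypothetical helper. The essential point is that the grouping follows production time.

```python
import random

SUBGROUPS = 6        # one subgroup per hour, six hours
SUBGROUP_SIZE = 10   # tablets sampled each hour


def make_subgroups(readings, size):
    """Split time-ordered readings into consecutive subgroups of `size`."""
    if len(readings) % size:
        raise ValueError("readings must divide evenly into subgroups")
    return [readings[i:i + size] for i in range(0, len(readings), size)]


# Hypothetical, simulated hardness readings (arbitrary units), already in time order.
rng = random.Random(1)
readings = [rng.gauss(10.0, 0.3) for _ in range(SUBGROUPS * SUBGROUP_SIZE)]

subgroups = make_subgroups(readings, SUBGROUP_SIZE)
for hour, sg in enumerate(subgroups, start=1):
    print(f"hour {hour}: {len(sg)} tablets, mean {sum(sg) / len(sg):.2f}")
```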
Cpk for these 60 data points is calculated from the within-subgroup variation of the hardness measurements. It does not account for any drift or shift between subgroups. This distinction is critical to understanding the difference between Cpk and Ppk.
Ppk indicates what the process as a whole may be capable of in the future. In this case, calculating Ppk on the 60 measured data points gives an estimate of the overall variation of the critical hardness measurement. Ppk includes within-subgroup variation and all other process-related variation, including shift and drift. This is another vital distinction: Cpk includes only common cause variation, whereas Ppk includes both common and special cause variation.
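The claim that within-subgroup variation ignores shifts between subgroups can be checked directly. This sketch computes a pooled within-subgroup standard deviation, in which each subgroup's sample variance is weighted by its degrees of freedom. It then shifts one whole subgroup. The data and the helper name `pooled_within_sd` are hypothetical. Some software also divides by an unbiasing constant, which is omitted here.

```python
import statistics
from math import sqrt


def pooled_within_sd(subgroups):
    """Pooled within-subgroup standard deviation.

    Each subgroup's sample variance is weighted by its degrees of freedom
    (n_i - 1); differences between subgroup means never enter the result.
    """
    num = sum((len(sg) - 1) * statistics.variance(sg) for sg in subgroups)
    dof = sum(len(sg) - 1 for sg in subgroups)
    return sqrt(num / dof)


# Hypothetical hardness readings: three small subgroups (arbitrary units).
subgroups = [
    [9.8, 10.1, 10.0, 9.9, 10.2],
    [10.0, 9.7, 10.1, 10.3, 9.9],
    [9.9, 10.0, 10.2, 9.8, 10.1],
]
# The same data with the third subgroup shifted up by 1.0 (a simulated process shift).
shifted = subgroups[:2] + [[x + 1.0 for x in subgroups[2]]]

print(round(pooled_within_sd(subgroups), 4))
print(round(pooled_within_sd(shifted), 4))  # unchanged: the shift is invisible
```

Both calls print 0.1826. Shifting a whole subgroup raises the overall standard deviation but leaves the within-subgroup estimate untouched.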
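For reference, here is the usual textbook form of both indices for a two-sided specification. USL and LSL are the upper and lower specification limits, \(\bar{x}\) is the grand mean of all N measurements, and \(s_i\) and \(n_i\) are the standard deviation and size of subgroup i (k subgroups). The only difference between the two indices is the estimate of σ. Some software also applies unbiasing constants, which are left out of this sketch.

```latex
\mathrm{Cpk} = \min\!\left(\frac{USL-\bar{x}}{3\hat{\sigma}_{\text{within}}},\;
                           \frac{\bar{x}-LSL}{3\hat{\sigma}_{\text{within}}}\right),
\qquad
\hat{\sigma}_{\text{within}} = \sqrt{\frac{\sum_{i=1}^{k}(n_i-1)\,s_i^{2}}{\sum_{i=1}^{k}(n_i-1)}}

\mathrm{Ppk} = \min\!\left(\frac{USL-\bar{x}}{3\hat{\sigma}_{\text{overall}}},\;
                           \frac{\bar{x}-LSL}{3\hat{\sigma}_{\text{overall}}}\right),
\qquad
\hat{\sigma}_{\text{overall}} = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left(x_j-\bar{x}\right)^{2}}
```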
Potential difference between Cpk & Ppk
In our example, 600 tablets were made. Ten out of every 100 were measured each hour for six hours, so 60 data points were accumulated in six subgroups of 10. The tablets were made on one machine, by one operator, the same way, on the same day. We can see that the average within-subgroup standard deviation is very close to the overall standard deviation. Therefore, in this particular case, Cpk and Ppk should be very close in value, and they are.
Note: This example uses a two-sided specification (upper and lower specification limits). Therefore, Cpk and Ppk are each taken as the lesser of the two one-sided values (upper and lower).
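The stable case, together with the rule of taking the lesser of the two one-sided values, can be sketched with simulated data. Everything here is hypothetical: the specification limits LSL = 8.5 and USL = 11.5, the simulated hardness values, and the helper names. A drifting process is included only for contrast, to show Ppk falling below Cpk when the subgroup means move.

```python
import random
import statistics
from math import sqrt

LSL, USL = 8.5, 11.5  # hypothetical specification limits (arbitrary units)


def pooled_within_sd(subgroups):
    """Pooled within-subgroup standard deviation (no unbiasing constant)."""
    num = sum((len(sg) - 1) * statistics.variance(sg) for sg in subgroups)
    dof = sum(len(sg) - 1 for sg in subgroups)
    return sqrt(num / dof)


def one_sided(mean, sd):
    """Return the (lower, upper) one-sided indices for a two-sided specification."""
    return (mean - LSL) / (3 * sd), (USL - mean) / (3 * sd)


def cpk_ppk(subgroups):
    """Cpk uses the pooled within sd, Ppk the overall sd; each takes the lesser side."""
    data = [x for sg in subgroups for x in sg]
    mean = statistics.fmean(data)
    cpk = min(one_sided(mean, pooled_within_sd(subgroups)))
    ppk = min(one_sided(mean, statistics.stdev(data)))
    return cpk, ppk


rng = random.Random(7)
# Stable process: every hourly subgroup comes from the same distribution.
stable = [[rng.gauss(10.0, 0.3) for _ in range(10)] for _ in range(6)]
# Drifting process (for contrast): the hourly mean creeps up by 0.2 per hour.
drifting = [[rng.gauss(10.0 + 0.2 * h, 0.3) for _ in range(10)] for h in range(6)]

for label, sgs in (("stable", stable), ("drifting", drifting)):
    cpk, ppk = cpk_ppk(sgs)
    print(f"{label:8s}  Cpk={cpk:.2f}  Ppk={ppk:.2f}")
```

In the stable case the two indices come out close. In the drifting case the shift between hours inflates only the overall standard deviation, so Ppk drops well below Cpk.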
Here is where the difference between within and overall standard deviation becomes apparent. Suppose we pull 30 of the 60 measurements, irrespective of time order, and make three new subgroups of 10 (something we would never, ever do in practice). The results could be wildly different. The difference between the lower and upper Ppk values (1.72 vs. 1.37) is about 25%, whereas the difference for Cpk (3.13 vs. 1.39) is about 125%. Cpk calculates the standard deviation of each subgroup and pools the results, whereas Ppk calculates a single standard deviation from all of the data as one continuous data set.
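The sensitivity of Cpk to how data are grouped can be isolated by holding one set of values fixed and grouping it in two different ways. The 30 readings below are simulated (hypothetical, arbitrary units). The sorted grouping is a deliberately extreme misuse, chosen to make the effect obvious. It does not reproduce the author's numbers.

```python
import random
import statistics
from math import sqrt


def pooled_within_sd(subgroups):
    """Pooled within-subgroup standard deviation (no unbiasing constant)."""
    num = sum((len(sg) - 1) * statistics.variance(sg) for sg in subgroups)
    dof = sum(len(sg) - 1 for sg in subgroups)
    return sqrt(num / dof)


def chunk(values, size):
    """Split values into consecutive groups of `size`."""
    return [values[i:i + size] for i in range(0, len(values), size)]


rng = random.Random(3)
# Hypothetical set of 30 hardness readings (arbitrary units).
sample = [rng.gauss(10.0, 0.3) for _ in range(30)]

shuffled = sample[:]
rng.shuffle(shuffled)
groupings = {
    "random regrouping": chunk(shuffled, 10),
    "sorted regrouping": chunk(sorted(sample), 10),  # deliberately extreme misuse
}

for label, sgs in groupings.items():
    within = pooled_within_sd(sgs)
    overall = statistics.stdev([x for sg in sgs for x in sg])
    print(f"{label}: within sd={within:.3f}, overall sd={overall:.3f}")
```

The overall standard deviation, and therefore Ppk, is identical for both groupings because it uses the same 30 values. The within-subgroup estimate, and therefore Cpk, depends entirely on how the subgroups were formed.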
This example shows the importance of truly understanding the data you are investigating. Ask: How is the data taken? How stable is it? Where is the variation coming from? Is the data trustworthy? How was it organized? Are there enough data points? Will this process generate like results, lot after lot, time after time? Is the statistical confidence level of the capability indices acceptable?