The Hidden Secrets of “failed” DoEs, Part Three: Lack of Fit
In conversations with users new to DoE, we occasionally hear that they were unable to solve their problem, with statements such as “my DoE failed, I was not able to fit a model”, “none of my factors were significant”, or “my DoE didn’t work because I had missing runs, so I switched back to one-factor-at-a-time to get results”.
“Failed” DoEs can have several causes, and it is more helpful to focus on what was learned. Learning is an incremental process, which fits well with the general advice to treat DoE as a step-by-step approach rather than one-shot magic. Let us explore some common challenges that might occur when using DoE and how they can be used to probe the process of interest and raise important questions. This last part of this short blog series is about lack of fit: what it might or might not mean for the prediction model and how to interpret it.
From Lack of Fit to Fit: A Detective Tale
A team of R&D scientists is tasked with developing and optimizing a new biochemical process. After planning the DoE, conducting the experiments, and analysing the data, the model looks very promising. A high R², narrow confidence bands, and many significant model terms make them confident that they have a good model (Fig. 1). Short process times would boost production output and the process's revenue even further, but at low process times the model's predictions do not match the real-world behaviour: the actual yield is far lower than expected. Shorter process times would not only increase the product's revenue but also help the company reach its sustainability goals by lowering the overall energy consumption of the process, so the team starts to investigate why the model performs poorly in some areas of the design space.
The significant lack-of-fit (LOF) test included in many fit model reports points to a potential issue with how well the model fits the data. The test estimates the pure error from replicated runs and compares it with the lack-of-fit error, the deviation of the design points from the model's predictions. A significant result means that the lack-of-fit error is much larger than the pure error between replicates. After a few meetings with engineers who have domain knowledge from comparable production processes, the scientists learn that the optimal temperature, and the peak of its curve, might change considerably with different process times. With this knowledge they dig into their data again by plotting residuals against temperature and time. Figure 2 shows the residuals against the squared temperature values for different process times and hints at non-random patterns.
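For readers who want to see the arithmetic behind the test, here is a minimal Python sketch (not the JMP implementation) that partitions a model's residual sum of squares into pure error and lack-of-fit error and forms the usual F ratio. The function name, the column arguments, and the way the fitted values are passed in are all illustrative assumptions.

```python
import pandas as pd
from scipy import stats

def lack_of_fit_test(df, factors, response, fitted, n_params):
    """Split the residual sum of squares of a fitted model into pure error
    (scatter of replicates around their design-point mean) and lack-of-fit
    error (scatter of the design-point means around the model), then form
    the F ratio and its p-value."""
    data = df.copy()
    data["_fitted"] = fitted
    # pure error: deviations of replicates from the mean of their design point
    group_mean = data.groupby(factors)[response].transform("mean")
    ss_pure = ((data[response] - group_mean) ** 2).sum()
    # total residual sum of squares of the model
    ss_resid = ((data[response] - data["_fitted"]) ** 2).sum()
    ss_lof = ss_resid - ss_pure                    # lack-of-fit sum of squares
    m = data.groupby(factors).ngroups              # number of distinct design points
    n = len(data)
    df_lof, df_pure = m - n_params, n - m          # degrees of freedom
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    p_value = stats.f.sf(f_ratio, df_lof, df_pure)
    return f_ratio, p_value
```

A small p-value here corresponds to the "significant LOF" flag in the report: the model misses the design-point means by much more than the replicate scatter alone would suggest.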
Here the model seems to lack an important term to account for the different process behaviour at low process times, which shows up both in the significant LOF test in the model report and in the non-random patterns of Figure 2. The same behaviour can be seen by plotting the measured yield against the squared temperature, as in Figure 3, which illustrates how the response to temperature changes for different process times.
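A plot in the spirit of Figure 2 is easy to reproduce outside of JMP; the sketch below assumes a data frame with hypothetical columns named temperature, time, and residual.

```python
import matplotlib.pyplot as plt

def residuals_vs_squared_temperature(df):
    """Scatter the model residuals against temperature^2, one series per
    process time, to look for non-random patterns such as curvature."""
    fig, ax = plt.subplots()
    for process_time, grp in df.groupby("time"):
        ax.scatter(grp["temperature"] ** 2, grp["residual"],
                   label=f"time = {process_time}")
    ax.axhline(0, color="grey", linewidth=1)   # residuals should scatter around zero
    ax.set_xlabel("temperature²")
    ax.set_ylabel("residual")
    ax.legend(title="process time")
    plt.show()
```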
In this fictitious case the design's model was missing an admittedly rare interaction between the quadratic term of temperature and the linear term of time. Figure 4 shows the true response (blue) compared with the fitted model's surface (brown), highlighting long process times (left) and short process times (right). For long process times there is a noticeable difference between the peaked curve of the model's surface (brown) and the steady rise of yield of the true response surface (blue). The significant lack-of-fit test refers to this difference.
Including replicates in the design not only helps with pure error estimation but also enables the lack-of-fit test. So, what might cause a significant test result? The obvious cause is that the model does not fit the data well. In this example the scientists were fortunate and could simply add the missing effect term to the existing model to get much better predictions at short process times. Figure 5 shows the profiler of the model built on the initial design (top), the model built by adding the missing term (middle), and the revenue function, which depends on the real yield and the process time. The new model does a much better job of capturing how the temperature curve changes with the process time. With the new model, a revenue increase of over 50 % compared with the initial model might be achievable by finding the optimal revenue area at high temperatures and short process times. At the same time the company has made a big step towards its sustainability goals: lowering the process time reduces energy consumption, because the temperature needed for the process has to be maintained for the shortest possible time.
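To make "adding the missing effect term" concrete outside of JMP, the following sketch fits the initial full quadratic model and an augmented model with the extra temperature² × time term using the statsmodels formula API. The data frame and the column names temp, time, and yield_ (yield is a reserved word in Python) are assumptions for the example, not the original study's data.

```python
import statsmodels.formula.api as smf

def fit_initial_and_augmented(df):
    """Fit the initial response-surface model and a model augmented with a
    temperature^2 x time term, and return both fits for comparison."""
    # initial model: main effects, two-way interaction, quadratic terms
    initial = smf.ols(
        "yield_ ~ temp + time + temp:time + I(temp**2) + I(time**2)",
        data=df,
    ).fit()
    # augmented model with the previously missing temperature^2 x time term
    augmented = smf.ols(
        "yield_ ~ temp + time + temp:time + I(temp**2) + I(time**2) + I(temp**2):time",
        data=df,
    ).fit()
    return initial, augmented
```

Comparing the two fits (for example via adjusted R² or the lack-of-fit test above) would show whether the extra term resolves the mismatch at short process times.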
Nevertheless, a significant lack-of-fit test can also occur because of a very small replication error while the model is predicting very well despite the significant test result. In some cases, the replicated runs used to estimate the pure error are not truly independent replicates but merely repeated measurements of the samples from the initial runs at the same factor settings. This can result in an artificially small pure error and is worth a check-up.
The example above is a rather special case constructed to illustrate lack of fit. As already stated, a significant LOF does not necessarily mean that the model is unusable. For instance, the initial model would have predicted the global maximum of yield at mid-range temperature settings quite accurately despite the significant LOF (without accounting for the impact the process time has on revenue and sustainability). A model that does what it was initially designed for, such as suggesting sufficiently large process improvements or time and cost savings that can be confirmed, is useful despite a significant LOF. LOF can be seen as an additional way to investigate the data already at hand and to ask questions.
Summary
A significant LOF can indicate a mismatch between the model's behaviour and its underlying data. Adding the missing effect terms to the model, or augmenting the design with additional runs to estimate the needed model terms, might help to find a more refined model. It is important to note that a significant LOF can also have other causes, such as an (in some cases artificially) low pure error. The data and the model might still be useful for solving the problem.
Not every DoE is a success at the first attempt. No model is perfect. But there is valuable information in every planned and structured approach like DoE, even when the first set of experiments “failed” or does not perfectly match expectations. Every next step using this information leads further towards real process understanding, which can finally be used to add business value, decrease costs, and close projects successfully and on time.
AI and Analytics Leader with a specialized focus on Consumer Industries
2 days ago
Insightful. Nathaniel Leies #predictumacademy
JMP Technical Manager, Europe
2 days ago
Residual plots are a great data detective tool for helping identify unexpected non-random behaviour, thanks for sharing.
Data Scientist at Novartis | small molecules formulation | Passionate about Science: Data science, DoEs, ML, Statistics, PAT and learning new things | Physicist
3 days ago
Great one! I once ran into a DoE analysis that showed significant lack of fit, and a model with an even rarer term, a linear term times a quadratic term (A*B*B), showed significance. In the end, however, this was due to an outlier in the data set :)
Helping people solve problems and uncover opportunities / JMP Sr. Systems Engineer
3 days ago
Great example on LoF. I completely agree that a “failed” DOE still provides information, and I would add even more information than a “failed” OFAT. It's also true that understanding this information may require an intermediate skill level, and that's where we're always happy to help, aren't we?