Linearity studies are often not done right
Looking at how linearity studies are typically performed, we have identified several issues that need attention. Errors of more than 10% in the estimated Concentration can easily occur if you are not aware of them.
Even though there are excellent guidance documents on how to do it right, such as CLSI EP06, they are clearly not always being followed.
Issues
The main issues we see are:
1. Linear regression assumes that Precision is the same at all levels, i.e., that the Assay has a constant standard deviation. However, most Assays are closer to having a constant Coefficient of Variation (CV), i.e., a constant relative standard deviation.
2. R-Squared is used as the criterion for Assay linearity. Although linearity influences R-Squared in a linear regression, R-Squared depends just as much on Assay Precision and on how widely the concentration is varied.
3. If the Assay is found to be non-linear within the approved range of concentrations, this non-linearity is added to the measurement uncertainty of the Assay, thereby increasing it.
Solutions
The solutions to the issues above are straightforward:
1. Build a regression model with the measured Area as response and Concentration as a discrete predictor, so that the residual variance of the replicates can be studied versus level. A Box-Cox analysis gives the optimal power transformation λ, and the variance then depends on level as shown in bullet a. below. If λ ≠ 1, the variance depends on level and a weighted regression with inverse-variance weights is needed, i.e., the formula in bullet b. (a sketch of this relation is given after this list):
a. σ ∝ μ^(1−λ), i.e., Var(Area) ∝ μ^(2(1−λ)), where μ is the expected Area at the given level
b. weight = 1/Var ∝ μ^(2(λ−1)); for λ = 0 (constant CV) the weights are proportional to Area⁻²
2. Fit the data with a polynomial instead and use an information criterion as the stopping rule (we recommend the corrected Akaike Information Criterion, AICc) to see whether a polynomial fit is better than a linear fit. A polynomial will always achieve a lower minus-loglikelihood (i.e., a better raw fit) than a linear fit simply because it has more fitting parameters, whereas AICc is the minus-loglikelihood corrected for the number of fitting parameters.
3. If the Assay turns out to be non-linear, there is no reason to add the non-linearity to the measurement uncertainty of the Assay. Instead, use the polynomial fit to convert Area to Concentration.
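Before the JMP walk-through below, here is a minimal Python sketch (an illustration under the assumptions in bullets a. and b. above, not code from the article) of how a Box-Cox λ translates into regression weights; the helper name regression_weights is hypothetical.

```python
# Minimal sketch: the inverse-variance weight implied by a Box-Cox power lambda.
# Assumption (bullets a/b above): replicate SD scales as sigma ~ mu**(1 - lam),
# so Var ~ mu**(2*(1 - lam)) and the weight is w ~ mu**(2*(lam - 1)).

def regression_weights(level, lam):
    """Inverse-variance weight (up to a constant) at a given response level."""
    return level ** (2.0 * (lam - 1.0))

# lam = 1 (constant SD) gives equal weights; lam = 0 (constant CV) gives
# weights proportional to level**-2, i.e. the Area**-2 weighting used below.
for lam in (1.0, 0.5, 0.0):
    print(lam, regression_weights(2.0, lam))
```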
Below is an example of how to do this in the statistical software JMP from SAS.
Example
An Assay was tested by serially diluting a reference sample of known concentration by a factor of 2, five times. At each dilution the Area was measured in triplicate, as shown in Figure 1.
An unweighted linear regression is shown in Figure 2. Using a success criterion for linearity of R-Squared > 0.98, one would (erroneously) have concluded that the test for linearity was passed.
However, looking at the regression, two things related to the issues mentioned above can be seen:
1. The triplicate variation clearly increases with level, as expected.
2. There is clear curvature in the residuals.
1 Triplicate variation depends on level
To investigate the triplicate variation, a model is built with Area as response and discrete Concentration as predictor. The only variation this model cannot explain is the triplicate variation.
Figure 3 shows that the best Box-Cox λ is close to 0. Since the Assay is expected to have a constant CV, a λ of 0 is chosen, and, following the solution described above, a weighted regression is adopted with Area⁻² as weights.
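As a rough cross-check of this step outside JMP, the sketch below estimates λ from the triplicate spread at each level: under the assumption σ ∝ μ^(1−λ), the slope of log(SD) versus log(mean) across levels estimates 1−λ. The dilution series and Areas are simulated with a constant 5% CV; they are not the article's data.

```python
import numpy as np

# Simulated dilution series (factor 2, triplicates) with ~5% constant CV.
rng = np.random.default_rng(1)
conc = np.repeat([2.0, 1.0, 0.5, 0.25, 0.125, 0.0625], 3)
area = 100 * conc * (1 + 0.05 * rng.standard_normal(conc.size))

# Triplicate mean and SD per level, then slope of log(SD) vs log(mean).
levels = np.unique(conc)
means = np.array([area[conc == c].mean() for c in levels])
sds = np.array([area[conc == c].std(ddof=1) for c in levels])
slope, _ = np.polyfit(np.log(means), np.log(sds), 1)

lam = 1.0 - slope
print(f"estimated lambda ~ {lam:.2f}")  # should be close to 0 for constant-CV data
```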
2 R-Squared should not be the acceptance criterion for linearity
According to CLSI EP06, a linearity study starts with a 3rd order polynomial fit, to see whether the 2nd and/or 3rd order terms are statistically significant, by looking at P-values. However, P-values depend strongly on sample size and noise level, not just on non-linearity. We recommend using AICc instead, as mentioned in the Solutions section above.
Figure 4 shows 3rd, 2nd and 1st order polynomial fits weighted with Area⁻². The 2nd order fit has the lowest AICc (minus loglikelihood corrected for the number of fitting parameters) and is therefore the fit to go for. The 1st order fit has by far the highest AICc, i.e., by far the worst fit.
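The same comparison can be sketched in Python with statsmodels WLS, using Area⁻² weights and an explicit AICc. The data are simulated with a slight curvature and constant CV, so the selected order is only illustrative and need not match Figure 4; the aicc helper and the parameter count (mean parameters plus the residual variance) are assumptions of this sketch.

```python
import numpy as np
import statsmodels.api as sm

# Simulated dilution series with a mild curvature and ~5% constant CV.
rng = np.random.default_rng(2)
conc = np.repeat([2.0, 1.0, 0.5, 0.25, 0.125, 0.0625], 3)
area = (100 * conc - 4 * conc**2) * (1 + 0.05 * rng.standard_normal(conc.size))

weights = area ** -2.0   # inverse variance for constant-CV data
n = area.size

def aicc(res):
    """AICc from the fitted log-likelihood; k counts mean parameters + sigma."""
    k = len(res.params) + 1
    return -2 * res.llf + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Weighted 1st, 2nd and 3rd order polynomial fits; lowest AICc wins.
for order in (1, 2, 3):
    X = np.column_stack([conc ** p for p in range(order + 1)])  # 1, x, x**2, ...
    res = sm.WLS(area, X, weights=weights).fit()
    print(f"order {order}: AICc = {aicc(res):.1f}")
```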
In the modelling above, observations are assumed to be independent, but there is a risk that the triplicates at each concentration are not, e.g., independent dilutions. In that case the degrees of freedom (DF) will be overestimated. When you have multiple observations per experimental unit and there is a risk of dependency, you have two options to get the DF right:
1. Average the replicates at each concentration and fit the means, one observation per experimental unit.
2. Keep all observations and add the replicate group (here Concentration entered as an attribute/categorical factor) as a random factor in the model.
We prefer the last option, using all observations. This is especially important for detecting outliers, which might otherwise be hidden by a mean. The result of entering Concentration as an attribute (triplicate group) random factor is shown in Figure 4a.
It can be seen that the triplicate groups can be assumed identical (the variance component for Triplicate Group is 0), and the observations can therefore be assumed independent. The P-values are exactly the same as in Figure 4.
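For completeness, a simplified mixed-model check can also be sketched in Python with statsmodels: a random intercept is added per triplicate group, and its variance component ("Group Var" in the summary) should be essentially zero if the replicates behave as independent observations. Unlike the JMP model, this sketch omits the Area⁻² weighting for brevity, and the data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated dilution series with ~5% constant CV.
rng = np.random.default_rng(3)
conc = np.repeat([2.0, 1.0, 0.5, 0.25, 0.125, 0.0625], 3)
area = (100 * conc - 4 * conc**2) * (1 + 0.05 * rng.standard_normal(conc.size))

df = pd.DataFrame({
    "conc": conc,
    "area": area,
    "rep_group": np.repeat(np.arange(6), 3),  # one group per dilution level
})

# Quadratic fixed effects plus a random intercept per triplicate group.
model = smf.mixedlm("area ~ conc + I(conc ** 2)", data=df, groups=df["rep_group"])
fit = model.fit()
print(fit.summary())  # the "Group Var" row is the triplicate-group variance component
```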
3 Conversion of Area to Concentration with a non-linear fit
When the Assay is used, an Area is measured and the goal is to obtain the Concentration. A model can therefore be fitted with Concentration as response and Area as predictor, again weighted with Area⁻².
Figure 5 shows this model including the equation converting Area to Concentration taking non-linearity into consideration.
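A minimal Python sketch of this inverse, weighted fit is shown below: Concentration is regressed on Area and Area² with Area⁻² weights, and the fitted polynomial is then used to convert a new Area reading into a Concentration. All numbers, including the example Area of 150, are simulated and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Simulated dilution series with a mild curvature and ~5% constant CV.
rng = np.random.default_rng(4)
conc = np.repeat([2.0, 1.0, 0.5, 0.25, 0.125, 0.0625], 3)
area = (100 * conc - 4 * conc**2) * (1 + 0.05 * rng.standard_normal(conc.size))

# Inverse model: Concentration as 2nd order polynomial in Area, weighted Area**-2.
X = np.column_stack([np.ones_like(area), area, area**2])
res = sm.WLS(conc, X, weights=area ** -2.0).fit()

def area_to_conc(a, params=res.params):
    """Convert a measured Area to an estimated Concentration."""
    return params[0] + params[1] * a + params[2] * a**2

print(area_to_conc(150.0))  # hypothetical new Area reading
```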
4 What difference does it make?
Figure 6 shows that the wrong, unweighted linear fit and the more appropriate weighted, curved fit give 1.77 vs. 1.60 at the nominal concentration. This is a difference of more than 10%.
Comments

Laboratory Scientist Apprentice at F-star Biotechnology:
What are your thoughts on the spacing of standards during linearity studies? I see in this example, and many others, that the serial dilution approach is used, resulting in uneven spacing of calibrator points. I have read that this can lead to some points exerting greater leverage on the regression than others, and that even spacing of standards is therefore ideal.
Statistician at Johnson & Johnson:
We shouldn't forget the role of variance function estimation for providing optimal weights under a variety of calibration and inverse regression applications. Good frequentist approaches have been proposed for this purpose, though they require some specialized programming in some cases. The Bayesian approach has not been widely discussed, but it is on the horizon.
Senior Data Engineer at Novonesis:
It would be so great if "we" could stop using R² as an estimator of fit and instead used the Std. Dev. of each fitted parameter; one could then apply the rule of thumb on the Relative Std. Dev. And while we are at it, please also kick P-values out of the scene: the reference distribution is so weak that you are most certainly more wrong than right. Per Vase, in your fitting it looks like you are using all points in the fit, which will affect the degrees of freedom. How about finding a mean and Std. Dev. for each X and then using these in the fitting? Then the DOF is a lot lower.
Manager ECL QC Support, GMP Specialist, Black Belt Project Management:
Validation criteria for QC quantitative analytical methods in a lot of cases allow for 10% differences from the nominal value. Because the linear regression produces ≤ 10%, validation will pass and no effort is put into using a more suitable fit, which can result in future difficulties with method performance and with results generated using the linear fit. This is best managed by having a strategy mapping out how to choose and implement the best fit for the regression analysis. The strategy can then be captured in validation and feasibility procedures.
Available for remote consultant positions immediately:
I have always thought that for 4PL and 5PL fitting algorithms R-squared was meaningless, especially when looking at residual impurity assays. For me it is better to look at the best fit and the residuals. I am not a statistician or mathematician, just an analyst who has tried to develop all types of "fit for purpose" assays over many, many years.