A Robust Approach to the Thompson Howarth Chart for the Estimation of Analytical Precision in Mining & Exploration
Introduction
The Thompson-Howarth (TH) method, first published in 1976 and summarised in 1978, has become a popular means of estimating analytical precision in some sectors of the resource industry. TH present two approaches: method 1, applicable where the number of duplicate pairs is greater than 50, and method 2, applicable where the number of pairs is 50 or fewer. The focus of this article is on method 1.
The aim of this discussion is to demonstrate that, by using a regression line based on robust statistics, TH method 1 produces a more reliable estimate of precision.
Discussion
The fundamental assumption underpinning the TH method is that analytical errors follow a Gaussian (aka Normal) distribution. However, TH temper this with several examples where a Gaussian distribution may not apply and would introduce undue bias, such as nuggety gold and sample heterogeneity. For the former, there is an excellent discussion of the limitations of the TH approach by Stanley (2006). Geological variation is something I cover in a separate article (The Impact of Geological Variation when Quality Controlling Precision). Be mindful that a duplicate is not always a duplicate in the sense required for assessing precision.
The steps for their method 1 are, in outline: compute the mean and absolute difference for each duplicate pair; sort the pairs by mean concentration; partition them into successive groups of 11; for each group, plot the median absolute difference against the group mean concentration; and fit a regression line through these points, from which precision at any concentration can be estimated.
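The grouping step can be sketched as follows. This is my reading of TH method 1, shown in Python for illustration (the article's own charts were built in R); the function name and structure are mine, not from TH.

```python
import statistics

def th_method1_points(pairs, window=11):
    """Group duplicate pairs and return (group mean, median |difference|) points.

    pairs  : list of (original, duplicate) concentration tuples
    window : partition size (TH suggest 11 pairs per group)
    """
    # 1. Compute the mean and absolute difference of each pair.
    stats = [((a + b) / 2, abs(a - b)) for a, b in pairs]
    # 2. Sort the pairs by their mean concentration.
    stats.sort(key=lambda p: p[0])
    # 3. Partition into successive groups of `window` pairs; for each group,
    #    take the mean concentration and the median absolute difference.
    points = []
    for i in range(0, len(stats) - window + 1, window):
        group = stats[i:i + window]
        points.append((
            statistics.mean(m for m, _ in group),
            statistics.median(d for _, d in group),
        ))
    # 4. A regression of median |d| on mean concentration through these
    #    points then estimates precision as a function of concentration.
    return points
```

The regression in step 4 is where the OLS-versus-robust choice discussed below comes in.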
Let’s look at a typical presentation (in this case for iron ore):
If there is a large range in concentrations, a log-log plot is used (I have log-transformed the regression lines to maintain linearity):
Many will compute an ordinary least squares (OLS) regression line. However, the fundamental statistic underpinning OLS is the mean, and the mean is influenced by outliers: just one outlier can materially affect the mean and therefore the slope (and intercept) of the regression line. I discuss this in more detail in another article (Robust Least Squares Regression: A 'Best Fit' Line Resistant to Outliers).
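The effect is easy to demonstrate. Below is an illustrative Python sketch (not the article's own code, which used R) comparing OLS against the Theil-Sen estimator, one well-known robust alternative that takes the median of all pairwise slopes:

```python
import statistics
from itertools import combinations

def ols(xs, ys):
    """Ordinary least squares slope and intercept (mean-based)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def theil_sen(xs, ys):
    """Theil-Sen slope and intercept (median-based, robust to outliers)."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

xs = list(range(1, 11))
ys = [2.0 * x for x in xs]   # perfect line y = 2x ...
ys[-1] = 5.0                 # ... with one gross outlier at the right

print(ols(xs, ys))        # slope dragged well below 2, intercept inflated
print(theil_sen(xs, ys))  # slope 2, intercept 0 - the outlier has no effect
```

This mirrors what happens in the TH chart: one bad point pulls the OLS line down and inflates its intercept, while the median-based fit is unmoved.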
For brevity I will focus on Sampling Stage (field duplicate). I have hidden the raw pairs:
The OLS line is influenced, and therefore pulled down, by the point to the right. Because of this, the OLS fit has produced a high intercept. Here is a TH precision v concentration chart based on the OLS statistics:
Two comments:
Looking at the data in another way:
Control lines are derived using statistical quality control techniques, which I cover in a separate article (QC Analytical Results Precision: Identifying Outliers. A Statistical Approach). I could exclude the outliers, but for this article, I will leave as is.
Reverting to the previous TH chart, the RLS regression line is not influenced by the single outlier identified, because the fundamental statistic underpinning robust statistics is the median. More than 50 percent of the values would need to be outliers before the median breaks down; for this reason, the median is a robust statistic, resistant to the influence of outliers. Here is the same chart with the intercept computed using RLS (displayed in the title):
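The breakdown point is easy to see with a toy example (illustrative numbers, not the article's data): one gross outlier shifts the mean dramatically but leaves the median untouched.

```python
import statistics

clean = [10, 11, 12, 13, 14]
dirty = [10, 11, 12, 13, 1000]   # one gross outlier replaces the 14

# Mean and median agree on clean data.
print(statistics.mean(clean), statistics.median(clean))   # mean 12, median 12

# One outlier drags the mean to ~209, but the median is still 12;
# the median only breaks down past 50% contamination.
print(statistics.mean(dirty), statistics.median(dirty))
```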
The intercept is much lower than that produced by OLS. Here is the Precision v Concentration chart based on RLS statistics:
The PDL is now much closer to the detection limit reported by the laboratory than the value computed from OLS. Moreover, the relationship of the curves more properly reflects what we might expect for the various sampling stages.
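For readers who want the arithmetic: under the standard TH linear error model, the standard deviation at concentration c is sd(c) = s0 + k·c, precision at two standard deviations is P(c) = 2·sd(c)/c, and the practical detection limit is the concentration at which precision reaches 100%, i.e. PDL = 2·s0/(1 − 2k). The sketch below uses made-up values for s0 and k; an inflated OLS intercept s0 directly inflates the PDL.

```python
def precision(c, s0, k):
    """Relative precision at 2 standard deviations: P(c) = 2*(s0 + k*c)/c."""
    return 2 * (s0 + k * c) / c

def pdl(s0, k):
    """Practical detection limit: solve 2*(s0 + k*c) = c for c."""
    return 2 * s0 / (1 - 2 * k)

s0, k = 0.05, 0.02   # hypothetical intercept and slope from a regression fit
c = pdl(s0, k)
print(c)                      # ~0.104
print(precision(c, s0, k))    # precision is exactly 100% at the PDL
```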
Finally, the choice of partition window is arbitrary. TH suggest 11; however, this could be increased for larger datasets. The larger the window, the less the potential influence of outliers. Below I used a partition of 15. Notice that OLS and RLS sit close together:
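The intuition, sketched below with illustrative numbers: each plotted point is a median of `window` absolute differences, and moving a median requires more than half the pairs in that window to be contaminated, so a wider window makes each point harder for a few outliers to shift, at the cost of fewer points.

```python
import statistics

# A window of 15 with one gross outlier: the plotted median is unaffected.
window = [0.1] * 14 + [50.0]
print(statistics.median(window))   # 0.1

# Trade-off for a hypothetical dataset of 165 pairs: window size versus
# number of plotted points, and outliers needed to move a window's median.
n_pairs = 165
for w in (11, 15):
    print(w, n_pairs // w, w // 2 + 1)   # window, points, outliers needed
```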
A wider partition gives fewer points, but better correlation.
Conclusion
In this article, I hope I have demonstrated that using robust statistics to compute the regression line in TH method 1 is more reliable because of its resistance to the influence of outliers. An RLS line is more likely to produce a realistic practical detection limit and precision at various concentrations.
However, it is always important to ensure that there is no undue bias in the source data due to, for example, geological variation or sampling; if there is, it needs to be understood and investigated before the TH approach to estimating precision is even considered. The fundamental assumption of TH is that measurement errors are normally distributed; where this is not the case, the method may produce biased results.
If you see value in this article, please like or share with your connections.
Chart Source
All charts developed using the R programming language and 'Shiny'. A cloud version is available to explore.
LinkedIn Groups
If you are involved with quality assurance and control in the resource sector, I would encourage you to join and actively participate in the following LinkedIn group:
Discussions in the group relate to ‘whole of mine’ quality assurance and quality control.
If you are interested in the application of the R or Python programming languages in the resource sector, I would encourage you to join and actively participate in the following LinkedIn group:
Where Next?
References
Stanley C. R. 2006. On the special application of Thompson–Howarth error analysis to geochemical variables exhibiting a nugget effect, Geochemistry: Exploration, Environment, Analysis, 4, 357-368. https://doi.org/10.1144/1467-7873/06-111
Thompson M., Howarth R. J. 1976. Duplicate Analysis in Geochemical Practice (Parts 1 and 2), Analyst, 101, 690-709.
Thompson M., Howarth R. J. 1978. A New Approach to the Estimation of Analytical Precision, Journal of Geochemical Exploration, 9, 23-30.