
A 'Best Fit' Line Resistant to Outliers when Assessing Precision in Mining & Exploration

Paul Fell

Accredited acQuire NOVA Network Partner | GIM Suite Specialist | QAQC Specialist | Power BI

Published: 16 November 2017

This article is aimed at QAQC Geoscientists who may not have a theoretical background but need to be confident that the tools that they use have a sound statistical basis. However, it will be of interest to anyone who wishes to chart a linear trendline which is resistant to outliers. The article is the first of a series of articles which focus on the practical aspects of Statistical Quality Control when assessing accuracy and precision.

In this discussion I will also focus on two other types of regression lines used for inspecting the linear relationship between paired data (duplicates) when quality controlling precision of analytical results:

Ordinary Least Squares
Reduced Major Axis.

The discussion includes no formulae but rather the practical aspects of using these regression lines. There is plenty of background on the internet for those more into theory.

One of the most widely used charts for quality controlling the precision of analytical results is the scatter plot. The chart provides an ‘at a glance’ view of precision and, importantly, allows for the visual identification of outliers. For example, the chart below also displays warning and error control lines and an X=Y line:

It is also usual to overlay a regression line (not shown in the above image), which is the ‘best-fit’ line to the data.

Ordinary Least Squares (OLS)

OLS is by far the most familiar and most commonly used regression line. However, a fundamental statistic used in the calculation of the OLS line is the arithmetic mean, which is a measure of central tendency. The robustness of a statistic is its resistance to the influence of outliers. The breakdown point of a statistic is the proportion of outliers in the data that the statistic can tolerate before it is distorted or invalidated. The higher the breakdown point, the more robust the statistic. In the case of the mean, the breakdown point is 0, because the mean can be made arbitrarily large or small by changing just one value in the data from which it is derived. This influence can be seen clearly in the charts below:

Another aspect of OLS is the assumption that there is a dependent variable and independent variable. For example, if I were to plot the selling price of a used car (y) against age in years (x), over time selling price will go down:

Selling price is dependent on time (age).

OLS attempts to minimise the error in the dependent variable only, that is, the vertical distances between the observed values and the fitted line, as shown below:

However, sample 2 is not dependent on sample 1 or vice versa. They are two samples taken (usually) at the same time and place for the analysis of precision, for example two quarter cores from the same interval or two samples from the same sample pile (RC cuttings). Neither is dependent on the other. Sometimes the terms primary sample and secondary sample, or original sample and check sample, are used. Neither pair of terms is appropriate. Sample 1 and sample 2 are duplicates or paired samples.
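The two points made in this section can be illustrated with a short Python sketch. It is a minimal illustration, not the code behind the charts: the function name `ols_line` and the duplicate values are hypothetical, invented purely for demonstration. It fits the OLS line twice, once treating sample 2 as the dependent variable and once treating sample 1 as the dependent variable, and also shows how a single outlying pair changes the slope.

```python
from statistics import mean

def ols_line(x, y):
    """Return (slope, intercept) of the line minimising squared vertical (y) deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical duplicate pairs (illustrative values only, not real assay data).
sample_1 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
sample_2_clean = [0.6, 0.9, 1.6, 1.9, 2.6, 2.9]
sample_2_outlier = [0.6, 0.9, 1.6, 1.9, 2.6, 9.0]  # last pair is an outlier

slope_clean, _ = ols_line(sample_1, sample_2_clean)
slope_outlier, _ = ols_line(sample_1, sample_2_outlier)
print(f"OLS slope without outlier: {slope_clean:.3f}")
print(f"OLS slope with one outlier: {slope_outlier:.3f}")

# Swap the roles of the two samples: OLS now minimises deviations in sample 1.
slope_swapped, _ = ols_line(sample_2_outlier, sample_1)
# Re-express the swapped line on the original axes (sample 2 against sample 1).
print(f"Sample 2 on sample 1: {slope_outlier:.3f}")
print(f"Sample 1 on sample 2, same axes: {1 / slope_swapped:.3f}")
```

The two fits give different lines for the same pairs, which is the practical consequence of OLS requiring one variable to be nominated as dependent.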

Reduced Major Axis (RMA)

RMA addresses this limitation of OLS by treating both variables as subject to error. It minimises the sum of the areas of the right triangles whose legs are the horizontal and vertical deviations of each point from the line. The method is also called geometric mean regression (among other names):

Below I overlay RMA and OLS regression lines:

In general, RMA will perform better than OLS. However, as you can see from the above, it is still influenced by outliers. This is because the mean and standard deviation of the two sets of data feature in the RMA algorithm.
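As a rough sketch of why this is so, the commonly used form of the RMA line takes its slope from the ratio of the two standard deviations (signed by the direction of the correlation) and passes through the point of the two means. The Python below assumes that form; the function name `rma_line` and the data are hypothetical and are not taken from the charts.

```python
from statistics import mean, stdev

def rma_line(x, y):
    """Return (slope, intercept) of the reduced major axis line.

    Slope = sign of the covariance * sd(y) / sd(x); the line passes
    through (mean(x), mean(y)).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * stdev(y) / stdev(x)
    return slope, my - slope * mx

# Hypothetical duplicate pairs (illustrative values only, not real assay data).
sample_1 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
sample_2_clean = [0.6, 0.9, 1.6, 1.9, 2.6, 2.9]
sample_2_outlier = [0.6, 0.9, 1.6, 1.9, 2.6, 9.0]  # last pair is an outlier

rma_clean, _ = rma_line(sample_1, sample_2_clean)
rma_outlier, _ = rma_line(sample_1, sample_2_outlier)
print(f"RMA slope without outlier: {rma_clean:.3f}")
print(f"RMA slope with one outlier: {rma_outlier:.3f}")

# Unlike OLS, swapping the samples gives the same line (reciprocal slope).
rma_swapped, _ = rma_line(sample_2_outlier, sample_1)
print(f"Swapped fit, same axes: {1 / rma_swapped:.3f}")
```

Because the slope depends only on means and standard deviations, neither sample has to be treated as dependent, but one extreme pair is enough to inflate a standard deviation and tilt the line.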

Robust Least Squares (RLS)

I mentioned earlier that the mean is a measure of central tendency. Another measure of central tendency is the median. The median is the middle value of an ordered set of values. In terms of robustness, it would take 50% or more of the values to influence the median. The breakdown point for the median is therefore 50%. For example, consider a set of ordered values:

0.3, 0.4, 0.6, 1.2, 1.3, 1.6, 2.1, 2.1, 2.3
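As a quick check of this example, the short Python sketch below computes the mean and median of these values, then repeats the calculation with the largest one or two values replaced by the outliers 8.7 and 10.5 used in this discussion. The variable names are illustrative only.

```python
from statistics import mean, median

values = [0.3, 0.4, 0.6, 1.2, 1.3, 1.6, 2.1, 2.1, 2.3]
one_outlier = values[:-1] + [8.7]          # last value replaced by 8.7
two_outliers = values[:-2] + [8.7, 10.5]   # last two values replaced

for label, data in [
    ("original", values),
    ("one outlier", one_outlier),
    ("two outliers", two_outliers),
]:
    # The mean shifts with every replacement; the median does not move.
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```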

The median value is 1.3. If the last value is changed from 2.3 to 8.7, the median remains at 1.3. I could replace the last two values with 8.7 and 10.5 respectively and the median would still be 1.3. The median is a robust statistic: it is highly resistant to the influence of outliers. Incorporating the median, instead of the mean, into the computation of a linear regression line allows us to build a line which is largely resistant to outliers, provided, of course, that the breakdown point is not reached. Below are the same charts used above, but this time with the RLS regression line included:

The RLS line exactly overlies the X=Y line. It is unaffected by outliers. Having said that, there will be occasions where RLS will fail, for example when the breakdown point is exceeded.

There are a number of different methods for calculating a regression line using the median (robust regression). The method used in these charts is called the ‘Method of Repeating Medians’. Unlike other methods ('Least Median of Squares', for example), the algorithm is simple, non-iterative and fast.
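To give a feel for how simple such an algorithm can be, here is a minimal Python sketch. It assumes that the ‘Method of Repeating Medians’ corresponds to the repeated median estimator usually attributed to Siegel: for each point, take the median of the slopes to every other point, then take the median of those medians. The intercept here is the median of the per-point intercepts, which is one common variant. The function name and data are hypothetical, and the implementation behind the charts may differ in its details.

```python
from statistics import median

def repeated_median_line(x, y):
    """Return (slope, intercept) using a repeated median estimator (assumed variant)."""
    n = len(x)
    per_point_slopes = []
    for i in range(n):
        # Slopes from point i to every other point; skip pairs with equal x.
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(n)
            if j != i and x[j] != x[i]
        ]
        if slopes:
            per_point_slopes.append(median(slopes))
    slope = median(per_point_slopes)
    # Intercept: median of the intercepts implied by each point.
    intercept = median([b - slope * a for a, b in zip(x, y)])
    return slope, intercept

# Hypothetical duplicate pairs (illustrative values only, not real assay data).
sample_1 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
sample_2 = [0.6, 0.9, 1.6, 1.9, 2.6, 9.0]  # last pair is an outlier

rls_slope, rls_intercept = repeated_median_line(sample_1, sample_2)
print(f"Repeated median line: slope = {rls_slope:.3f}, intercept = {rls_intercept:.3f}")
```

With these illustrative values the fitted line stays close to X=Y despite the outlying pair. The calculation needs no iteration, only nested medians over all pairs of points.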

To finish, here are some charts with all the three regression lines overlaid, for differing sample check stages:

In the scatter plot for the Pulverising check stage, RLS is also affected slightly, but it still outperforms RMA and OLS for robustness. In the case of the Sampling stage it performed well even with sparse data.

The content in this article represents my own personal thoughts. Constructive feedback is always appreciated. Please share and/or 'like' this article if you found it interesting and feel it will be of benefit to others.

All charts produced using the acQuire 4 QAQC object from within GIM Suite. Data is real.

Become Part of the Collaboration

If you have R experience and would like to volunteer to bring cloud QAQC to the industry, click here.


benyounes maamar

4 years ago

Hi Paul. The overall idea is good. But if we use RMA, does it mean that our duplicate analysis results are good? Is it possible to have an example in an Excel sheet?

Mustafa KAPLAN MSc. EurGeol

Senior Geologist - Consultant

5 years ago

Hi Paul, I have two sets of measurements which are expected to be similar, but each pair also has a weighting factor. Do you think we can calculate a weighted RLS? I would be happy if you could send me the workbook. [email protected] Thanks in advance.

Horacio Puigdomenech

Exploration Geologist at Independent Consultant

6 years ago

Paul, thanks for this tool. Could you send it to my email? [email protected]

Campbell Mackey

Exploration Consultant - Copper, Gold, Lithium, Anything

7 years ago

Will this work on a LiDAR dataset of 140 million records for instance? Spatial imaging is often a fast way to spot outliers, when we are talking a function of two variables scenario. Then experience can determine which outliers are not realistic data values - they can be removed according to spatial extents or magnitude. Most imaging systems do a top and bottom cut by default so as to optimise dynamic range for most of the data.

Paul Fell

Accredited acQuire NOVA Network Partner | GIM Suite Specialist | QAQC Specialist | Power BI

7 years ago

I have an Excel workbook which displays two robust regression lines (Repeating Medians and Least Median of Squares) using VBA. Contact me directly if anyone is interested.

