Parker Challenge Thoughts (the first bit)
Well, the inaugural Parker Challenge has been completed and the winners announced. Congratulations to Kirsty Sheerin and the team from Hot Chili.
If you were at MREC 2023, hopefully you saw some of the results of the challenge and my early takes on what we are learning about the person-to-person variation in resource estimation. I know that a lot of people couldn’t make it, so I’ll try and summarise the outcomes and add some more thoughts on what it all means.
There’s a lot to unpack. The 45-minute presentation and panel discussion only scratched the surface.
If you’ve not heard of the Parker Challenge before, here’s the general idea: give the same basic resource data to anyone interested in participating and ask them to both estimate and classify the mineral resource. One data set, many independent estimates; how much difference can there be?
Why? Because this is part of understanding resource uncertainty. Our reporting codes (like the JORC Code) are risk-based reporting frameworks that require the Competent Person to assess and judge certainty and risk. And, while that is a laudable objective, it’s incomplete. As Kahneman, Sibony and Sunstein (2021) say in ‘Noise: A Flaw in Human Judgement’, wherever there is human judgement, there is noise. Let’s face it: resource classification is an inherently noisy activity. We all have different ways of interpreting the Codes. We all have different experience, different expertise and different perspectives.
The main message from this first Parker Challenge is simple.
We have variation in the estimation aspects of mineral resource modelling.
We have variation in the classification aspects of mineral resource reporting.
When estimation variation and classification variation combine, the impact can be large. Very large. Extremely large.
This should not be news to anyone. Yet for some reason we have this belief that calling something a Measured Resource or an Indicated Resource or an Inferred Resource absolves us from considering the person-to-person variation. Maybe that’s due to a false consensus bias or the curse of knowledge: we assume that simply because we know something, everyone else does as well.
Let’s look at some of the information that comes out of the Parker Challenge entries. It’s informative and interesting, but I’m very much aware that we had a limited number of entries and that the sample could have a fair bit of selection bias (more on that later). So, with that caveat in mind, here we go...
There are several ways to assess the entries and try to gauge the person-to-person variation. We can look at the entire specified volume, which included large regions with no drill holes, or we can look at a restricted volume where the drill hole data should provide a better constraint on the estimates. Here’s how that looks.
For Measured+Indicated+Inferred (MII) within the entire volume, the estimates had a deviation of -372% to +300% (copper metal) compared to the average. That is, the lowest entry predicted less than a third of the average copper of all entries, and the highest entry predicted almost exactly three times the average. Let’s acknowledge there are probably some bad outliers and remove the highest 2 and lowest 2 entries - how does that change things? The range reduces to approximately +/-200%.
Just to really scare you, here's the comparison of grade-tonnage curves...
Next, what happens if we remove material classified as Inferred? Looking at Measured and Indicated ONLY, and again excluding the two highest and two lowest entries, the range is now -219% to +178%. Not that much change - there’s still a lot of variation in the Measured and Indicated. In fact the maximum amount of metal is almost 4 times the minimum amount of metal.
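(A quick aside on how I read those percentages, since the convention isn't obvious: I take a negative figure to mean "that many times below the average" and a positive figure to mean "that many times above the average". On that reading the numbers above hang together. The little Python check below is my own back-of-envelope sanity check, not an official Parker Challenge calculation.)

```python
# My own check, assuming -219% means "2.19x below the average" and +178% means
# "1.78x above the average" (not an official Challenge definition).
average = 1.0                 # normalise the average contained metal to 1
lowest = average / 2.19       # lowest Measured+Indicated entry (outliers removed)
highest = average * 1.78      # highest Measured+Indicated entry (outliers removed)

print(round(highest / lowest, 1))   # ~3.9, i.e. the "almost 4 times" quoted above

# Same reading applied to the whole-volume MII range of -372% to +300%
print(round((average * 3.00) / (average / 3.72), 1))   # ~11.2x spread, if that reading is right
```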
It’s about human judgement, folks! Some of the entries classified everything as Inferred. And to be honest, I tend to agree with them. Personally, to meet the hurdle from Inferred to Indicated I would like a lot more geological context than we provided to the entrants.
OK, so across the entire volume the results are disturbing. That could be due to a lot of extrapolation. What happens if we look only at a zone that is close to the drill hole data - let’s say a buffer of 3 holes within 200 m?
The MII range is -237% to +175%.
The MI-only range is -154% to +143%.
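For anyone wondering how a "close to the data" zone like that might be built in practice, here's a minimal sketch of one way to do it: flag block centroids that have at least 3 drill-hole sample points inside a 200 m radius. This is purely my illustration (the coordinates and the point-count shortcut are assumptions); the Challenge analysis may well have defined the buffer differently.

```python
# Minimal sketch: flag block-model centroids that have at least 3 drill-hole
# sample points within 200 m, using a k-d tree for the radius search.
# Counting sample points is a simplification; counting distinct holes would
# need the points grouped by hole ID first.
import numpy as np
from scipy.spatial import cKDTree

block_xyz = np.array([[100.0, 200.0, -50.0],      # hypothetical block centroids (m)
                      [900.0, 900.0, -300.0]])
sample_xyz = np.array([[120.0, 210.0, -40.0],     # hypothetical drill-hole samples (m)
                       [150.0, 260.0, -60.0],
                       [80.0, 180.0, -55.0]])

tree = cKDTree(sample_xyz)
neighbours = tree.query_ball_point(block_xyz, r=200.0)   # sample indices within 200 m of each block
counts = np.array([len(idx) for idx in neighbours])
inside_buffer = counts >= 3

print(inside_buffer)   # [ True False] - only the first block is well constrained
```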
Some improvement, but that is still a big range for an area close to drill hole data and classified only as Measured and Indicated. It is certainly outside the range for the poll I ran at the beginning of my presentation, where I asked people their views on the precision of a Measured and Indicated resource. In that poll 96% of people in the audience thought the Measured+Indicated precision was +/-25% or better. When I asked about the Parker Challenge specifically, only 26% of the audience thought the results would be worse than +/-25%.
It seems we have a calibration problem.
Now there are a lot of other ways we can look at the results, and that analysis is coming: for example, removing the unconventional estimates to leave just the ordinary kriging results, examining by experience range, and so on.
The experience range analysis looks like it will be interesting. One thing I did manage to look at was experience vs the amount of Measured resource… You probably won’t be too surprised that the more experienced entries had less Measured (or no Measured at all!). Those battle scars are telling.
Let me try and draw a couple of conclusions from this.
I think it is clear that we have a big problem with the person-to-person variation in Measured + Indicated + Inferred classification. That is probably the part of the process that involves the most human judgement. We also have person-to-person variation in the estimation process itself, starting with estimation domain interpretation and going down through all the little decisions we make around parameters.
The loose standards around things like +/-15% for 3 months’ production are not going to be appropriate when it comes to the very real differences we are seeing between people. Those metrics are looking at something quite different, and they are only the very tip of the iceberg.
To me it seems that there are different skills required for estimation versus classification. When you think about it, the JORC Code and other similar Codes all stem from a time when things were less complex. If you found a resource (particularly in Australia), the likelihood was you would be able to exploit it. The sovereign risk profile was well defined and well understood (for better or worse). I mean… we managed to build an underground gold mine in a small region surrounded by national park and conservation areas, within spitting distance of a certain Huon pine (Australia’s oldest tree). Something that would prove much more difficult today! The balance of skills in the 1980s-90s and early 2000s was weighted towards geology, geostatistics and the associated technical disciplines. Today? Different story. A zone of mineralisation may not meet the hurdle to be a mineral resource due to any one (or combination) of issues that have little to do with the geology or the rocks.
Rather than expect the geologists and engineers involved in the technical number crunching to suddenly become gurus in ESG and RPEEE, it’s probably time that became its own area of concern.
And lastly (for now), I think those Codes need a serious redesign. I know, I know… the JORC Code is being revised as we speak and all the changes are hush-hush, under lock and key. But unless there is some provision in the revised Code that deals with human judgement variation and noise, all the other mooted improvements will not matter one iota. The noise we are seeing between practitioners vastly outweighs the variation you will see if you look only at the technical aspects of our estimates… And that’s before I get started on all the uncertainty in the mine plan, schedule and other modifying factors!
In the interim, what do we do about it? I don’t know that there are easy solutions. I suspect that any solutions we do come up with will add cost to the whole world of Mineral Resource and Ore Reserve Estimation. Equally, they will stress an already stretched system, requiring more people, more knowledge, more expertise and more diligence.
My first thought was to have multiple independent classifiers: take the one estimate and have it classified by at least 3 independent people. Ouch… It would reduce noise a bit, although how much is open to debate, but the practicalities are insane.
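To make the idea concrete, here's a toy sketch of how three independent classifications of the same block could be reconciled. The majority-vote-with-conservative-fallback rule is entirely my assumption for illustration; nothing like it exists in the Codes, and it says nothing about the (very real) practical cost.

```python
# Toy reconciliation of independent per-block classifications (my assumption,
# not a Code requirement): majority vote, falling back to the most
# conservative category when the classifiers disagree completely.
from collections import Counter

RANK = {"Measured": 0, "Indicated": 1, "Inferred": 2}   # higher = less confidence

def reconcile(calls):
    winner, votes = Counter(calls).most_common(1)[0]
    if votes > len(calls) // 2:
        return winner
    return max(calls, key=lambda c: RANK[c])

print(reconcile(["Measured", "Indicated", "Indicated"]))  # Indicated
print(reconcile(["Measured", "Indicated", "Inferred"]))   # Inferred (no majority)
```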
This is a can of worms. The cynic in me says we will just end up ignoring person-to-person variation and that is a shame. It will leave our non-technical stakeholders with continuing dissatisfaction with our discipline. And maybe the next Senator will be right when he says “those geologists have egg on their faces”.
One last message... The Parker Challenge has only begun to look at the person-to-person variation. There are a lot of other sources of noise we need to start thinking about. Like the pattern noise of a single person who may act differently when working in copper vs gold vs lithium vs nickel. Or the occasion noise when that one person has had a lot of personal stress while working on an estimate, versus times when they have been stress-free. After all, we have evidence from other fields that people's judgements can be influenced by something as trivial-seeming as whether their favourite football team won or lost on the weekend!
People are paid the most money for the work they are willing to do for free
8 months ago: Very clear summary and very insightful results. Scott, thanks for sharing!
Principal Geologist at IMEx Consulting
1 year ago: Sorry Scott Dunham, but I think I may have missed this one. Why did the winner win?
Team Lead - Mine Geology at Measured Group
1 year ago: Thank you, Scott Dunham, for a very interesting and thought-provoking article. As a Coal Geologist, I've often wondered how loud the "noise" is in our estimation processes. I've prepared, overseen, and presented numerous estimations from a wide variety of geological terrains and coal basins around the world, both for public and private companies. I've often had to defend my estimates and their classifications, especially to potential investors and fund managers, and these occasions have been particularly testing, as I have sometimes struggled to explain some of my reasonings to non-technical people. I've also been challenged by highly experienced technical people when my reports have gone through external peer review - just as challenging! I think this type of challenge would be great for us coal estimators given the (not so) recent changes to the classification processes involving geostatistics and the requirement to demonstrate eventual economic extraction. I'm up for the challenge!
Manager - Geological Services
1 year ago: Scott, having a second look at the grade-tonnage results, there are a couple of interesting observations. Above a 1% cut-off grade the tonnages are all pretty close and, even including the more extreme outliers, the spread of grades is similar (both could even be considered normal). What do you think is going on here? At a 0.5 cut-off the spread of tonnage is much wider but (ignoring the extremes) there seem to be two distinct trends ... one with 9 or so models having a similar range of tonnage (1.25 to 1.75 Mt) and a second ranging from 0.5 Mt to 1 Mt. Also, in the article "Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results" the (many) authors concluded ... "Overall, the 29 different analyses used 21 unique combinations of covariates. Neither analysts’ prior beliefs about the effect of interest nor their level of expertise readily explained the variation in the outcomes of the analyses. Peer ratings of the quality of the analyses also did not account for the variability. These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions." Seems very familiar now!
Senior Resource Geologist at Minara Resources
1 year ago: Hi Scott, although we ended up standing next to each other at the conference the other day, we didn't get a chance to talk - we were still devouring lunch. Thanks go to you and the team that had been put together to go through the Parker Challenge outcomes. May the analysis and insight continue! I was one of the many that was quite surprised by the variation. But I was equally surprised at the amount of what looked like poorly completed work from what I assumed were experienced geologists - regardless of whether they were willing to call themselves Resource Geologists. Hopefully you will be able to break down this variation with a little more insight as you progress this topic. I'm not sure if you have this statistic, but how many of the more experienced Challengees had a recognisable competence in the style of copper resource? This being one of the reasons I chose not to participate - it being outside my experience.