It might not be about competency…

“Providing overwhelming amounts of information without adequate structure or documentation is not transparency.” (Richard Berk, 2012)

Many years ago, following minor leg surgery, my wife was preparing to have a shower. Her post-surgery routine involved, amongst other things, not standing or weight-bearing on the recovering limb and so, being a faithful husband, I had placed a chair in the shower, reckoning that a seated shower was better than somehow trying to balance on one leg, keep the surgery site dry and wash all at the same time.

Now here’s the thing about chairs. We use them every day, trusting these devices of our own making to hold our weight in comfort. It’s reasoning by inference and it goes like this… ‘I sat in this chair yesterday and it didn’t break. Therefore I can safely sit in this chair today, trusting it to hold my weight’

Do you see the problem?

There’s a lot of fundamental stuff residing in that simple piece of inference. All bound up together to paint a rosy picture.

As you may have guessed by now, my darling wife had a somewhat less than happy experience with that chair. Imagine, if you will, trying to perform a single-legged squat in a shower cubicle on to a plastic BBQ chair, straight after an operation. No, it did not go well at all. My wife lowered herself towards the chair, over-balanced on her single leg and landed on the chair from a height of about 30-40cm. The chair, not amused by this, took it upon itself to collapse, spilling my wife to the floor and landing me in the doghouse.

Sometimes it pays to think about the hidden assumptions that underpin our inferential logic. If I had, just maybe I could have saved my wife some physical pain and saved myself some emotional damage. But no, I trusted that chair to just do its job like it had every previous day. I ignored that I had placed it in an unfamiliar situation where its splayed legs were on a slippery surface. I didn’t even consider the challenges of single-legged squats in a confined space. I certainly didn’t contemplate the difference in energy when someone executes a controlled sit compared to an uncontrolled stumble from more than a foot above the seat.

All those safety lessons over the years… forgotten because I trusted that chair to do its job!


Trust is also, I think, one of the central challenges facing the JORC Code and other reporting regimes. Much like my experience with that chair, the trust of many stakeholders has shattered on the back of one too many failures to perform as anticipated. But like that chair, I wonder if there’s more going on? Replace ‘chair’ with ‘competent person’ and you might see what I mean. When I trusted that chair I ignored its circumstance and I ignored its use case. I should not have been surprised when it snapped into pieces and dumped my chief stakeholder on the cold shower floor. I misused (you could say abused) the chair.

That makes me wonder. How many ways and how many times have we misused the idea of ‘competency’, laying the blame on the competent person when the ‘fault’ (if there is such a thing) lies elsewhere?

To be fair, there are many cases where the ‘chair’ has over-promised and under-delivered. I mean, have you ever seen a beautiful and comfortable-looking armchair that turns out to be as hard as a rock and about as comfortable as lying on a bed of nails?

It is the issue of trust and the over-reliance on ‘trusting the competent person’ that spurred many of the changes in the 2012 Code. Well do I remember the discussions around balancing the principle of competency with the other two legs of the Code: transparency and materiality. The idea was that by beefing up the transparency requirements and ensuring disclosure of material matters we could bolster the over-worked competency principle. No more reporting that said “trust me I’m a competent person”.

In theory this was a great idea. In practice it probably didn’t quite deliver to expectations, with too much leeway in interpreting ‘materiality’ and too few examples of good and bad transparency. And the other problem… all that disclosure and ramped up reporting implies that those reading the report have the knowledge and expertise to know what it all means.

For those with less expertise it’s too easy to mistake jargon for substance.

A great idea for other competent persons. Maybe not so great an idea for those whose expertise lies elsewhere - like the other stakeholders! The way a subject matter expert and a layperson interpret the same information is vastly different. For example, if I told you I had taken a photo under Bortle 9 skies using a RedCat 71 mounted on an iOptron CEM40 (high-resolution stepper motors capable of 0.08 arcsecond movement steps combined with 110mm, 216 teeth RA and dec. gears) and then processed it with a 3 hour integration time, it would sound kind of impressive, but all you could really judge would be how pretty the image was. You might like the picture or you might not but I doubt you could adequately judge its technical merit.

And yet I told you quite a bit. I was ‘transparent’. As the opening quote states, simply blurting out a lot of complex sounding language is not transparency.

The question of transparency is also a critical challenge in the world of AI and neural networks. These machine learning algorithms are notoriously opaque and discovering how they come to the conclusions they do is difficult. That leads to a problem of trust. How do you know if the neural network is ‘seeing the right things’? How do you know it has not picked up some true but misleading pattern? The anatomy of a neural network can be surprisingly effective at finding patterns - even if some data is explicitly excluded (for instance gender or race). The agent can find subtle secondary and tertiary relationships that are ‘markers’ for the very attributes excluded. And some of those relationships at best embed bias and at worst could end up killing people! Like the pneumonia diagnosis tool that ‘learned’ that being over 100 years old was a good thing when it came to survival rates and therefore centenarians did not need hospitalisation… a connection driven by selection bias in the data used to train the model - all people over 100 automatically went to ICU and thus their survival was better than some other populations (not to mention the obvious frequency-related sampling challenges).

As it turns out, transparency is an active area in AI research and I think our industry can learn from some of the approaches being developed. We need to find true transparency tools that enable diverse stakeholders to both understand our models and evaluate the risk inherent in them - data, parameter, implementation and execution.

Here are some thoughts arising from the AI field.

Improper models. One of the most troubling aspects of current geostatistical estimates is the idea of stationarity: the assumption that some selected pool of data is statistically uniform. Essentially this is an expert decision that says ‘hey these data are all apples and these data are all elephants’. This stationarity assumption, which some have called the decision you make and immediately regret, is one of those subtle underlying features of your model - like it or not. Why? Well, from this stationarity assumption you progress to variogram modelling, looking at half the average squared difference vs distance and direction for sample pairs in your assumed stationary pool of data. You can see why stationarity matters. Make a different stationarity assumption (a different pool of data) and you have different data pairs informing that variogram model. The next step in this shaky path is the derivation of kriging weights, the weights applied to each sample during estimation. Those weights are derived from the variogram (and the block-sample geometry) and they are ‘optimal’ for that variogram - kriging is by definition an optimal weighting algorithm - it minimises the estimation variance implied by that variogram model.
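To make that dependence concrete, here is a minimal sketch of ordinary point kriging weights for two samples on a line. It is my own illustration, not any package's implementation: the spherical variogram parameters, the sample positions and the function names (`spherical`, `solve`, `ok_weights`) are all assumptions chosen for the example. The only thing that changes between the two runs is the variogram range.

```python
import math

def spherical(h, nugget, sill, vrange):
    """Spherical variogram value at separation h (sill is the total sill)."""
    if h == 0:
        return 0.0
    if h >= vrange:
        return sill
    r = h / vrange
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ok_weights(samples, target, gamma):
    """Ordinary kriging weights for point samples estimating a point target."""
    n = len(samples)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Left-hand side: sample-to-sample variogram plus the unbiasedness constraint.
    a = [[gamma(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Right-hand side: sample-to-target variogram; the last row forces weights to sum to 1.
    b = [gamma(dist(s, target)) for s in samples] + [1.0]
    return solve(a, b)[:n]  # drop the Lagrange multiplier

# Hypothetical layout: target at the origin, samples 1 m and 5 m away.
samples = [(1.0, 0.0), (5.0, 0.0)]
target = (0.0, 0.0)
w_short = ok_weights(samples, target, lambda h: spherical(h, 0.0, 1.0, 2.0))
w_long = ok_weights(samples, target, lambda h: spherical(h, 0.0, 1.0, 20.0))
print("range 2 m :", w_short)
print("range 20 m:", w_long)
```

With the short range the nearer sample carries roughly two thirds of the weight; with the long range it carries nearly all of it. Same data, same geometry, different variogram decision.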

What’s the issue then, besides the whole process being a big black box for non-specialists? It is in the weight optimisation itself. Those weights are ‘optimal’ for the variogram model derived from the pool of data with assumed stationarity - one pool of data (commonly a “domain”) and one variogram. If you peel back the wrapping on the black box you can sometimes see why this is a problem. There’s a rarely viewed version of the variogram that can be very informative - the scatter plot of each pair of data in the domain (separation distance vs pair difference). It’s not normally viewed because the number of pairs is huge and the scatter plot is a messy cloud. Compared to the summarised version, however (those single point plots of ‘lag distance’ vs ‘variance’), the variogram cloud is far more interesting. Looking at the shape and dispersion of points in the cloud gives you some insight into just how variable the local variogram within your domain can be. Sometimes there are outliers and extremes, sometimes there are tightly clustered points, sometimes the paired differences diverge quickly as distance increases and sometimes those pairs stay inside a recognisable band.

All that information is summarised into one point per lag in the experimental variogram and then summarised again when we fit a ‘best’ model to the summary. This summary of a summary therefore is the basis of those ‘optimal’ kriging weights. Suddenly they don’t sound quite so optimal anymore.
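The first of those two summaries is easy to sketch. The snippet below is a bare-bones illustration under my own assumptions: samples are `(x, y, grade)` tuples in 2D, direction is ignored, and the function names are made up for the example. It builds the full variogram cloud and then collapses it to one point per lag, keeping the pair count and the spread so you can see how much was thrown away.

```python
import math
from itertools import combinations

def variogram_cloud(samples):
    """Return (distance, half squared difference) for every pair of samples."""
    cloud = []
    for (x1, y1, g1), (x2, y2, g2) in combinations(samples, 2):
        h = math.hypot(x2 - x1, y2 - y1)
        cloud.append((h, 0.5 * (g1 - g2) ** 2))
    return cloud

def experimental_variogram(cloud, lag_width):
    """Collapse the cloud to one summary point per lag bin."""
    bins = {}
    for h, gamma in cloud:
        bins.setdefault(int(h // lag_width), []).append((h, gamma))
    summary = []
    for k in sorted(bins):
        hs = [h for h, _ in bins[k]]
        gs = [g for _, g in bins[k]]
        summary.append({
            "lag": k,
            "mean_h": sum(hs) / len(hs),
            "gamma": sum(gs) / len(gs),   # the single point usually plotted
            "pairs": len(gs),
            "gamma_min": min(gs),         # the spread hidden behind that point
            "gamma_max": max(gs),
        })
    return summary

# Three hypothetical samples on a line.
demo = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (2.0, 0.0, 2.0)]
for point in experimental_variogram(variogram_cloud(demo), lag_width=1.0):
    print(point)
```

Reporting the pair count and the min/max (or a few percentiles) beside each experimental point is a cheap way to show a reader how wide the cloud is behind the tidy curve.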

And that’s where the idea of improper models might be interesting.

Way back in 1979, a philosopher cum psychologist named Robyn Dawes researched the difference between clinical prediction by experts and a variety of predictive models including typical regressions with optimal weights and not-so-typical ‘improper’ models where the weights applied during prediction were random or even equal across the data.

Guess what?

The regression model performed better than the expert. I guess that’s not too surprising. What is surprising however is the other two cases; random weights and equal weights. Both these simplified models with essentially no attempt at optimising their predictive capacity outperformed the human expert and came surprisingly close to predicting as well as the optimised model. (Next time you overrule your grade control system you might want to remember this!)

A non-optimal model, one with weights chosen at random or simply equalised, predicted pretty well. Ok. So these were clinical predictions compared against a range of experts, certainly not the infallible models developed by the competent person… umm… well… there’s an awful lot of expert judgement built into our resource and reserve estimates - like that stationarity question.
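If you want to play with the idea, here is a toy simulation in the spirit of that comparison. It is emphatically not Dawes' data or method: the true weights, noise level, sample sizes and function names are all assumptions of mine, the predictors are simulated as standardised variables, and the random weights are drawn positive (that is, pointing in the right direction). It fits least-squares weights on a small training set and then scores fitted, equal and random weights by their correlation with the outcome on fresh data.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_weights(xs, ys):
    """Least-squares weights via the normal equations (no intercept; data are zero-mean)."""
    k = len(xs[0])
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate(rng, n, true_w, noise_sd):
    """Standard-normal predictors and a noisy linear outcome."""
    xs = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0.0, noise_sd)
          for row in xs]
    return xs, ys

def compare_models(seed=0, n_train=30, n_test=2000):
    """Out-of-sample correlation for fitted, equal and random (positive) weights."""
    rng = random.Random(seed)
    true_w = [0.5, 0.4, 0.3, 0.2]          # assumed 'truth' for the toy example
    train_x, train_y = simulate(rng, n_train, true_w, 1.0)
    test_x, test_y = simulate(rng, n_test, true_w, 1.0)
    models = {
        "fitted": ols_weights(train_x, train_y),
        "equal": [1.0] * len(true_w),
        "random": [rng.random() for _ in true_w],
    }
    return {name: pearson([sum(w * x for w, x in zip(ws, row)) for row in test_x], test_y)
            for name, ws in models.items()}

for name, r in compare_models().items():
    print(f"{name:>6}: r = {r:.3f}")
```

Try different seeds and training sizes. How close the improper models get depends entirely on the setup I have assumed here, so treat it as a way to build intuition rather than as evidence about any real deposit.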

From a transparency perspective I think this is absolutely fascinating. It does not get much simpler than averaging a set of data - a simple linear average with equal weights. What’s more we already do this on a global basis - in our exploratory data analysis where typically the domain average (clustered and declustered sometimes) is reported. It’s also a trivial matter to average the grades of the samples in the search neighbourhood.

Presenting the global average and the simple average within the search neighbourhood along with your preferred estimate might improve transparency. It’s something I’ve been looking at for some time. We have a pseudo equivalent in swath plots but, let’s face it, there’s enough room to drive a truck through on most of those charts!
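A rough sketch of what that side-by-side might look like for a single block follows. Everything here is assumed for illustration: samples are `(x, y, grade)` tuples, the search is a simple circle, and an inverse distance estimate stands in for whatever your preferred estimate is (I have swapped it in only to keep the example short; it is not kriging).

```python
import math

def neighbourhood_report(block_xy, samples, search_radius, power=2.0):
    """Global mean, equal-weight neighbourhood mean and a stand-in estimate for one block."""
    bx, by = block_xy
    global_mean = sum(g for _, _, g in samples) / len(samples)
    # Keep (distance, grade) for samples inside the circular search.
    nearby = []
    for x, y, g in samples:
        d = math.hypot(x - bx, y - by)
        if d <= search_radius:
            nearby.append((d, g))
    if not nearby:
        return {"global_mean": global_mean, "n_samples": 0,
                "local_mean": None, "estimate": None}
    local_mean = sum(g for _, g in nearby) / len(nearby)   # the 'improper' equal-weight model
    # Stand-in for the preferred estimate: inverse distance weighting.
    weights = [1.0 / max(d, 1e-9) ** power for d, _ in nearby]
    estimate = sum(w * g for w, (_, g) in zip(weights, nearby)) / sum(weights)
    return {"global_mean": global_mean, "n_samples": len(nearby),
            "local_mean": local_mean, "estimate": estimate}

# Hypothetical data: two samples near the block and one distant high grade.
demo_samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0), (10.0, 0.0, 8.0)]
print(neighbourhood_report((1.0, 0.0), demo_samples, search_radius=3.0))
```

Where the preferred estimate sits well away from both simple averages, that block is a good candidate for a closer look and a plain-language explanation.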

Salience. The second lesson from AI is about salience - you know, the important stuff or things that stand out from the crowd. In a neural network it is handy to know, for example, what part of an image is driving the agent to classify a picture as a cat or a canary. What does the algorithm think is important? There are many examples of seemingly well performing image classification systems that are not focusing on what we think they should be. One reported case from 2013 describes a network that was checking the background focus (bokeh) when identifying images with an animal vs images without an animal. It was -not- an animal detector, it was a bokeh detector. The network found a loophole, a shortcut. Images with animals were typically shallow-field focus on the animal with artfully blurred backgrounds. Images without animals typically had a wider depth of focus. In a second case a neural network used to identify skin cancers had a tendency to classify any image with a ruler included as ‘cancerous’. Another data related shortcut.

What about our models? How does salience play a role in transparency? It seems to me it would be most useful when understanding a model (and the classification of that model) to know what bits are important and why. I’m still pondering this somewhat but I have one simple idea. With modern software it’s easy to know what samples and what sample weights were used to estimate each block. Surely understanding the differences in that complex matrix is salient? Some questions to aid transparency would be:

  • Which samples are used most frequently during estimation? Where do they sit spatially and are those samples in any way biased (high or low) compared to the rest of the data?

  • Which samples have the highest combined sample weight?

Easy questions but never discussed. If I were a layperson and someone showed me a picture that said “hey, 50% of the metal is related to these 2% of samples” I could probably form a reasonable opinion on the risk involved. Simply knowing that not all samples are created equal can be enlightening. It certainly informs risk!
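Both questions reduce to a tally over the estimation log. The sketch below assumes a made-up export format (a dictionary of blocks, each with a tonnage and the sample weights used to estimate it) and made-up function names; real software will store this differently. It also assumes weights sum to one per block and treats `weight × grade × tonnes` as a sample's metal contribution, which gets harder to interpret if you have negative kriging weights.

```python
def sample_influence(blocks, grades):
    """Per sample: number of blocks informed, cumulative weight and weighted metal.

    blocks: {block_id: {"tonnes": float, "weights": {sample_id: weight}}}
    grades: {sample_id: grade}
    """
    stats = {}
    for block in blocks.values():
        for sid, w in block["weights"].items():
            s = stats.setdefault(sid, {"blocks": 0, "cum_weight": 0.0, "metal": 0.0})
            s["blocks"] += 1
            s["cum_weight"] += w
            s["metal"] += w * grades[sid] * block["tonnes"]
    return stats

def metal_share_of_top(stats, fraction):
    """Share of total metal attributable to the top `fraction` of samples (at least one)."""
    ranked = sorted(stats.values(), key=lambda s: s["metal"], reverse=True)
    top_n = max(1, int(fraction * len(ranked)))
    total = sum(s["metal"] for s in ranked)
    return sum(s["metal"] for s in ranked[:top_n]) / total

def influence_weighted_grade(stats, grades):
    """Mean sample grade weighted by cumulative kriging weight (compare with the plain mean)."""
    total_w = sum(s["cum_weight"] for s in stats.values())
    return sum(s["cum_weight"] * grades[sid] for sid, s in stats.items()) / total_w

# Hypothetical two-block model informed by three samples.
demo_grades = {"S1": 10.0, "S2": 1.0, "S3": 1.0}
demo_blocks = {
    "B1": {"tonnes": 100.0, "weights": {"S1": 0.5, "S2": 0.5}},
    "B2": {"tonnes": 100.0, "weights": {"S1": 0.8, "S3": 0.2}},
}
demo_stats = sample_influence(demo_blocks, demo_grades)
print(demo_stats)
print("metal share of top sample:", metal_share_of_top(demo_stats, 0.02))
print("influence-weighted grade :", influence_weighted_grade(demo_stats, demo_grades))
print("plain mean sample grade  :", sum(demo_grades.values()) / len(demo_grades))
```

In this toy case one high-grade sample informs both blocks and accounts for most of the metal, and the influence-weighted grade sits well above the plain sample mean. That is exactly the sort of one-line statement a layperson can weigh up.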

I reckon there’s a lot more we can learn about being transparent - truly transparent - by looking at other AI safety techniques. There are fascinating examples of reverse engineering the connections and layers in neural networks that are probably much more informative than the current ‘checklist’ approach to the standard JORC Code or NI 43-101 report. We need to engage some clever minds to find them!

I think it worthwhile to remember that the JORC Code is not only about competency and that not all problems are competency related. Sometimes it is more a matter of clear communication at a level understandable by all stakeholders, not only the privileged few.

Mustafa KAPLAN MSc. EurGeol

Senior Geologist - Consultant

2 years ago

I have been wondering how can I assess cumulative weight of any sample for all block estimations. I would expect all samples, if equally distributed like in grade control, have similar cumulative weights. I guess it is a better representation than comparing sample statistics and estimated block statistics within a domain. I guess it will contribute to optimize composite length and discretization. Would you suggest a software or method for that?

Ian Wollff

Principal Geologist. Independent

2 years ago

Long but great read. Rene Sterk may be interested.

Keith Whitchurch

President Director PT SMG Consultants

2 years ago

Good stuff Scott Dunham



