Capping

When I first heard about capping, the term was new to me. Once it was explained that you choose a cap and any value above that threshold is set to the threshold, it made sense. I knew of it (barely) as winsorization, or as truncation in the loose sense: not clipping values out of the dataset, but lowering everything above your chosen value down to it (or raising everything below it, on the lower end).

Why cap:

There are quite a few reasons to cap. If you have outliers (extreme values), the mean is highly sensitive to them, so you want to limit their effect and keep your statistical measures representative of the data. Gold especially is skewed, and in many deposits I worked in it tends to show outliers.

Sometimes data have logical limits. A percentage shouldn't go above 100%, because how would you get more than the whole unless you somehow created it (or didn't clean the crucible or something)? There are other physical limits too. Some variables will never report below a certain value (the detection limit) or above a certain value (some assay methods stop at an upper limit and then you have to switch over to a different method).

In regression analysis, outliers can distort the model and the fit. The best-fit line may tilt towards the outlier, which misrepresents the relationship for the majority of the data. Outliers affect p-values and R-squared values (giving a false sense of model quality) and can make a predictor variable seem more significant than it actually is. There are some regression methods that are less sensitive to outliers but that is not a topic for capping :) .

Outliers should be reviewed to see whether they are natural, errors or rare occurrences, since your plan for capping or otherwise dealing with them may change with each.

Remember, though, the biggest reason we cap in Geostatistics is to prevent overestimation!

How to deal with Outliers:

Note: This list is not complete; it only covers methods that I have heard of, used or been told about. Feel free to fill me in if you know of more or you know one a little differently than I do.

Grade Capping/Top Cutting: When I first heard the words "top cutting", to me this meant taking scissors and cutting the top off, so I literally thought we were chopping off the tail of the distribution, which is not true. It's where you choose a threshold (based on tests/visuals I will talk about in a minute) and then cap all the values above it to that value. For instance, if your values are 0.005, 0.010, 0.012, 0.045, 0.05, 0.06 and 20, and we choose a threshold of 0.06, the data now look like this: 0.005, 0.010, 0.012, 0.045, 0.05, 0.06, 0.06. See, we capped the 20 down to 0.06. We didn't remove it, we just made it 0.06. This can be done before compositing, or after compositing/during estimation.
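To make the arithmetic above concrete, here is a minimal Python sketch of the same example. The function name `cap_grades` is just something I made up for illustration; it is not from any particular software package.

```python
def cap_grades(values, cap):
    """Return a new list where every value above `cap` is set to `cap`.

    Nothing is removed: the number of samples stays the same.
    """
    return [min(v, cap) for v in values]


# The values and threshold from the example above.
assays = [0.005, 0.010, 0.012, 0.045, 0.05, 0.06, 20]
capped = cap_grades(assays, cap=0.06)

mean_raw = sum(assays) / len(assays)
mean_capped = sum(capped) / len(capped)

print(capped)                  # the 20 becomes 0.06, everything else is untouched
print(len(capped) == len(assays))
print(round(mean_raw, 3), round(mean_capped, 3))
```

The raw mean is roughly 2.88 while the capped mean is roughly 0.035, which shows how much a single extreme sample can drive the average.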

High Yield: I haven't started liking this, let alone loving it. As I've stated previously, it reminds me of building another indicator shell that limits the impact of high grade and constrains it into poddy little blocks. It removes it from being used below its threshold, but then you end up with a few spots depending on which threshold you chose. I guess if you know the volume/geometry from ore control then maybe you could use it, or if you know that it was intrusives and that the ore (or the variable) didn't bleed out, then yeah, but I'm still on the fence with this one. Some people use rhograms to help choose/guide a threshold for this, but some of the rhograms look just like what I'm used to seeing in very badly behaved IK domains when I'm trying to choose group binning. It very much reminds me of omni/group binning.

Segregation: Sometimes you may have mixed populations, where very high grade is mixed in with medium to low grade. Maybe you didn't domain it properly or didn't quite understand the controls, but wanted to (or had to) create a model. If you had time, you could segregate using software like Leapfrog or Micromine or Vulcan or RMSP (this year I got to see Micromine, and I had assumed it was still way in the past...absolutely not; if you haven't seen it, you need to...it blew my mind for what it can do). It's not great, but you could create indicator shells, which may work in a pinch, or you can dive in and see if you can figure out the geologic control (which would be the preferred method...if you had time). Also, I had someone reach out and ask me about ROKE (so I figured I'd throw this in here) and whether we had built tools for it in my previous role. Not that I remember. I didn't know what it was, so I reached out to Isobel and she shared this for anyone who wants to review/read it: (Thanks Isobel)

ROKE separates component normal or lognormal populations in one mixed histogram. Read my 1974 paper! The math is in my Computers and Geosciences paper but that is just boring!! (Isobel - you are talking to a Math major lol!!!! I will look at it.)

https://www.kriging.com/publications/IMM1974.htm

https://www.kriging.com/publications/CompGeoScRoke.htm

The analysis is available in our free teaching software which you can find at kriging.com.

You are the best, Isobel, thank you so much for that! I first met Isobel Clark when she came to Elko and taught us Practical Geostatistics. The best part was that we had to do Variography by hand, so I was placed on a team with a few other geologists, and man did we have lots of laughs because I was so competitive. As things were making more sense, I was helping the geos understand it. I always felt like we had to race when we had to do things by hand...it's like a game LOL! Rachel Burgess can tell you how much fun I had :) and she still remembers that day, over 15 years ago! I feel that when you do things by hand, it really drives it home and you fully understand it (then later on to automation, after you have done it 100+ times).

Cut: I say don't ever cut something off and dispose of it unless you have extreme evidence to do so. It's high for a reason, it's data, it's trying to scream at you about something. So please take extreme care here. The ONLY time I ever did this was in ore control, when I was questioning whether sample bags had been mixed up and we had gotten underground assays in our database (true story), so we wanted to krige/cut polys with and without that sample. Again...do not try this at home!

Local Capping: I haven't seen this used much, let me know if you have. It basically looks at the assays around a sample and works out what that sample should be capped down to. We used it once in ore control for similar reasons as above. I wish this had been available back when we had to use "cut"; I would have much rather used this.

How to choose a cap?

Now for the moment you've all been waiting for. How on earth do you choose a cap?

Probability Plots: Here you are looking for breaks or changes in the slope. Some explain it as a "kink", like a slight bend. I remember people showing me by drawing lines along the plot with a ruler: where the lines didn't connect as one continuous line but instead met in a V, that is your "kink". Where this "kink" occurs could suggest a change in the distribution or indicate outliers.
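If you want to build the plot yourself before looking for a kink, here is a rough Python sketch. It assumes a log-probability plot with the plotting position (i - 0.5) / n, which is one common convention and my assumption here, and the function names are hypothetical. It only gives you the points and the slope between neighbouring points; picking the kink is still a judgment call made by eye.

```python
import math
from statistics import NormalDist


def log_probability_points(grades):
    """Return (z, log10(grade)) pairs for a log-probability plot.

    z is the standard normal quantile of the plotting position
    (i - 0.5) / n. Non-positive grades are dropped because log10
    is undefined for them.
    """
    positive = sorted(g for g in grades if g > 0)
    n = len(positive)
    nd = NormalDist()
    return [
        (nd.inv_cdf((i - 0.5) / n), math.log10(g))
        for i, g in enumerate(positive, start=1)
    ]


def segment_slopes(points):
    """Slope between consecutive plot points; a sharp change hints at a kink."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


# Illustrative values only (the small example from the top-cutting section).
assays = [0.005, 0.010, 0.012, 0.045, 0.05, 0.06, 20]
points = log_probability_points(assays)
slopes = segment_slopes(points)

for (z, log_grade), slope in zip(points[1:], slopes):
    print(f"z={z:+.2f}  log10(grade)={log_grade:+.2f}  slope from previous={slope:.2f}")
```

With these values the steepest segment is the last one, between 0.06 and 20, which is where you would go and look first.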

Decile Analysis: This involves analyzing the contribution of high-grade samples (top deciles) to the overall mean or total metal content. By comparing different deciles, you can decide on an appropriate capping threshold to reduce the influence of extreme values. I don't like this one, because I feel that instead of chopping the data into 10 groups you need to chop up the high-grade tail with more care, since most of your metal content lives there. I learned a lot about this with MIK.
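Here is a small Python sketch of the bookkeeping behind a decile analysis. It assumes equal-weight samples (for example equal-length composites), so "metal" is simply the sum of the grades; the function name and the grades are made up for illustration. The `n_bins` parameter is there so you can slice finer than 10 groups when you have enough data.

```python
def quantile_metal_shares(grades, n_bins=10):
    """Split the sorted grades into n_bins near-equal groups and return
    each group's share of total metal (sum of grades), lowest group first.

    Assumes equal-weight samples and a positive total.
    """
    ordered = sorted(grades)
    n = len(ordered)
    total = sum(ordered)
    shares = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        shares.append(sum(ordered[lo:hi]) / total)
    return shares


# Hypothetical data: 19 ordinary grades plus one extreme value.
grades = [0.1 * i for i in range(1, 20)] + [50.0]
shares = quantile_metal_shares(grades, n_bins=10)

for decile, share in enumerate(shares, start=1):
    print(f"decile {decile:2d}: {share:6.1%} of metal")
```

In this made-up dataset the top decile carries about three quarters of the metal, which is the kind of imbalance the method is designed to expose.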

Some other methods used for comparison are Mean + 2SD or Mean + 3SD, CV analysis, and the correlation indicator. For the correlation indicator, you turn the data into binary (indicator) data and then choose a threshold distance at which to compare the pairs of values. If the correlation turned out negative, it meant the distance was too far.
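The Mean + kSD and CV checks are easy to sketch in Python. This assumes the sample standard deviation (`statistics.stdev`), and the function names are hypothetical; the values are the small example from the top-cutting section.

```python
from statistics import mean, stdev


def mean_plus_k_sd(grades, k):
    """Candidate cap at mean + k sample standard deviations."""
    return mean(grades) + k * stdev(grades)


def coefficient_of_variation(grades):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(grades) / mean(grades)


assays = [0.005, 0.010, 0.012, 0.045, 0.05, 0.06, 20]

cap_2sd = mean_plus_k_sd(assays, 2)
cap_3sd = mean_plus_k_sd(assays, 3)
print(f"mean + 2SD = {cap_2sd:.2f}, mean + 3SD = {cap_3sd:.2f}")

# CV before and after capping at 0.06.
capped = [min(g, 0.06) for g in assays]
cv_raw = coefficient_of_variation(assays)
cv_capped = coefficient_of_variation(capped)
print(f"CV raw = {cv_raw:.2f}, CV capped = {cv_capped:.2f}")
```

Notice that the 20 sits between the two candidate caps: the outlier inflates the standard deviation so much that Mean + 3SD would not even flag it. That is one reason to treat these numbers as something to compare against rather than as the answer.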

Risk Hi: This one always gave a cap that was way too low, but it was interesting to run and compare to. It uses Monte Carlo simulation to run, say, 1000 realizations. Using P20, it would then tell you that 1 in 5 years the metal won't be there but 4 out of 5 years it would. I'm deep in my files trying to find the documentation that I was given by Amec/Wood and Ed Isaaks over the years. Ed loved doing it all in JMP...he was a JMP wizard, and it fascinated me that he used that because I used SAS in school. JMP is just an easier version of it: more user friendly and less scripting, a lot less!

Visual: Nothing beats this. I normally go and look at all the other methods, usually using the probability plots as a good starting point. I may check the tail of the probability plot to see if the high grade was just a hole that piped mineralization, or an intrusive or something, then I will dive into visualization software (Vulcan, Leapfrog, RMSP, etc.) to look. If the samples that are going to be capped are in the same area (ponded together), then you know it's real and there is probably a geologic reason for it, so you probably don't want to cap this. If it's sporadic, then those could be real outliers and you would want to cap those: "one-hit-wonders", which at times Geos have told me they would sell their lives for.

Then I always like to run an estimate with and without that cap and compare. How much metal was removed by using the cap that I chose? Does it make sense? Am I too conservative? Too optimistic (I am never this)? I might even choose many caps to compare, just to see how sensitive the estimate is in those domains.
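A quick way to get a feel for that sensitivity before running full estimates is to compare the metal removed at the sample level for several candidate caps. The Python sketch below does only that; it is a proxy on equal-weight samples, not a substitute for comparing capped and uncapped estimates, and the function names and candidate caps are made up for illustration.

```python
def metal_removed_fraction(grades, cap):
    """Fraction of total metal (sum of grades) lost when capping at `cap`.

    Assumes equal-weight samples and a positive total.
    """
    total = sum(grades)
    capped_total = sum(min(g, cap) for g in grades)
    return (total - capped_total) / total


def cap_sensitivity(grades, caps):
    """Map each candidate cap to the fraction of metal it removes."""
    return {cap: metal_removed_fraction(grades, cap) for cap in caps}


assays = [0.005, 0.010, 0.012, 0.045, 0.05, 0.06, 20]
candidate_caps = [0.06, 1.0, 5.0, 20.0]
result = cap_sensitivity(assays, candidate_caps)

for cap, removed in result.items():
    print(f"cap {cap:>5}: {removed:6.1%} of metal removed")
```

In this tiny example almost all of the metal sits in one sample, so every cap below 20 removes a large share. On real data, running the same comparison domain by domain shows where the choice of cap actually matters.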

What do you use?

(later I'll come back and add in some figures...I feel figures really drive it home but for now here are the words)

#WisdomWednesdayWithCW #Capping #TopCut #Geostatistics



Good call on this, Celeste. I would add that we should try to understand why we have these outliers in the first place, in domains that we "called" stationary, before we proceed to cap data whose behaviour may be explained by a geological or structural control. As you remark, capping is a geostatistical technique to limit the influence of extreme values on resource estimation; in other words, it is a process to control the impact of high-grade samples on the metal resulting from a resource estimation process. My two cents would be to look at capping globally (maybe by domain), using the tools you choose for it, but also locally. That is really important, because sometimes a sample does not exceed the cap value you selected for a specific domain but is still extreme for its neighborhood. Great post!

Keith Whitchurch

President Director PT SMG Consultants

4 months ago

Great discussion and a perennial one.

Ana Chiquini

Principal Geologist at Resource Modeling Solutions, M.Sc., MAusIMM CP(Geo)

4 months ago

Thanks for sharing your thoughts with us, Celeste! I wrote a little lesson a couple of years back, it might be useful to some: https://geostatisticslessons.com/lessons/simulationcapping

John Ashton

Consultant Geologist

4 months ago

You need to establish if the high grades have GEOLOGICAL continuity via mapping, sampling and closely spaced diamond drilling. It would be a shame to cap real high grade layers! I.e., not necessarily a geostatistical call!

Isobel Clark

Educator and consultant at Geostokos Ecosse Limited

4 months ago

What a great article! Thanks for including my message about ROKE. Also for the memories of our course in Elko. I still make my students hand calculate although I let them use Excel for the actual arithmetic! Keep up the great postings!
