A Simple Absolute Prospectivity Model Using R: Mapping Absolute Prospectivity Using Feature Vectors
This article is the last in a four-part series released on a weekly basis. Links to the full series are available below:
4. Mapping Absolute Prospectivity Using Feature Vectors
Mapping Absolute Prospectivity Using Feature Vectors
In the previous articles we estimated the number and size of deposits to be discovered in the next era and inspected the relationship of mineralisation to mappable geological features (see here and here). The two can now be integrated to calculate the probability distribution of deposits of differing sizes occurring in each cell of our study area. The first step is to combine the geological features into a single feature vector for each cell, then simplify the vector into something tractable; we can then examine the relationship of these vectors to mineralisation in the same way as for a single feature.
Creation of Feature Vector and Dimensionality Reduction
Using all the features would result in a feature vector with too many dimensions, so we need to engage in some dimensionality reduction: reducing both the number of datasets used in the feature vector and the number of categories for each set. First, each feature was reduced to as few discrete categories as possible, to a maximum of four. Each category was numbered, and NAs were entered as 0. To keep this exercise as simple as possible, dimension reduction was done by manually inspecting the charts in the previous section to pick natural breaks and to eliminate datasets that were not sufficiently discriminatory. Features that showed lower predictive power in the previous analysis were dropped from the study. Techniques such as Principal Component Analysis ("PCA") could optimise this process so that the most effective categorisation is achieved.
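As a minimal sketch of this recoding (the column names and break values below are illustrative assumptions, not the actual values used):

```r
# Sketch: manual categorisation of one feature into <= 4 integer
# categories, with NA coded as 0 (names and breaks are assumptions)
thk <- cells$greenstone_thickness

# cut() assigns each value to one of four manually chosen bins,
# returning integer codes 1..4
cat_thk <- cut(thk, breaks = c(-Inf, 500, 1500, 3000, Inf), labels = FALSE)

# cells with no data are entered as category 0
cat_thk[is.na(cat_thk)] <- 0L
cells$GreenstoneThkCat <- cat_thk
```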
Further dimensionality reduction is achieved by removing one member of each feature pair with high correlation. In this analysis that resulted in the removal of "Greenstone" (correlated with Greenstone Thickness), "RFaultBend2400" and "RFaultBend5000" (both correlated with the 200m data of the same set), "M3" (correlated with "D3") and "RFaultDensity" (correlated with regional faults and regional fault bends).
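A hedged sketch of this step, using base R's cor() and the caret package's findCorrelation() (the 0.8 cutoff is my assumption, and `feat` stands in for the data frame of integer-coded features):

```r
library(caret)

# Spearman rank correlation suits the ordinal integer codes
cor_mat <- cor(feat, method = "spearman")

# findCorrelation() returns column indices to drop so that no
# remaining pair of features exceeds the cutoff
drop_idx <- findCorrelation(cor_mat, cutoff = 0.8)
feat_reduced <- feat[, -drop_idx, drop = FALSE]
```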
Finally, further reduction is achieved by filtering out the feature vectors with the lowest frequency and replacing them with similar vectors (nearby in the vector space) that have a higher frequency. In this analysis we reassigned all vectors with a frequency of 25 or fewer.
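A minimal, unoptimised sketch of that replacement, assuming `feat_reduced` is the integer matrix of coded features from the previous step:

```r
# Frequency of each unique feature vector, keyed as a string
vec_key <- apply(feat_reduced, 1, paste, collapse = "-")
freq    <- table(vec_key)

rare_rows   <- which(freq[vec_key] <= 25)
common_keys <- names(freq)[freq > 25]
common_mat  <- do.call(rbind, lapply(strsplit(common_keys, "-"), as.integer))

# Reassign each rare vector to its nearest common vector
# (Manhattan distance in the category space; ties broken arbitrarily)
for (i in rare_rows) {
  v <- feat_reduced[i, ]
  d <- rowSums(abs(sweep(common_mat, 2, v)))
  feat_reduced[i, ] <- common_mat[which.min(d), ]
}
```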
This analysis was done entirely within R. Processing times were lengthy, but again this could be improved with more efficient programming and/or more computing power. The chart below shows the datasets selected and their respective categorisation:
After dimension reduction there are 1,209 unique feature vectors, which is approximately 50% of the number of deposits in the terrane. A snippet of the final vector table is given below:
The feature vectors' relationship to mineralisation can now be analysed in the same way as for single mappable features.
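As a hedged sketch of that per-vector analysis (assuming a data frame `cells` with a vector key `vec` and a deposit size class `dep_class`, NA where no deposit is known; the class labels follow the categories used later in the article):

```r
library(dplyr)

# For each unique feature vector: the share of its cells hosting a
# deposit of each size class (empirical probabilities)
probs <- cells %>%
  group_by(vec) %>%
  summarise(
    n_cells    = n(),
    p_minor    = sum(dep_class == "Minor",    na.rm = TRUE) / n_cells,
    p_moderate = sum(dep_class == "Moderate", na.rm = TRUE) / n_cells,
    p_major    = sum(dep_class == "Major",    na.rm = TRUE) / n_cells,
    p_giant    = sum(dep_class == "Giant",    na.rm = TRUE) / n_cells,
    p_any      = p_minor + p_moderate + p_major + p_giant
  )
```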
Given that cells with the same feature vector are identical with regard to geological features (at least at the resolution of our study), we can now calculate the probability of a deposit of differing magnitude occurring in a cell with a given feature vector, assuming the location of known deposits is hidden. This is presented in the map below, which shows the percentage chance of a deposit of any magnitude existing in each cell based on the feature vector analysis, and which is a close proxy for 'Absolute Prospectivity':
Mapping Absolute Prospectivity
So far we have been working with binned deposit data, so when we estimate the probability of a cell containing a Minor deposit we are actually calculating the probability that it contains a deposit of 0.01-0.1Moz (as per the categorisation by Minex Consultants). The minimum size considered in the database is 0.01Moz, so when we calculate the probability of no deposit we are actually calculating the probability of the 0-0.01Moz bin. To calculate a probability distribution it would be tempting to take the mid-point of each bin and multiply it by its probability, but this would be inappropriate for heavily skewed data with categories as broad as these. Furthermore, many cells have an estimated probability of zero for multiple deposit size classes, which would intuitively seem incorrect except in some very specific circumstances.
To overcome this, a continuous distribution was fitted to the binned data and used as the basis for calculating our probability distributions. A few different distributions were tried; lognormal was found to fit best and aligns with what we know about how deposits form. To achieve this, the following function was used; note the bounds placed on the parameters:
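In outline, such a bounded fit can be written with base R's optim() and box constraints (the bounds and starting values below are illustrative, not necessarily those used):

```r
# Fit a lognormal to a vector of positive simulated deposit sizes (Moz)
# for one feature vector, with box constraints on the parameters
# (requires at least two positive sizes for the starting values)
fit_lognormal <- function(sizes) {
  # negative log-likelihood of lognormal(meanlog, sdlog)
  nll <- function(par) {
    -sum(dlnorm(sizes, meanlog = par[1], sdlog = par[2], log = TRUE))
  }
  optim(par    = c(mean(log(sizes)), sd(log(sizes))),
        fn     = nll,
        method = "L-BFGS-B",
        lower  = c(-10, 0.01),   # keep sdlog strictly positive
        upper  = c(5, 5))        # cap parameters at plausible values
}
```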
This function was applied to the vector data. A small number of vectors could not have a model fitted; in those cases the continuous model from the nearest-neighbour vector in vector space was assigned. Below is a snippet of some of the fitted models:
From visual inspection of a random selection of cells it was clear that the model fitting was not perfect and could be improved with more work; however, for the purposes of this demonstration it was acceptable to proceed. A better result could be achieved with a greater number of simulations, as the low probabilities for many cells mean the sampled simulations produce data too sparse for good model fitting. The simulations require substantial processing time, so there were practical limits to what could be achieved with the computing power of a standard laptop. Greater programming or statistical-modelling proficiency may also yield more efficient code that could permit a greater number of simulations.
As a first check on the quality of the fits, a simulation was run using the computed distributions to see whether it produced results similar to the known endowment. Given the high number of cells, a straightforward run of more than 10,000 simulations would be impractical with the available computing power, so 100 runs of 1,000 simulations each, for a total of 100,000 simulations, were performed. The R script produced 100 R objects, each holding 1,000 simulations over every grid cell. The median value for each cell in each object was calculated, and the mean of each cell's 100 medians was then taken as representative of all 100,000 simulations.
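A sketch of that batching scheme (the structures `models`, holding a fitted meanlog/sdlog pair per cell, and `p_dep`, each cell's probability of hosting any deposit, are assumed names):

```r
n_runs <- 100; n_sims <- 1000; n_cells <- nrow(models)

# Keep only each run's per-cell median, never all 100,000 draws at once
medians <- matrix(NA_real_, nrow = n_cells, ncol = n_runs)

for (r in seq_len(n_runs)) {
  for (i in seq_len(n_cells)) {
    hit  <- runif(n_sims) < p_dep[i]          # does a deposit occur?
    size <- ifelse(hit,
                   rlnorm(n_sims, models$meanlog[i], models$sdlog[i]),
                   0)                          # size if it does, else 0
    medians[i, r] <- median(size)
  }
}

# Mean of the 100 per-run medians, representative of all draws
cell_estimate <- rowMeans(medians)
```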
The simulated deposits (red) appear to be conservative, which highlights the potential for improvement from better modelling.
It is now possible to link the continuous distributions calculated for each cell with our expectations from the Discoverable Endowment Analysis earlier in this series. The first step is to remove all cells that already contain a Moderate, Major or Giant/Supergiant deposit, as these will not be the sites of new discoveries. Minor deposits can remain, as they often act as the source of new, larger discoveries; they normally represent neglected brownfield sites that are often not fully explored.
For each simulation, five random points were sampled, but from quantiles designed so that the medians and standard deviations match our estimated Discoverable Endowment. The results of this simulation are shown in the chart below:
The simulated deposits (black line) are a slight overestimate, particularly for the larger deposits. This is because the code took the value closest to the discoverable endowment median but at least as large, which added a positive bias.
A probability distribution of a deposit being discovered in each cell can now be calculated from this simulation; again this was done by fitting lognormal distributions. The map below shows the mean deposit size for each cell with the known deposits overlain as graduated red dots, followed by a map showing the probability of a deposit occurring.
Although the mean and probability of deposits occurring presented in the maps above are very useful, neither is 'Absolute Prospectivity' in the strict sense. To estimate Absolute Prospectivity it is necessary to consider the range of probabilities (i.e. the variance) and, ideally, to estimate exploration costs in order to differentiate areas that may be more expensive to explore, for example areas under increased cover. Because exploration cost is a key component, the analysis is best done at the project scale, where future exploration cost can be better estimated. For a large-scale analysis such as this, however, we can assume that the exploration cost is uniform across the area.
The value of the range of possibilities can be captured by assigning a value to each exploration outcome according to the discovery/non-discovery size, subtracting the exploration cost, and weighting each scenario by its probability. This is illustrated in the diagram below:
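In plain terms, the weighting in the diagram amounts to (my notation):

EV = Σ pᵢ . (Vᵢ - Cᵢ)

where

EV = expected (probability-weighted) value of the cell

pᵢ = probability of outcome i (no discovery, or a discovery of a given size)

Vᵢ = value of outcome i (zero for no discovery)

Cᵢ = exploration cost under outcome i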
Exploration Cost
Project Murchison aims to grapple with an 'Exploration Model' that encompasses the full range of ways a project can develop and the costs and time frames associated with each. That is beyond the scope of this article, but exploration costs can be derived from a simplified exploration model based on discovery/non-discovery scenarios. A cost per cell can be assigned for a project that does not make a discovery; for projects that do make a discovery, a larger cost can be assigned based on the size of the deposit. The following assumptions were used in this analysis and are based on personal experience; further study in this area could produce more sophisticated and accurate assumptions.
Exploration cost for 1km2, no discovery = US$150,000
Exploration cost for 1km2, discovery = US$(30 - R / 3.75) per ounce
where
R = Deposit size in millions of ounces
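As a sketch, these assumptions can be wrapped in a small R function (this is my reading of the per-ounce formula; for example, a 1Moz discovery costs (30 - 1/3.75) × 1,000,000 ≈ US$29.7M):

```r
# Exploration cost for one 1km2 cell, given deposit size in Moz
# (0 = no discovery); implements the assumptions stated above
exploration_cost <- function(R_moz) {
  if (R_moz <= 0) {
    150000                          # US$150,000 flat, no discovery
  } else {
    per_oz <- 30 - R_moz / 3.75     # US$ per ounce, discovery case
    per_oz * R_moz * 1e6            # total US$ over R_moz million ounces
  }
}

exploration_cost(0)    # 150000
exploration_cost(1)    # ~29.7 million
```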
Deposit Value
There are a number of ways that value may be attributed to deposits; the method chosen should reflect the basis of a potential transaction for the deposit. As these are conceptual deposits with a single size parameter, I opted for a simple resource-multiple approach using the following formula:
V = R . v
where
V = Deposit value in millions of dollars
R = Deposit size in millions of ounces
v = Value per ounce multiple in dollars
Furthermore, we will make the following assumption for the value of v based on my own personal judgement. Note that this could be improved with specific study in this area: v should reflect the value per ounce of a typical deposit within the study area. A different area that typically hosts deposits with lower average grades, or that carries a discount for perceived political risk, would therefore require a different resource multiple.
v = US$150
To assign values, the resource sizes are binned into 0.25Moz categories and the mid-point of each bin is taken as the 'outcome' to be weighted by its probability of occurring. The exploration cost is also subtracted from each scenario.
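Pulling the pieces together, a hedged sketch of the valuation for a single cell (the bin edges, the 10Moz upper limit and the fitted-parameter names are assumptions; `exploration_cost()` is the sketch from above):

```r
v_per_oz <- 150                        # resource multiple, US$/oz
breaks   <- seq(0, 10, by = 0.25)      # 0.25Moz bins (upper limit assumed)
mids     <- head(breaks, -1) + 0.125   # bin mid-points as 'outcomes'

cell_value <- function(meanlog, sdlog, p_dep) {
  # probability mass in each size bin, scaled by P(any deposit);
  # mass above the top bin is ignored in this sketch
  p_bins <- diff(plnorm(breaks, meanlog, sdlog)) * p_dep
  # value of each outcome (mid-point ounces x US$/oz), net of its cost
  net <- mids * 1e6 * v_per_oz - sapply(mids, exploration_cost)
  # add the no-discovery scenario: zero value minus the flat cost
  sum(p_bins * net) - (1 - p_dep) * 150000
}
```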
The values of future discoveries are discounted at an annual rate of 20%; exploration costs are undiscounted. The choice of discount rate is a key parameter and should be tailored to an investor's risk preferences. Given the exceptionally high degree of risk associated with exploration, higher discount rates may be more suitable, particularly if risk cannot be diversified across a large portfolio of projects, as is often the case for small exploration companies.
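As a plain formula, with t an assumed time to discovery in years (the article does not fix t, so this is illustrative):

PV = V / (1 + 0.2)^t

where

PV = present value of the future discovery

V = undiscounted deposit value in millions of dollars

t = years until the discovery is made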
Absolute Prospectivity
Using the formulas and assumptions above, a value for each cell was calculated. The map below displays the cell value and therefore Absolute Prospectivity.
The map shows that the majority of cells are coloured grey, meaning they have a negative value: the weighted value of the no-discovery scenario exceeds the weighted value of the discovery scenarios. There are then clusters of blue cells with high value, where the weighted value of the discovery scenarios is higher.
We can now take a sub-area, which might represent an exploration licence, and derive its value as the sum of the values of the individual cells within it. This valuation is a guide not only to the area's 'deal' value but also to the exploration expenditure commitment that is warranted.
This is demonstrated below with a fictional example licence:
Licence Extent: xmin = 450000, xmax = 470000, ymin = 6600000, ymax = 6620000
Total Area = 400km2
Total Value = US$4.77M
Probability of a Discovery = 2.4%
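A minimal sketch of how such a licence valuation falls out of the cell grid (assuming a data frame `grid` with cell centroid coordinates `x` and `y`, a `value` column from the cell valuation above, and a discovery probability `p_dep`):

```r
# Select cells whose centroids fall inside the licence extent
in_lic <- with(grid, x >= 450000 & x <= 470000 &
                     y >= 6600000 & y <= 6620000)

# Licence value = sum of cell values (reported here in US$M)
sum(grid$value[in_lic]) / 1e6

# Probability of at least one discovery, assuming cells are independent
1 - prod(1 - grid$p_dep[in_lic])
```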
A party looking to acquire this example licence should seek a deal where the cash payments and earn-in commitments do not exceed US$4.77M plus the estimated exploration cost. Property owners can likely estimate the exploration cost more accurately than in the example given above, and that more accurate estimate can be used instead. Should a property have a negative Absolute Prospectivity valuation, it is an indication that the owner should stop exploring it and seek an exit.
Conclusion
In this article a standard mineral systems model has been used to estimate Absolute Prospectivity using a relatively straightforward methodology and the R programming language. This is superior to relative prospectivity methods because it can be used not only to prioritise projects but also for project valuation and optimal strategic management. It also highlights the key importance of quantifying uncertainty when evaluating projects. In fact, the overriding driver of project value is uncertainty rather than expected value, as illustrated by the charts below; if you fail to characterise the uncertainty, you fail to characterise your project's value and will be unable to make optimal exploration decisions.
This article utilised only mineral systems data, but there is no reason it could not be extended to encompass traditional exploration data such as geochemistry, provided such datasets are regional in nature with reasonably good coverage over the study area. An interesting dataset currently under development in Project Murchison is exploration sterilisation, which can be used to dampen the range of uncertainty (and therefore value) in well-explored areas compared with unexplored ones. Demonstrating a model using only mineral systems data is nevertheless useful, as similar datasets can often be created from rudimentary geological data, allowing the models to be deployed in undeveloped areas where characterising uncertainty is most important.
Though this article was an independent project, it covers the same key theme of quantifying the range of uncertain outcomes that is at the heart of current research at Project Murchison, of which I am Project Manager. Our continued research in this area is creating models superior to the one generated in this article; nevertheless, it demonstrates that simple, genuinely probabilistic models can be constructed by geologists with freely available software and entry-level programming ability. If you would like to know more about this research or how it might apply to your project, please contact me for further information.