Identifying New Mineral Occurrence using Remote Sensing Images
Shailendra Singh Kathait
Co-Founder & Chief Data Scientist @ Valiance | Envisioning a Future Transformed by AI | Harnessing AI Responsibly | Prioritizing Global Impact |
The traditional ways of mapping earth’s geology and mineral resources, such as field sampling and aerial photographs, are costly and time-consuming.
By contrast, remote sensing data has proved valuable in a range of earth science applications. Applying machine learning algorithms (MLAs), a sub-domain of artificial intelligence, to this high-dimensional data enhances lithological mapping through spectral classification.
As part of this project, we set out to interpret and disseminate satellite information on geology and mineral resources in order to find new mining sites. Remote sensing data from satellite systems such as the Sentinel-2A mission made the task easier. We processed remote sensing data from AWS S3 storage using Python and R and produced geological maps for various climatic zones and different types of geology.
Based on inputs from remote sensing data, we developed a machine learning model that identifies new mineral occurrence locations by scoring sites on likelihood, using band ratios and other combinations of band reflectance values.
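As a rough illustration of what a band ratio feature looks like, the sketch below divides one reflectance band by another, pixel by pixel. The article does not say which ratios were used; B4/B2 appears here only because it is a commonly cited iron-oxide indicator for Sentinel-2, and the function name, the `eps` guard and the toy values are all assumptions.

```python
import numpy as np

def band_ratio(numerator, denominator, eps=1e-6):
    """Element-wise ratio of two reflectance bands, guarded against division by zero."""
    numerator = np.asarray(numerator, dtype=np.float64)
    denominator = np.asarray(denominator, dtype=np.float64)
    # Flooring the denominator at eps keeps the result finite where reflectance is 0.
    return numerator / np.maximum(denominator, eps)

# Toy 2x2 reflectance patches standing in for Sentinel-2 bands B4 (red) and B2 (blue).
b4 = np.array([[0.30, 0.20], [0.40, 0.10]])
b2 = np.array([[0.15, 0.20], [0.10, 0.00]])

ratio = band_ratio(b4, b2)
print(ratio)
```

Ratios like this can be stacked next to the raw bands as extra input channels, or summarised per site before scoring.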
Data Inputs Available
2. Remote Sensing Data - Sentinel-2A Satellite imagery data residing on AWS S3 bucket:
1. MGRS grid reference available
2. Mineral occurrences data specifying location of known targets
3. Negative mineral occurrence locations.
Key Steps Involved
Algorithm Overview
Model Development
1---> Downloading Data
The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high-resolution optical imagery. It covers the Earth's land surface globally every 5 days and makes high-quality data available for ongoing studies. L1C data is available from June 2015; L2A data is available from September 2016 for the area covering Europe and from January 2017 for global coverage.
Data for different sets of tiles (from 01-01-2016 to 31-03-2021) is downloaded from the SENTINEL2_L1C bucket using the sentinelhub library in Python. The request below saves the data on the local machine for pre-processing.
# In newer sentinelhub releases these AWS classes may live under sentinelhub.aws instead.
from sentinelhub import AwsTileRequest, DataCollection

# tile_name, date, bands, metafiles and data_folder are supplied by the caller.
request = AwsTileRequest(
    tile=tile_name,
    time=date,
    aws_index=None,
    bands=bands,
    metafiles=metafiles,
    data_folder=data_folder,
    data_collection=DataCollection.SENTINEL2_L1C)
# Download the requested bands and metafiles into data_folder.
request.save_data()
For a specific date, the first two bands and the tileInfo.json file are downloaded. The following information from tileInfo.json is checked:
a) dataCoveragePercentage_condition >= 80%
b) cloudyPixelPercentage_condition >= 50%
c) snowCover_condition >= 50%
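The three checks above could be applied to a parsed tileInfo.json along the following lines. This is a sketch under one stated assumption: the article lists all three conditions with >=, and the code reads the cloud and snow conditions as rejection limits (a tile is skipped once either reaches 50%), which the article does not state. The function name, the key lookups via `.get` and their defaults are illustrative too.

```python
import json

def tile_passes(info, min_coverage=80.0, cloud_limit=50.0, snow_limit=50.0):
    """Return True when a tile's metadata looks usable for download.

    Assumption: coverage must be at least min_coverage, while cloud or snow
    at or above their limits disqualifies the tile.
    """
    coverage = float(info.get("dataCoveragePercentage", 0.0))
    cloud = float(info.get("cloudyPixelPercentage", 100.0))
    # A missing snow value is treated as "no snow" here (an assumption).
    snow = float(info.get("snowCover", 0.0))
    return coverage >= min_coverage and cloud < cloud_limit and snow < snow_limit

# Made-up metadata values in the shape of a tileInfo.json excerpt.
sample = json.loads(
    '{"dataCoveragePercentage": 92.5, "cloudyPixelPercentage": 12.0, "snowCover": 0.0}'
)
print(tile_passes(sample))
```

Checking the metadata before fetching all 13 bands avoids downloading tiles that would be discarded anyway.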
2---> Pre-processing the data
The specifications of the 13 available bands are presented in the figure below.
To train a model, we need a single TIFF file per tile that carries the properties of every individual band. The 13 bands are available in JP2 format and are converted into GeoTIFF.
Before combining the bands and preparing the dataset for training, the following steps are performed:
i) These 13 bands come in three resolutions (10m, 20m, and 60m), so we resized them all to a single resolution (20m). This step makes it possible to combine all the bands.
ii) The data is available in JP2 format and is converted into GeoTIFF (.tif file format).
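The resizing step can be pictured with a small NumPy-only sketch. It is not the article's pipeline, which would normally use a GIS library such as GDAL for both the resampling and the JP2-to-GeoTIFF conversion. The function name and the choice of block averaging and nearest-neighbour repetition are assumptions made for illustration.

```python
import numpy as np

def to_20m(band, native_res):
    """Bring a band onto a 20 m grid; native_res is 10, 20 or 60 (metres)."""
    band = np.asarray(band, dtype=np.float32)
    if native_res == 20:
        return band
    if native_res == 10:
        # 2x2 block mean: four 10 m pixels become one 20 m pixel.
        h, w = band.shape
        trimmed = band[: h - h % 2, : w - w % 2]
        return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    if native_res == 60:
        # Nearest-neighbour upsampling: each 60 m pixel becomes a 3x3 block.
        return np.repeat(np.repeat(band, 3, axis=0), 3, axis=1)
    raise ValueError(f"unsupported resolution: {native_res}")

# Toy bands covering the same 120 m x 120 m footprint at their native resolutions.
b02_10m = np.arange(144, dtype=np.float32).reshape(12, 12)
b05_20m = np.ones((6, 6), dtype=np.float32)
b01_60m = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# Once every band shares one grid, they can be stacked into a single array.
stack = np.stack([to_20m(b02_10m, 10), to_20m(b05_20m, 20), to_20m(b01_60m, 60)])
print(stack.shape)
```

With all bands on the same 20 m grid, every pixel position refers to the same ground location in each band, which is what makes the later merge possible.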
An OS command is used to combine all 13 bands into one file. The positive and negative coordinates (latitude and longitude) are provided to crop the TIFF files and prepare the dataset for modelling. The cropped files can represent the following areas:
i) 1x1 square km
ii) 0.5x0.5 square km
iii) 0.3x0.3 square km
1x1 square km TIFF files are cropped out of the 100x100 square km tile for training purposes. The positive- and negative-labelled files are used to train a classification-based neural network model.
Using the Euclidean distance metric, the distance between two sets of coordinates is computed, and the pixel with the minimum distance is used to position the crop. The two sets of coordinates are:
i) The given latitude and longitude of the positive/negative site from the label sheet
ii) The latitude and longitude of every pixel
A 1x1 square km area is cropped out around the pixel that has the smallest Euclidean distance from the given latitude/longitude.
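The nearest-pixel lookup and crop described above can be sketched as follows. It assumes the merged tile is a NumPy array of shape (bands, rows, cols) on the 20 m grid, so that 1 km corresponds to 50 pixels, and that per-pixel latitude and longitude arrays are already available. The function name, the clamping at tile edges and the synthetic coordinates are assumptions.

```python
import numpy as np

def crop_around(stack, pixel_lats, pixel_lons, lat, lon, size_px=50):
    """Crop a size_px x size_px window around the pixel nearest to (lat, lon)."""
    # Squared Euclidean distance, in degrees, from every pixel to the target site.
    dist2 = (pixel_lats - lat) ** 2 + (pixel_lons - lon) ** 2
    row, col = np.unravel_index(np.argmin(dist2), dist2.shape)
    half = size_px // 2
    # Clamp the window so it stays inside the tile (assumes tile >= size_px per side).
    r0 = int(np.clip(row - half, 0, stack.shape[1] - size_px))
    c0 = int(np.clip(col - half, 0, stack.shape[2] - size_px))
    return stack[:, r0:r0 + size_px, c0:c0 + size_px], (int(row), int(col))

# Synthetic 200x200 pixel grid with made-up coordinates and an empty 13-band tile.
lat_axis = np.linspace(-23.00, -23.04, 200)
lon_axis = np.linspace(-69.00, -68.96, 200)
pixel_lats, pixel_lons = np.meshgrid(lat_axis, lon_axis, indexing="ij")
tile = np.zeros((13, 200, 200), dtype=np.float32)

# A site that falls slightly off the centre of pixel (120, 80).
site_lat = pixel_lats[120, 80] + 0.00003
site_lon = pixel_lons[120, 80] - 0.00003
chip, nearest = crop_around(tile, pixel_lats, pixel_lons, site_lat, site_lon)
print(chip.shape, nearest)
```

Distance in degree space treats a degree of longitude and a degree of latitude as equal lengths, which is only approximately true; picking the nearest pixel inside one tile usually tolerates that.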
3---> Model Training
The training dataset consists of 13-band input images, and the objective is to classify each image as positive or negative. A convolutional neural network (CNN) is well suited to modelling image data, so several networks with different architectures were trained and compared to identify the best model.
The dataset was split 80%-10%-10% into train, validation, and test (unseen) sets respectively. To train the neural network model, we used generators that load data into memory in batches. The dataset was prepared in a well-defined directory structure and separated based on two parameters – data type (i.e., whether it is train, valid or test data) and prediction class (i.e., whether the image is classified as positive or negative).
The merged feature dataset was used to build and validate the model.
Multiple model architectures were built and tested. One such example is given below.
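A minimal sketch of the split, the directory layout and batch-wise loading might look like this. The file names, folder names and helper functions are hypothetical. The article's generators most likely came from a deep learning framework, whereas this version is plain Python that only yields file references.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle once, then cut into train / valid / test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    return {
        "train": items[:n_train],
        "valid": items[n_train:n_train + n_valid],
        "test": items[n_train + n_valid:],
    }

def target_path(split, name, label):
    """Directory layout keyed by data type and prediction class."""
    return f"{split}/{'positive' if label == 1 else 'negative'}/{name}"

def batches(items, batch_size):
    """Yield successive batches so only one batch is handled at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical chip file names labelled positive (1) or negative (0).
samples = [(f"chip_{i:03d}.tif", i % 2) for i in range(100)]
splits = split_80_10_10(samples)
first_batch = next(batches(splits["train"], batch_size=16))
print(target_path("train", *first_batch[0]))
```

Fixing the shuffle seed keeps the test partition the same between runs, so results on unseen data stay comparable across architectures.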
4---> Model Results
Scoring Model on new RS inputs for potential mineral prospect sites provided:
● Likelihood for Commodity
● JSON with Commodity (e.g., Copper, Gold, or Nickel), Deposit type (e.g., porphyry, sediment-hosted, massive sulphide), X,Y Coordinates
The results indicated an overall accuracy of 95%.
Model Testing on Unseen Data
80% of the data is used for training the model and 10% data is kept for validating results. The remaining 10% of the data is kept as test data or unseen data and the model is evaluated on the results obtained on this unseen data.
[Figure: results for positive points on tile 19KBB, shown at probability 0.5 and higher and at probability 0.6 and higher]
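To make the two probability cut-offs concrete, the sketch below filters scored sites at 0.5 and at 0.6 and serialises the survivors as JSON with the fields listed under Model Results. The key names, scores and coordinates are invented for illustration and are not the project's actual output.

```python
import json

def select_prospects(scored_sites, threshold):
    """Keep sites at or above the probability threshold, highest likelihood first."""
    kept = [site for site in scored_sites if site["likelihood"] >= threshold]
    return sorted(kept, key=lambda site: site["likelihood"], reverse=True)

# Made-up scores and coordinates, for illustration only.
scored_sites = [
    {"commodity": "Copper", "deposit_type": "porphyry",
     "x": 500000.0, "y": 7450000.0, "likelihood": 0.91},
    {"commodity": "Gold", "deposit_type": "massive sulphide",
     "x": 512000.0, "y": 7461000.0, "likelihood": 0.55},
    {"commodity": "Nickel", "deposit_type": "massive sulphide",
     "x": 530000.0, "y": 7440000.0, "likelihood": 0.32},
]

at_050 = select_prospects(scored_sites, 0.5)
at_060 = select_prospects(scored_sites, 0.6)
print(json.dumps(at_060, indent=2))
```

Raising the cut-off from 0.5 to 0.6 trades fewer candidate sites for higher confidence in each one.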
We hope this article sparks ideas for using remote sensing data across multiple sectors. Feel free to reach out to us for any further information.