Identifying New Mineral Occurrences Using Remote Sensing Images


Traditional ways of mapping the Earth's geology and mineral resources, such as field sampling and aerial photography, are costly and time-consuming.

By contrast, remote sensing data have proved valuable in many Earth science applications. Combining this high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, improves lithological mapping through spectral classification.

As part of this project, we set out to interpret and disseminate satellite information on geology and mineral resources in order to find new mining sites. Remote sensing data from various satellite systems, for example the Sentinel-2A mission, made the task easier. We processed remote sensing data stored in AWS S3 using Python and R, and produced geological maps for various climatic zones and different types of geology.



Using the remote sensing inputs, we developed a machine learning model that identifies new mineral occurrence locations by scoring each site on its likelihood of hosting a mineral occurrence (using band ratios and other combinations of band reflectance values).
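Band ratios are the simplest of these reflectance combinations. The sketch below computes two pixel-wise ratios from Sentinel-2 band arrays. The choice of ratios (red/blue B04/B02 as an iron-oxide proxy, SWIR-1/SWIR-2 B11/B12 as a clay/hydroxyl proxy) and the function names are illustrative assumptions, not the project's actual feature set.

```python
import numpy as np


def band_ratio(numerator, denominator, eps=1e-6):
    """Pixel-wise ratio of two reflectance bands; NaN where the denominator is ~0."""
    num = np.asarray(numerator, dtype=np.float64)
    den = np.asarray(denominator, dtype=np.float64)
    valid = np.abs(den) > eps
    # Substitute 1.0 for invalid denominators so the division never hits zero
    safe_den = np.where(valid, den, 1.0)
    return np.where(valid, num / safe_den, np.nan)


def ratio_features(bands):
    """Build illustrative ratio features from a dict of Sentinel-2 band arrays.

    Assumed ratios (hypothetical feature set):
      iron_oxide    = B04 / B02  (red / blue)
      clay_hydroxyl = B11 / B12  (SWIR-1 / SWIR-2)
    """
    return {
        "iron_oxide": band_ratio(bands["B04"], bands["B02"]),
        "clay_hydroxyl": band_ratio(bands["B11"], bands["B12"]),
    }
```

Returning NaN rather than infinity for empty or no-data pixels keeps later averaging steps (for example `np.nanmean`) from being dominated by division artefacts.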

Data Inputs Available

1. Descriptive information – scripts for index-based mineral detection, including code for handling overlapping tiles and for time-series aggregation of the data.

2. Remote Sensing Data – Sentinel-2A satellite imagery stored in an AWS S3 bucket:

  • Sentinel-2 satellite images are stored as tiles, which are referenced using the Military Grid Reference System (MGRS).
  • Each tile contains a time series of multiple scenes (same location, different acquisition times), and each scene contains 13 bands (same time and location, different spectral windows) stored as JPEG 2000 (.jp2) files. Each band records sunlight reflected from the Earth's surface in a specific wavelength range; because different minerals absorb and reflect these wavelengths differently, their spectral response can indicate where they may be present.
  • Each scene has metadata in a JSON file, including cloud coverage and data coverage percentages, which are used to rank scenes by quality.
  • The best scenes within each tile are selected and their bands are downloaded. The bands of each scene are combined to generate a geological index image, and the index images within a tile are averaged to produce one high-quality (HQ) index image per tile.
3. GIS Data – vector data:

  • MGRS grid reference
  • Mineral occurrence data specifying the locations of known targets
  • Negative mineral occurrence locations
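The scene-selection and averaging steps described for the Sentinel-2A inputs can be sketched as follows. This assumes each scene's metadata has already been reduced to a dict with hypothetical `cloud` and `coverage` percentage keys, and that per-scene index images are NumPy arrays on the same grid; the ranking rule (lowest cloud first, then highest coverage) is an assumption.

```python
import numpy as np


def rank_scenes(scenes, top_k=3):
    """Return the top_k scenes: lowest cloud cover first, ties broken by higher coverage.

    Each scene is a dict with hypothetical 'cloud' and 'coverage' keys (percent).
    """
    return sorted(scenes, key=lambda s: (s["cloud"], -s["coverage"]))[:top_k]


def hq_index(index_images):
    """Average per-scene index images of one tile into a single HQ index image.

    NaN pixels (e.g. masked clouds or no-data areas) are ignored in the mean.
    """
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in index_images])
    return np.nanmean(stack, axis=0)
```

Ignoring NaNs means a pixel clouded out in one scene still receives a value from the other selected scenes, which is the point of averaging over a time series.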


Framework: Sentinel-2A Satellite Imagery Data



Key Steps Involved

  • Set up Sentinel-2A access and processing infrastructure on the AWS cloud.
  • Test and validate the refactored Python code in the current process.
  • Extract information from remote sensing (RS) scenes to create predictor variables.
  • Use the extracted information to build a machine learning model.
  • Validate the machine learning model against hold-out data.
  • Score incoming RS scene data to identify new potential mineral sites.




Algorithm Overview


Model Development

1---> Downloading Data

The Sentinel-2 mission is a land-monitoring constellation of two satellites that provide high-resolution optical imagery. It covers the Earth's land surface every 5 days, making high-quality data available for ongoing studies. L1C data are available from June 2015, and L2A data from September 2016 for Europe and from January 2017 for global coverage.

Data for different sets of tiles (from 01-01-2016 to 31-03-2021) were downloaded from the Sentinel-2 L1C collection using the sentinelhub Python library. The request below saves the data on the local machine for pre-processing.

from sentinelhub import AwsTileRequest, DataCollection

# tile_name, date, bands, metafiles and data_folder must be defined beforehand,
# e.g. tile_name = "19KBB", bands = ["B01", "B02"], metafiles = ["tileInfo"]
request = AwsTileRequest(
    tile=tile_name,
    time=date,
    aws_index=None,
    bands=bands,
    metafiles=metafiles,
    data_folder=data_folder,
    data_collection=DataCollection.SENTINEL2_L1C,
)
request.save_data()  # downloads the requested bands and metadata files into data_folder

For a specific date, the first two bands and the tileInfo.json file are downloaded, and the following fields in tileInfo.json are checked:

a) dataCoveragePercentage_condition >= 80%

b) cloudyPixelPercentage_condition >= 50%

c) snowCover_condition >= 50%
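A minimal sketch of this check, assuming the data-coverage threshold is a minimum and the cloud and snow thresholds are rejection criteria (the list above gives the thresholds but not how they are combined). `dataCoveragePercentage` and `cloudyPixelPercentage` follow the names above; the snow key is a hypothetical parameter, and a missing snow value is treated as 0.

```python
import json


def scene_passes(tile_info, min_data=80.0, max_cloud=50.0, max_snow=50.0,
                 snow_key="snowCover"):
    """Decide whether a scene is usable from its tileInfo.json contents.

    Assumption: data coverage must reach min_data, while cloud or snow
    percentages at or above their thresholds disqualify the scene.
    snow_key is hypothetical; a missing value is treated as 0.
    """
    data = float(tile_info.get("dataCoveragePercentage", 0.0))
    cloud = float(tile_info.get("cloudyPixelPercentage", 100.0))  # missing -> reject
    snow = float(tile_info.get(snow_key, 0.0))
    return data >= min_data and cloud < max_cloud and snow < max_snow


def load_tile_info(path):
    """Read a downloaded tileInfo.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Checking the small metadata file before downloading all 13 bands avoids transferring scenes that would be discarded anyway.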



2---> Pre-processing the data

The specifications of the 13 available bands are presented in the figure below.


To train a model, we need a single TIFF file per area that holds the properties of all 13 individual bands together.

Before combining the bands and preparing the dataset for training, the following steps are performed:

i) The 13 bands come at three spatial resolutions (10 m, 20 m and 60 m); we resampled them all to a common 20 m resolution so that they can be combined.

ii) The data, delivered in JPEG 2000 (.jp2) format, are converted to GeoTIFF (.tif).
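Step i) can be illustrated with NumPy alone. The sketch below is a stand-in for whatever resampling tool the project used (GDAL-style resampling is likely, but not stated): 10 m bands are block-averaged 2 x 2 into 20 m pixels, 60 m bands are upsampled by nearest neighbour (each pixel repeated 3 x 3), and the 13 results are stacked into one cube. The band-to-resolution table reflects the standard Sentinel-2 specification; the function names are hypothetical.

```python
import numpy as np

# Native ground resolution (metres) of the 13 Sentinel-2 bands, in band order
BAND_RES = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20, "B07": 20,
    "B08": 10, "B8A": 20, "B09": 60, "B10": 60, "B11": 20, "B12": 20,
}


def to_20m(arr, res):
    """Resample one band to 20 m: block-mean 10 m pixels, repeat 60 m pixels."""
    arr = np.asarray(arr, dtype=np.float64)
    if res == 20:
        return arr
    if res == 10:
        h, w = arr.shape
        # Drop an odd trailing row/column, then average each 2x2 block
        trimmed = arr[: h - h % 2, : w - w % 2]
        return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    if res == 60:
        # Nearest-neighbour upsampling: each 60 m pixel becomes a 3x3 block
        return np.repeat(np.repeat(arr, 3, axis=0), 3, axis=1)
    raise ValueError(f"unsupported resolution: {res}")


def stack_bands(bands):
    """Stack a dict of band arrays into a (13, H, W) cube on a common 20 m grid."""
    return np.stack([to_20m(bands[name], res) for name, res in BAND_RES.items()])
```

For a full tile the shapes line up exactly (10980, 5490 and 1830 pixels per side at 10, 20 and 60 m), so every band lands on the same 5490 x 5490 grid.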

The 13 bands are then combined into a single file with a command run through Python's os module. The latitude and longitude of the positive and negative sites are used to crop the TIFF files and prepare the dataset for modelling. The cropped files can cover the following areas:

i) 1 km x 1 km

ii) 0.5 km x 0.5 km

iii) 0.3 km x 0.3 km

For training, 1 km x 1 km TIFF files are cropped out of each 100 km x 100 km tile. The positive- and negative-labelled files are used to train a neural-network classification model.

The crop location is found with the Euclidean distance metric: the distance is computed between two sets of coordinates, and the pixel at the minimum distance is used to crop the tile. The two sets of coordinates are:

i) The given latitude and longitude of the positive/negative site from the label sheet

ii) The latitude and longitude of every pixel

A 1 km x 1 km area is cropped around the pixel with the smallest Euclidean distance to the given latitude/longitude.
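The nearest-pixel search and crop can be sketched as below, assuming per-pixel latitude/longitude grids are already available as arrays and the band cube is on the 20 m grid (so 1 km is 50 pixels). Like the description above, it uses plain Euclidean distance in degrees, which ignores that longitude degrees shrink with latitude; the function names and the edge-shifting rule are assumptions.

```python
import numpy as np


def nearest_pixel(lat_grid, lon_grid, lat0, lon0):
    """Return (row, col) of the pixel whose (lat, lon) is closest to (lat0, lon0)."""
    d2 = (np.asarray(lat_grid) - lat0) ** 2 + (np.asarray(lon_grid) - lon0) ** 2
    return np.unravel_index(np.argmin(d2), d2.shape)


def crop_around(cube, row, col, size_m=1000, pixel_m=20):
    """Crop a (bands, H, W) cube to a size_m x size_m window centred on (row, col).

    At 20 m pixels a 1 km window is 50 x 50 pixels. Near the tile edge the
    window is shifted inwards so it always has the full size.
    """
    n = size_m // pixel_m
    _, h, w = cube.shape
    if n > h or n > w:
        raise ValueError("crop window is larger than the tile")
    top = int(min(max(row - n // 2, 0), h - n))
    left = int(min(max(col - n // 2, 0), w - n))
    return cube[:, top:top + n, left:left + n]
```

The 0.5 km and 0.3 km variants follow by changing `size_m` (25 and 15 pixels at 20 m).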


3---> Model Training

The training dataset consists of 13-band input images, and the objective is to classify each image as positive or negative. Convolutional neural networks (CNNs) are well suited to image data, so several networks with different architectures were trained and compared to identify the best model.

The dataset was split 80%/10%/10% into training, validation and test (unseen) sets. To train the neural network, we used generators that load the data into memory in batches. The dataset was organised in a well-defined directory structure, separated by two parameters: data type (train, valid or test) and prediction class (positive or negative).
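A plain-Python sketch of the split, the directory layout and a batch generator, assuming the samples are (file path, label) pairs with labels "positive"/"negative". `load_fn` is a hypothetical loader for one cropped 13-band file; in practice a framework utility such as Keras' directory-based loaders could play the generator's role.

```python
import os
import random


def split_dataset(items, seed=42, train=0.8, valid=0.1):
    """Shuffle (path, label) pairs and split them 80/10/10 into train/valid/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_valid = int(len(items) * valid)
    return {
        "train": items[:n_train],
        "valid": items[n_train:n_train + n_valid],
        "test": items[n_train + n_valid:],
    }


def target_path(root, split, label, filename):
    """Directory layout: <root>/<split>/<positive|negative>/<filename>."""
    return os.path.join(root, split, label, filename)


def batch_generator(items, batch_size, load_fn):
    """Yield (inputs, labels) batches so only one batch is in memory at a time."""
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        inputs = [load_fn(path) for path, _ in chunk]
        labels = [1 if label == "positive" else 0 for _, label in chunk]
        yield inputs, labels
```

Seeding the shuffle keeps the held-out test set fixed across experiments, so different architectures are compared on the same unseen data.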

The merged feature dataset was used to build and validate the model.



Multiple model architectures were built and tested. One such example is shown below.

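To show how a convolutional layer consumes the 13-band crops, here is a deliberately tiny forward pass written in NumPy: one convolution layer, ReLU, global average pooling and a sigmoid output giving the probability of "positive". It is an illustration with random, untrained weights, not the architecture used in the project; real training would use a deep-learning framework.

```python
import numpy as np


def conv2d(x, kernels, bias):
    """Valid 2-D convolution (cross-correlation, as in deep-learning libraries).

    x: (C, H, W) input, kernels: (F, C, k, k), bias: (F,) -> (F, H-k+1, W-k+1)
    """
    f, _, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.empty((f, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[:, i:i + k, j:j + k]
            # Each filter sums over all 13 bands and the k x k window
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


def forward(x, params):
    """conv -> ReLU -> global average pool -> dense -> sigmoid probability."""
    h = np.maximum(conv2d(x, params["W_conv"], params["b_conv"]), 0.0)
    pooled = h.mean(axis=(1, 2))  # one value per filter
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
params = {
    "W_conv": rng.normal(0.0, 0.1, size=(8, 13, 3, 3)),  # 8 filters over 13 bands
    "b_conv": np.zeros(8),
    "w_out": rng.normal(0.0, 0.1, size=8),
    "b_out": 0.0,
}
```

A 1 km crop at 20 m is a (13, 50, 50) array; `forward` maps it to a single probability, which is the quantity thresholded in the results below.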


4---> Model Results

When scored on new RS inputs, the model provides the following for each potential mineral prospect site:

● Likelihood for the commodity

● JSON with the commodity (e.g., copper, gold or nickel), deposit type (e.g., porphyry, sediment-hosted, massive sulphide) and X,Y coordinates
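A sketch of how such scored-site records might be assembled and serialised. The field names (`commodity`, `deposit_type`, `x`, `y`, `likelihood`) and the ranking by likelihood are assumptions; the source only lists which information the JSON contains.

```python
import json


def site_record(commodity, deposit_type, x, y, likelihood):
    """Build one scored-site record; the field names are hypothetical."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be a probability in [0, 1]")
    return {
        "commodity": commodity,
        "deposit_type": deposit_type,
        "x": x,
        "y": y,
        "likelihood": round(float(likelihood), 3),
    }


def to_json(records):
    """Serialise scored sites, highest likelihood first."""
    ranked = sorted(records, key=lambda r: r["likelihood"], reverse=True)
    return json.dumps(ranked, indent=2)
```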

The results indicated an overall accuracy of 95%.


Model Testing on Unseen Data

80% of the data is used to train the model and 10% to validate it. The remaining 10% is held out as unseen test data, on which the model is evaluated.

Results for positive points in tile 19KBB

At a predicted probability of 0.5 and higher


At a predicted probability of 0.6 and higher

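Comparing the two thresholds amounts to counting how many known positive points the model scores at or above each cut-off (recall on the positive points). A minimal sketch, assuming the model's probabilities for the known positives of a tile are available as an array; the function name is hypothetical.

```python
import numpy as np


def positives_detected(probs, threshold):
    """Count and fraction of known-positive points scored at or above threshold."""
    probs = np.asarray(probs, dtype=np.float64)
    hits = int(np.count_nonzero(probs >= threshold))
    fraction = hits / probs.size if probs.size else 0.0
    return hits, fraction
```

Raising the threshold from 0.5 to 0.6 can only keep or reduce the number of flagged points, trading fewer detections for higher confidence in each one.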

We hope this article sparks ideas for using remote sensing data across multiple sectors. Feel free to reach out to us for further information.

Author: Shailendra Singh Kathait