Download, Combine, and Visualize MODIS in python – a tutorial + code

Download, Combine, and Visualize MODIS in python – a tutorial + code

Link: https://github.com/kgavahi/modis-tutorials/blob/main/modis_tutorial.ipynb

When I started my PhD in 2019, MODIS products were one of the main datasets that we used for our research. It was also one of the most challenging ones to work with. There are three main reasons that I think made me and a lot of other students struggle when working with MODIS products:

1 – MODIS does not provide x, y coordinates as separate variables.

2 – MODIS products come in tiles and therefore in many situations your area of interest might cover multiple hdf files.

3 – the Coordinate system is Sinusoidal which is different from many other products that usually use WGS84 or lambert conic.

?

In this tutorial we will tackle all these issues using Xarray, pyhdf, and pyproj packages.

First, let's import the libraries.

1. Import Libraries

import pandas as pd
import urllib.request
from bs4 import BeautifulSoup
import os
from pyhdf.SD import SD, SDC
import numpy as np
import pandas as pd
import re
from pyproj import Transformer
import xarray as xr
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt        

2. Download Data

The first step would be to download the data. I developed the following class which will help us to easily do that. You can use this for any MODIS product (here is the full list for terra and aqua). To be able to use it, you will need wget and a NASA EARTHDATA account. You can easily create one here. ?Let’s start working with MOD11A2 Version 6.1 product which provides an average 8-day per-pixel Land Surface Temperature. We will download all the MOD11A2 version 6.1 files that cover Contiguous United States (CONUS) for January 1st, 2021.

In Many situations, you might want to perform your data preprocessing without downloading the raw data to save time and storage. Check out this nice tutorial by Ali Ahmadalipour on how to do this for ERA5-Land daily data.

class DataPreprocess:


    def __init__(self, user=None, password=None):


        self.user = user
        self.password = password


    def dl_modis(self, path=None, product=None,
                 start_date=None, end_date=None, tiles=None):

        #assert isinstance(tiles, list), 'tiles must be a list'

        if tiles=='conus':
            # 14 tiles that cover the CONUS
            tiles = ["h08v04","h08v05","h08v06",
                            "h09v04","h09v05","h09v06",
                             "h10v04","h10v05","h10v06",
                             "h11v04","h11v05","h12v04",
                             "h12v05","h13v04"]

        satellite_mapping = {
            'MCD': 'MOTA',
            'MYD': 'MOLA',
            'MOD': 'MOLT'
        }

        satellite = satellite_mapping.get(product[:3], 'unknown')

        if end_date==None:
            date_range = pd.date_range(start=start_date,
                                       end=start_date,
                                       freq='D')
        else:
            date_range = pd.date_range(start=start_date,
                                       end=end_date,
                                       freq='D')

        date_str = [str(date)[:10].replace('-', '') for date in date_range]

        page_url = f'https://e4ftl01.cr.usgs.gov/{satellite}/{product}/'

        uf = urllib.request.urlopen(page_url, timeout=120)
        html = uf.read()
        soup = BeautifulSoup(html, "html.parser")
        link_list = set([link.get('href') for link in soup.find_all('a')])

        filtered_links = [link for link in link_list if
                          any(date in link.replace('.', '') for date in date_str)]

        page_urls = [page_url+link for link in filtered_links]


        urls = []
        for page_url in page_urls:

            uf = urllib.request.urlopen(page_url, timeout=120)
            html = uf.read()
            soup = BeautifulSoup(html, "html.parser")
            link_list = set([link.get('href') for link in soup.find_all('a')])

            filtered_links = [link for link in link_list if
                              any(tile in link for tile in tiles)]

            for link in filtered_links:
                urls.append(page_url+link)

        urls = set(urls)
        urls = sorted(urls)

        # let's save the urls in a text file to
        # download them with a single wget command
        txt_path = os.path.join(path, "urls.txt")
        if os.path.exists(txt_path): os.remove(txt_path)

        with open(txt_path, 'a') as fp:
            fp.write('\n'.join(urls))

        # download the files
        os.system(f'wget --load-cookies .urs_cookies --save-cookies \
                  .urs_cookies --keep-session-cookies --user={self.user}\
                      --password={self.password} -P {path}\
                          --content-disposition -i {txt_path}')        

Use your username and password in the following lines to download the raw data.

dp = DataPreprocess(user='', password='')
dp.dl_modis(path='/content', product='MOD11A2.061',
            start_date='20210101',
            tiles='conus')        

3. Get Coordinates

As I mentioned MODIS does not provide x and y coordinates for all the pixels. However, the hdf files contain the corner coordinates. We can use those and the number of rows and columns in the raster image to create the x and y coordinates. The following function will do that for us.

def mod_y_x(hdf):

    gridmeta = hdf.attributes(full=1)["StructMetadata.0"][0]

    ul_regex = re.compile(r'''UpperLeftPointMtrs=
''', re.VERBOSE)

    match = ul_regex.search(gridmeta)
    x0 = float(match.group('upper_left_x'))
    y0 = float(match.group('upper_left_y'))

    lr_regex = re.compile(r'''LowerRightMtrs=
''', re.VERBOSE)
    match = lr_regex.search(gridmeta)
    x1 = float(match.group('lower_right_x'))
    y1 = float(match.group('lower_right_y'))

    nx, ny = data[0].shape
    x = np.linspace(x0, x1, nx*2+1)
    y = np.linspace(y0, y1, ny*2+1)

    x = x[1::2]
    y = y[1::2]



    return y, x        

4. Read and Combine hdf Files

Next, we will read all the hdf files and save them in a DataArray to be able to use Xarray and combine them.

dss = []

for file in files:
  hdf = SD(file, SDC.READ)
  DATAFIELD_NAME = 'LST_Day_1km'

  # get the dataset fill value
  _FillValue = hdf.select(DATAFIELD_NAME).getfillvalue()

  #get the dataset min and max values: valid_range
  range = hdf.select(DATAFIELD_NAME).getrange()

  # get the dataset calibration coefficients scale_factor, scale_factor_err, add_offset, add_offset_err, calibrated_nt
  cal_coef = hdf.select(DATAFIELD_NAME).getcal()

  # Read dataset
  data2D = hdf.select(DATAFIELD_NAME)
  data = data2D[:,:].astype('float32')
  data = data.reshape(1, data.shape[0], data.shape[1])
  data = np.where((data==_FillValue) | (data < range[0]) | (data > range[1]),
                  np.nan, data) * cal_coef[0]

  # get coordinates
  lat, lon = mod_y_x(hdf)

  da = xr.DataArray(
      data=data,
      dims=["time", "y", "x"],
      coords=dict(
          x=(["x"], lon),
          y=(["y"], lat),
          time=pd.date_range("2021-01-01", periods=1),

      ),

  )

  dss.append(da)        

You can also open the hdf files using Xarray and its “netcdf4” engine, however you might end up getting compatibility errors if you don’t have the correct version of Xarray. That’s why we used pyhdf and defined the DataArrays ourselves in the for loop.

Now that we have all the 14 tiles as DataArrays saved in the dss list, we can used “combine_by_coords” to merge them.

da_comb = xr.combine_by_coords(dss)
da_comb.plot()        

5. Transform and Visualize

Next, we will transform the coordinate system from Sinusoidal to WGS84 using the pyproj package.

xv, yv = np.meshgrid(da_comb.x, da_comb.y)
transformer = Transformer.from_crs("+proj=sinu +R=6371007.181 +nadgrids=@null +wktext",
                                   "+init=EPSG:4326")

x, y= transformer.transform(xv, yv)        

Finally, let’s plot the combined hdf files using basemap and with a lambert conic projection.

down = 10
up = 55
left = -130
right = -60
#m = Basemap(projection='cyl', resolution='l',
#            llcrnrlat=down-.1, urcrnrlat =up+.1,
#            llcrnrlon=left-.1, urcrnrlon =right+.1)

m = Basemap(llcrnrlon=-119,llcrnrlat=15,urcrnrlon=-55,urcrnrlat=52,
        projection='lcc',lat_1=33,lat_2=45,lon_0=-95)


m.drawcoastlines(linewidth=0.5)
m.drawparallels(np.arange(down, up, 10),
                labels=[1, 0, 0, 0])
m.drawmeridians(np.arange(left, right, 10),
                labels=[0, 0, 0, 1])

pcolormesh = m.pcolormesh(x, y, da_comb[0], latlon=True, cmap='terrain_r')
fig = plt.gcf()

fig.colorbar(pcolormesh)        


Bushra Ibrahim

Cambridge & As &A level Physics teacher at knowledge Masterpiece International Private school.

11 个月

Thanks for sharing

Abdiweli Gedi

Analysis & Evidence Specialist(GIS) at International Committee of the Red Cross - ICRC

11 个月

thanks for sharing, indeed it makes the life easier for many

Interesting and thanks for sharing!!!

Peyman Abbaszadeh

Assistant Professor, Portland State University, CEE

11 个月

Thanks Keyhan for sharing this! This is very useful ??

要查看或添加评论,请登录

社区洞察

其他会员也浏览了