Download, Combine, and Visualize MODIS in Python – a tutorial + code
Keyhan Gavahi
Meteorological Data Scientist @ AtmosphericG2 | Weather Forecasting | Geospatial | Remote Sensing | Climate Tech | Software Engineer
When I started my PhD in 2019, MODIS products were among the main datasets we used for our research. They were also among the most challenging to work with. There are three main reasons that, I think, made me and many other students struggle with MODIS products:
1 – MODIS does not provide x, y coordinates as separate variables.
2 – MODIS products come in tiles, so in many situations your area of interest will span multiple hdf files.
3 – The coordinate system is sinusoidal, which is different from many other products that usually use WGS84 or Lambert conformal conic (a quick sketch of the tile grid follows this list).
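For orientation, MODIS land products live on a fixed global sinusoidal grid of 36 tile columns (h00–h35) by 18 tile rows (v00–v17). The sketch below uses the standard MODIS grid constants (the grid's upper-left corner and the tile size, in meters) to show how a tile index like h08v04 maps to a corner coordinate; the helper function name is mine, not part of any library.

# MODIS sinusoidal grid constants: upper-left corner of the full grid
# and the tile size, both in metres
GRID_X0, GRID_Y0 = -20015109.354, 10007554.677
TILE_SIZE = 1111950.519667

def tile_upper_left(h, v):
    # upper-left corner of tile hXXvYY in sinusoidal metres
    return GRID_X0 + h * TILE_SIZE, GRID_Y0 - v * TILE_SIZE

print(tile_upper_left(8, 4))  # ~(-11119505.2, 5559752.6) for tile h08v04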
First, let's import the libraries.
1. Import Libraries
import os
import re
import urllib.request

import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from pyhdf.SD import SD, SDC
from pyproj import Transformer
from mpl_toolkits.basemap import Basemap
2. Download Data
The first step is to download the data. I developed the following class to make that easy. You can use it for any MODIS product (here is the full list for Terra and Aqua). To use it, you will need wget and a NASA EARTHDATA account, which you can easily create here. Let's start with the MOD11A2 version 6.1 product, which provides an average 8-day per-pixel land surface temperature. We will download all the MOD11A2 version 6.1 files that cover the Contiguous United States (CONUS) for January 1st, 2021.
In many situations, you might want to perform your preprocessing without downloading the raw data, to save time and storage. Check out this nice tutorial by Ali Ahmadalipour on how to do that for ERA5-Land daily data.
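One practical note: the wget command used by the class below stores your Earthdata session in a .urs_cookies file. wget may complain if that file does not exist yet, so it does not hurt to create an empty one first (a minimal sketch):

# create an empty cookie jar for wget's --load-cookies/--save-cookies flags
open('.urs_cookies', 'a').close()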
class DataPreprocess:

    def __init__(self, user=None, password=None):
        self.user = user
        self.password = password

    def dl_modis(self, path=None, product=None,
                 start_date=None, end_date=None, tiles=None):
        if tiles == 'conus':
            # 14 tiles that cover the CONUS
            tiles = ["h08v04", "h08v05", "h08v06",
                     "h09v04", "h09v05", "h09v06",
                     "h10v04", "h10v05", "h10v06",
                     "h11v04", "h11v05", "h12v04",
                     "h12v05", "h13v04"]
        # map the product prefix to its directory on the data server
        # (MOLT: Terra, MOLA: Aqua, MOTA: combined Terra+Aqua)
        satellite_mapping = {
            'MCD': 'MOTA',
            'MYD': 'MOLA',
            'MOD': 'MOLT'
        }
        satellite = satellite_mapping.get(product[:3], 'unknown')
        # a single date is allowed: default end_date to start_date
        if end_date is None:
            end_date = start_date
        date_range = pd.date_range(start=start_date, end=end_date, freq='D')
        date_str = [str(date)[:10].replace('-', '') for date in date_range]
        # scrape the per-date folder links from the product page
        page_url = f'https://e4ftl01.cr.usgs.gov/{satellite}/{product}/'
        uf = urllib.request.urlopen(page_url, timeout=120)
        html = uf.read()
        soup = BeautifulSoup(html, "html.parser")
        link_list = set([link.get('href') for link in soup.find_all('a')])
        filtered_links = [link for link in link_list if
                          any(date in link.replace('.', '') for date in date_str)]
        page_urls = [page_url + link for link in filtered_links]
        # scrape the hdf file links for the requested tiles
        urls = []
        for page_url in page_urls:
            uf = urllib.request.urlopen(page_url, timeout=120)
            html = uf.read()
            soup = BeautifulSoup(html, "html.parser")
            link_list = set([link.get('href') for link in soup.find_all('a')])
            filtered_links = [link for link in link_list if
                              any(tile in link for tile in tiles)]
            for link in filtered_links:
                urls.append(page_url + link)
        urls = sorted(set(urls))
        # let's save the urls in a text file to
        # download them with a single wget command
        txt_path = os.path.join(path, "urls.txt")
        if os.path.exists(txt_path):
            os.remove(txt_path)
        with open(txt_path, 'a') as fp:
            fp.write('\n'.join(urls))
        # download the files
        os.system(f'wget --load-cookies .urs_cookies --save-cookies .urs_cookies '
                  f'--keep-session-cookies --user={self.user} '
                  f'--password={self.password} -P {path} '
                  f'--content-disposition -i {txt_path}')
Use your username and password in the following lines to download the raw data.
dp = DataPreprocess(user='', password='')
dp.dl_modis(path='/content', product='MOD11A2.061',
            start_date='20210101',
            tiles='conus')
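Once wget finishes, the tiles land in the download directory as files named like MOD11A2.A2021001.h08v04.061.*.hdf (product, "A" plus year and day-of-year, tile, collection). A quick check, assuming the files ended up in /content as requested:

import glob

# list the downloaded tiles; '/content' is the path we passed to dl_modis
files = sorted(glob.glob('/content/MOD11A2*.hdf'))
print(len(files))  # should be 14 for the CONUS tile set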
3. Get Coordinates
As I mentioned, MODIS does not provide x and y coordinates for the pixels. However, the hdf files do contain the corner coordinates in their metadata. We can combine those with the number of rows and columns in the raster to compute the x and y coordinates of every pixel. The following function does that for us; it takes the opened hdf file plus the data array, whose shape supplies the pixel counts.
def mod_y_x(hdf, data):
    # corner coordinates live in the "StructMetadata.0" attribute
    gridmeta = hdf.attributes(full=1)["StructMetadata.0"][0]
    ul_regex = re.compile(r'''UpperLeftPointMtrs=\(
                              (?P<upper_left_x>[+-]?\d+\.\d+)
                              ,
                              (?P<upper_left_y>[+-]?\d+\.\d+)
                              \)''', re.VERBOSE)
    match = ul_regex.search(gridmeta)
    x0 = float(match.group('upper_left_x'))
    y0 = float(match.group('upper_left_y'))
    lr_regex = re.compile(r'''LowerRightMtrs=\(
                              (?P<lower_right_x>[+-]?\d+\.\d+)
                              ,
                              (?P<lower_right_y>[+-]?\d+\.\d+)
                              \)''', re.VERBOSE)
    match = lr_regex.search(gridmeta)
    x1 = float(match.group('lower_right_x'))
    y1 = float(match.group('lower_right_y'))
    ny, nx = data[0].shape
    # 2*n+1 points give the cell edges and centres interleaved;
    # every other value starting at index 1 keeps the centres
    x = np.linspace(x0, x1, nx * 2 + 1)
    y = np.linspace(y0, y1, ny * 2 + 1)
    x = x[1::2]
    y = y[1::2]
    # note: these are sinusoidal-projection metres, not lat/lon
    return y, x
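The linspace trick above deserves a quick explanation: generating 2*n+1 evenly spaced points between the two corner coordinates produces the cell edges and cell centers interleaved, and slicing with [1::2] keeps only the centers. A toy example with a 4-pixel row spanning 0–4000 m:

# 2*4+1 = 9 points: edges and centres interleaved
edges_and_centres = np.linspace(0, 4000, 4 * 2 + 1)
centres = edges_and_centres[1::2]
print(centres)  # [ 500. 1500. 2500. 3500.]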
4. Read and Combine hdf Files
Next, we will read all the hdf files (the files list we built with glob above), convert each one to a DataArray, and collect them in a list so that we can combine them with Xarray.
dss = []
for file in files:
    hdf = SD(file, SDC.READ)
    DATAFIELD_NAME = 'LST_Day_1km'
    # get the dataset fill value
    _FillValue = hdf.select(DATAFIELD_NAME).getfillvalue()
    # get the dataset min and max values: valid_range
    valid_range = hdf.select(DATAFIELD_NAME).getrange()
    # get the dataset calibration coefficients: scale_factor,
    # scale_factor_err, add_offset, add_offset_err, calibrated_nt
    cal_coef = hdf.select(DATAFIELD_NAME).getcal()
    # read the dataset
    data2D = hdf.select(DATAFIELD_NAME)
    data = data2D[:, :].astype('float32')
    data = data.reshape(1, data.shape[0], data.shape[1])
    # mask fill and out-of-range values, then apply the scale factor
    data = np.where((data == _FillValue) |
                    (data < valid_range[0]) |
                    (data > valid_range[1]),
                    np.nan, data) * cal_coef[0]
    # get coordinates (in sinusoidal metres)
    y, x = mod_y_x(hdf, data)
    da = xr.DataArray(
        data=data,
        dims=["time", "y", "x"],
        coords=dict(
            x=(["x"], x),
            y=(["y"], y),
            time=pd.date_range("2021-01-01", periods=1),
        ),
    )
    dss.append(da)
You can also open the hdf files with Xarray's "netcdf4" engine, but you may run into compatibility errors if you don't have the right version of Xarray. That's why we used pyhdf and defined the DataArrays ourselves in the loop.
Now that we have all 14 tiles as DataArrays in the dss list, we can use "combine_by_coords" to merge them.
da_comb = xr.combine_by_coords(dss)
da_comb[0].plot()
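If the tiles lined up correctly, the merged array should span 6 tile columns by 3 tile rows, with the tiles missing from that bounding box filled with NaN. Since each MOD11A2 tile is 1200 × 1200 pixels, a quick sanity check would be:

# expected roughly: time: 1, y: 3600, x: 7200 for the 14 CONUS tiles
print(da_comb.sizes)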
5. Transform and Visualize
Next, we will transform the coordinates from the MODIS sinusoidal projection to WGS84 using the pyproj package.
xv, yv = np.meshgrid(da_comb.x, da_comb.y)
# always_xy=True makes the transformer take and return (x, y), i.e. (lon, lat)
transformer = Transformer.from_crs("+proj=sinu +R=6371007.181 +nadgrids=@null +wktext",
                                   "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(xv, yv)
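To sanity-check the transform, we can feed it the upper-left corner of tile h08v04 from the grid sketch earlier. On the MODIS sphere it should come back at exactly 50°N, with a longitude around 155.6°W; that looks surprisingly far west of the CONUS, but it is correct, because meridians converge in the sinusoidal projection and the tile's corner in projected space sits well outside its land footprint.

# upper-left corner of tile h08v04 in sinusoidal metres
lon_ul, lat_ul = transformer.transform(-11119505.196667, 5559752.598333)
print(lon_ul, lat_ul)  # roughly (-155.57, 50.0)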
Finally, let’s plot the combined hdf files using basemap and with a lambert conic projection.
down, up = 10, 55
left, right = -130, -60
m = Basemap(llcrnrlon=-119, llcrnrlat=15, urcrnrlon=-55, urcrnrlat=52,
            projection='lcc', lat_1=33, lat_2=45, lon_0=-95)
m.drawcoastlines(linewidth=0.5)
m.drawparallels(np.arange(down, up, 10),
                labels=[1, 0, 0, 0])
m.drawmeridians(np.arange(left, right, 10),
                labels=[0, 0, 0, 1])
pcolormesh = m.pcolormesh(lon, lat, da_comb[0], latlon=True, cmap='terrain_r')
fig = plt.gcf()
fig.colorbar(pcolormesh)
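Finally, if you want to keep the figure, a title and a savefig call round things off (the file name is just a placeholder):

# label and save the map
plt.title('MOD11A2 8-day LST (K), 2021-01-01')
plt.savefig('mod11a2_lst_conus.png', dpi=200, bbox_inches='tight')
plt.show()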