Advanced Time Series Analysis with Google Earth Engine and Python Libraries
Pulakesh Pradhan
PhD Scholar, MPhil, UGC-SRF (Focus Area: Climate and Agriculture || Rural Geography)
This guide shows how to extract time series data from Google Earth Engine (GEE) and analyze it with Python. I cover time series plotting, seasonal decomposition, anomaly detection, outlier identification, and forecasting using tools like pandas, statsmodels, scikit-learn, and pmdarima.
Introductionstats models, sci-kit-learn, and prima
This tutorial will guide you through the process of extracting Normalized Difference Vegetation Index (NDVI) data from satellite imagery using Google Earth Engine, analyzing the time series in Python, and applying machine learning techniques to detect anomalies and forecast future trends. The tutorial uses tools such as geemap, statsmodels, pmdarima, and scikit-learn.
Step 1: Setting Up Google Earth Engine
To begin, you’ll need access to Google Earth Engine (GEE). GEE provides a powerful platform for accessing and analyzing a wide range of geospatial datasets.
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
# Initialize the Earth Engine
# ee.Authenticate() # Uncomment if you need to authenticate
ee.Initialize()
If you’re using GEE for the first time, run ee.Authenticate() to authorize access.
Step 2: Defining the Location of Interest
In this example, we’ll define a specific location (Mountain View, CA) using latitude and longitude coordinates. You can replace this with coordinates for your own area of interest.
# Define the location of interest (e.g., a point or a polygon)
location = ee.Geometry.Point([-122.084051, 37.385348]) # Mountain View, CA
Step 3: Selecting the Satellite Imagery Dataset
We are using Landsat 8 data for this analysis. Landsat 8 provides surface reflectance data, which is useful for calculating NDVI, a measure of vegetation health.
# Define the satellite imagery dataset (e.g., Landsat 8)
dataset = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
# Filter the dataset by location and date range
filtered_dataset = dataset.filterBounds(location).filterDate('2013-01-01', '2022-12-31')
Step 4: Calculating NDVI
The NDVI is calculated using the formula:
In Landsat 8, band 5 represents the near-infrared (NIR) light, and band 4 represents the red light.
# Define a function to calculate NDVI
def calculate_ndvi(image):
ndvi = image.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
return image.addBands(ndvi)
# Map the NDVI calculation function over the filtered dataset
ndvi_dataset = filtered_dataset.map(calculate_ndvi)
Step 5: Extracting Time Series Data
We use the getRegion() function to extract NDVI data for the location over time. This returns data as a list of lists, which we convert to a Pandas DataFrame for easier handling.
# Get the time series data for the location
time_series_data = ndvi_dataset.getRegion(location, 30).getInfo()
# Extract the time property separately and format it
time_values = [item[0] for item in time_series_data[1:]]
time_values = pd.to_datetime([item.split('_')[-1] for item in time_values], format='%Y%m%d')
# Create a pandas DataFrame from the time series data (excluding time)
df = pd.DataFrame([item[1:] for item in time_series_data[1:]], columns=time_series_data[0][1:])
df['time'] = time_values
Step 6: Plotting NDVI Over Time
To visualize vegetation health, we plot NDVI against time using Matplotlib.
# Plot the NDVI values over time
plt.figure(figsize=(12, 6))
plt.plot(df['time'], df['NDVI'])
plt.xlabel('Time')
plt.ylabel('NDVI')
plt.title('NDVI Time Series')
plt.show()
This helps us visualize seasonal patterns or abrupt changes in vegetation health.
Step 7: Seasonal Decomposition
We use statsmodels to decompose the NDVI time series into its seasonal, trend, and residual components.
import statsmodels.api as sm
ndvi_series = df[['time', 'NDVI']].rename(columns={'time': 'Date'}).set_index('Date')
ndvi_series = ndvi_series.fillna(ndvi_series.mean())
# Resample the data to monthly frequency
ndvi_series = ndvi_series.resample('M').mean()
# Seasonal Decompose
decomposition = sm.tsa.seasonal_decompose(ndvi_series['NDVI'], model='additive')
decomposition.plot()
The decomposition plot helps to distinguish between seasonal effects, trends, and irregular fluctuations in the NDVI data.
Step 8: Forecasting NDVI with ARIMA
We fit an ARIMA model to the NDVI series for forecasting future values.
# Apply ARIMA for forecasting
arima_model = sm.tsa.ARIMA(ndvi_series['NDVI'], order=(5,1,0))
arima_result = arima_model.fit()
# Forecast the next 12 months
forecast = arima_result.forecast(steps=12)
print(forecast)
# Print ARIMA model statistics
print(arima_result.summary())
Step 9: Anomaly Detection Using Isolation Forest
We use the IsolationForest algorithm from scikit-learn to detect anomalies in the NDVI data.
from sklearn.ensemble import IsolationForest
import numpy as np
# Apply Isolation Forest for anomaly detection
X = ndvi_series['NDVI'].values.reshape(-1, 1)
model = IsolationForest(contamination=0.1)
model.fit(X)
anomalies = model.predict(X)
# Detect anomalies (outliers)
outliers = X[anomalies == -1]
print(outliers)
# Plot the data with anomalies highlighted
plt.figure(figsize=(10, 6))
plt.plot(ndvi_series['NDVI'], label='Normal Data')
plt.scatter(np.where(anomalies == -1)[0], outliers, color='red', label='Outliers')
plt.title('Anomaly Detection using Isolation Forest')
plt.xlabel('Time')
plt.ylabel('NDVI')
plt.legend()
plt.show()
This helps to identify periods where vegetation health deviates significantly from normal patterns, potentially indicating disturbances such as droughts or disease.
Step 10: Forecasting NDVI with Auto ARIMA (pmdarima)
Finally, we use the pmdarima library to automatically select the best ARIMA parameters and forecast future values.
import pmdarima as pm
# Auto ARIMA model
auto_arima_model = pm.auto_arima(ndvi_series['NDVI'], seasonal=True, m=12)
# Forecast future values
forecast = auto_arima_model.predict(n_periods=12)
forecast_dates = [ndvi_series.index[-1] + pd.DateOffset(months=i) for i in range(1, 13)]
# Plot the forecasted values
plt.figure(figsize=(10, 6))
plt.plot(ndvi_series.index, ndvi_series['NDVI'], label='Actual')
plt.plot(forecast_dates, forecast, label='Forecast', linestyle='--', marker='o')
plt.legend()
plt.title('NDVI Forecast')
plt.xlabel('Date')
plt.ylabel('NDVI')
plt.show()
The auto_arima function simplifies the process of choosing the best model for forecasting and helps in making more accurate predictions.
Conclusion
In this tutorial, we’ve seen how to extract and analyze NDVI time series using Google Earth Engine and Python. We covered NDVI calculation, seasonal decomposition, anomaly detection, and forecasting. This end-to-end approach offers a comprehensive view of vegetation health, helping us detect changes and make informed predictions.
Feel free to modify the location, data range, or add new analysis methods to suit your specific needs. Understanding these variations in NDVI can provide valuable insights into the ecological health and productivity of different regions.
?? More insights & projects: pulakeshpradhan.github.io