
Visualize CO2 Time Series with Python

Chonghua Yin

Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics

Published: May 17, 2018

Nowadays, when people talk about the rise in our planet's average surface temperature, they inevitably mention carbon dioxide (CO2) and other greenhouse gases (GHGs). We can easily check the latest CO2 data using Python. The data can be downloaded from NOAA's Earth System Research Laboratory (ESRL) and cover the period from March 1958 to April 2018. CO2 is expressed as a mole fraction in dry air, in micromol/mol, abbreviated as ppm.

The data form a typical time series, one of the most common data types. One powerful yet simple method for analyzing and predicting periodic data is the additive model. The idea is straightforward: represent a time series as a combination of patterns at different scales, such as daily, weekly, seasonal, and yearly, along with an overall trend.
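To make the additive idea concrete, here is a minimal sketch that builds a synthetic monthly series as trend + seasonal + noise. The dates, coefficients, and variable names are illustrative assumptions and are not taken from the CO2 record.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index; dates and coefficients are illustrative only
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(len(idx))

trend = 350.0 + 0.15 * t                      # slow linear rise
seasonal = 3.0 * np.sin(2 * np.pi * t / 12)   # pattern that repeats every 12 months
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.2, size=len(idx))   # small random residual

# Additive model: the observed series is the sum of its components
series = pd.Series(trend + seasonal + noise, index=idx, name="synthetic")
print(series.head())
```

Because the components are simply summed, subtracting any two of them from the series recovers the third, which is what a decomposition tries to do in reverse.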

In this notebook, we will introduce some common techniques used in time-series analysis and walk through the iterative steps required to manipulate and visualize time-series data.

1. Load all needed libraries

import pandas as pd
import statsmodels.api as sm
from matplotlib import pyplot as plt

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

# Set some parameters to apply to all plots. These can be overridden
import matplotlib
# Plot size to 12" x 7"
matplotlib.rc('figure', figsize = (12, 7))
# Font size to 14
matplotlib.rc('font', size = 14)
# Do not display top and right frame lines
matplotlib.rc('axes.spines', top = False, right = False)
# Remove grid lines
matplotlib.rc('axes', grid = False)
# Set background color to white
matplotlib.rc('axes', facecolor = 'white')

2. Read CO2 time series data

2.1 Load data

# Skip the first 72 lines of the file; -99.99 and -1 mark missing values
co2 = pd.read_csv('data/co2_mm_mlo.txt',
                  skiprows=72,
                  header=None,
                  comment="#",
                  sep=r'\s+',
                  names=["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
                  na_values=[-99.99, -1])

# Build a YYYYMM key (e.g. 195803) and parse it into a monthly timestamp
co2['Date'] = (co2['year']*100 + co2['month']).astype(str)
co2['Date'] = pd.to_datetime(co2['Date'], format='%Y%m')
co2.set_index('Date', inplace=True)
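As an alternative way to build the monthly index, pandas can assemble timestamps directly from columns named year, month, and day. The sketch below uses a small toy frame with made-up values rather than the real file, and the name toy is hypothetical.

```python
import pandas as pd

# Toy frame that mimics the year/month columns; the values are made up
toy = pd.DataFrame({"year": [1958, 1958, 1958],
                    "month": [3, 4, 5],
                    "average": [315.0, 317.0, 317.5]})

# to_datetime assembles dates from columns named year, month, and day
toy["Date"] = pd.to_datetime(toy[["year", "month"]].assign(day=1))
toy = toy.set_index("Date")

print(toy.index)
```

Either approach should give a DatetimeIndex with one timestamp per month, which is what the later plotting and decomposition steps rely on.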

2.2 Drop other columns and keep only the original data

co2.drop(["year", "month", "decimal_date", "interpolated",  "trend", "days"], axis=1, inplace=True)
co2.head()

2.3 Handle missing values

Real world data tends to be messy. Data can have missing values for a number of reasons such as observations that were not recorded and data corruption. Handling missing data is important as many data analysis algorithms do not support data with missing values.

The simplest way to reveal missing data is the isnull() method.

co2.isnull().sum()
[Out]: average    7

There are 7 months with missing values in our time series.

The simplest strategy for handling missing data is to drop the records that contain a missing value. Pandas provides the dropna() function, which can drop either rows or columns with missing data. Dropping rows with missing values looks like this: dataset.dropna(inplace=True).
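A small sketch of what dropping does to a monthly series is shown here. The frame demo and its values are hypothetical and serve only to illustrate the call.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=6, freq="MS")
demo = pd.DataFrame({"average": [370.0, np.nan, 371.0, 372.0, np.nan, 373.0]},
                    index=idx)

dropped = demo.dropna()                          # new frame; demo itself is unchanged
missing_months = idx.difference(dropped.index)   # months that no longer have a row

print(len(demo), len(dropped))                   # 6 4
print(list(missing_months.strftime("%Y-%m")))    # ['2000-02', '2000-05']
```

The dropped rows leave holes in the monthly index, so the series is no longer evenly spaced.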

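However, if the missing values are not too numerous, we should "fill in" the gaps instead so that the series stays continuous. This can be done using the fillna() command in pandas, whose method argument accepts:

backfill / bfill: use the next valid observation to fill the gap
pad / ffill: propagate the last valid observation forward
None (default): fill with the value passed to fillna()

Recent pandas releases deprecate the method argument in favor of calling ffill() or bfill() directly.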
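The difference between the filling options is easiest to see on a tiny series. This is a toy sketch with hypothetical values, using the ffill() and bfill() methods directly.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()        # carry the last valid value forward -> 1, 1, 1, 4
backward = s.bfill()       # pull the next valid value backward  -> 1, 4, 4, 4
constant = s.fillna(0.0)   # no method: fill with a given value  -> 1, 0, 0, 4

print(pd.DataFrame({"original": s, "ffill": forward,
                    "bfill": backward, "fillna(0)": constant}))
```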

For simplicity, missing values are filled with the next non-null value in the CO2 time series (a backward fill), although a rolling mean would sometimes be preferable.

co2 = co2.bfill()

Now the number of missing values should be 0.

co2.isnull().sum()
[Out]: average    0
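For comparison, the rolling-mean alternative mentioned earlier could look roughly like the sketch below. It is not the approach used in this notebook. The toy series and the 3-point window are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])

# Centered 3-point rolling mean; NaNs are skipped inside each window
local_mean = s.rolling(window=3, center=True, min_periods=1).mean()

# Only the gaps are replaced; observed values are left untouched
filled = s.fillna(local_mean)
print(filled.tolist())
```

Each gap is filled with the average of its immediate neighbors, which avoids the small step that a pure backward fill introduces.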

3. Visualizing CO2 Time-series Data

3.1 Start with a quick plot

Pandas makes it very easy to plot the CO2 time series, and any deeper analysis should start with a first look at the data.

co2.plot(title='Monthly CO2 (ppm)')

From the plot above, there appears to be a roughly linear trend, although it is hard to be sure by eye alone. The series also has an obvious seasonal pattern, and the amplitude (height) of the cycles appears to be stable, which suggests that an additive model is suitable.
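One rough way to check the "stable amplitude" impression numerically is to compare the peak-to-trough range within each calendar year. The sketch below does this on two synthetic series, one additive and one multiplicative. The helper yearly_range and all coefficients are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=60, freq="MS")
t = np.arange(len(idx))
trend = 350.0 + 0.15 * t
cycle = np.sin(2 * np.pi * t / 12)

additive = pd.Series(trend + 3.0 * cycle, index=idx)               # fixed-size cycle
multiplicative = pd.Series(trend * (1 + 0.01 * cycle), index=idx)  # cycle scales with level

def yearly_range(series):
    """Peak-to-trough range within each calendar year."""
    by_year = series.groupby(series.index.year)
    return by_year.max() - by_year.min()

print(yearly_range(additive).round(3))        # same value every year
print(yearly_range(multiplicative).round(3))  # grows from year to year
```

A range that stays roughly flat over the years points toward an additive model, while a range that grows with the level of the series points toward a multiplicative one.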

We can also visualize our data using a method called time-series decomposition. As its name suggests, time series decomposition allows us to decompose our time series into three distinct components: trend, seasonality, and noise.

3.2 Decompose time-series

The seasonal_decompose function provided by statsmodels performs the seasonal decomposition of the CO2 data.

decomposition = sm.tsa.seasonal_decompose(co2, model='additive')
fig = decomposition.plot()

Each component of decomposition is accessible via:

decomposition.resid
decomposition.seasonal
decomposition.trend

For example, we can check the trend in 1991.

decomposition.trend.loc['1991']
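To show what an additive decomposition does, here is a hand-rolled sketch of the classical procedure using only pandas. It is a simplified illustration on synthetic data under assumed coefficients, not the statsmodels implementation, and its results may differ in detail from seasonal_decompose.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=72, freq="MS")
t = np.arange(len(idx))
y = pd.Series(350.0 + 0.15 * t + 3.0 * np.sin(2 * np.pi * t / 12), index=idx)

# Trend: 2x12 centered moving average (12-month mean, then a 2-point mean to center it)
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value for each calendar month, centered on zero
detrended = y - trend
monthly = detrended.groupby(detrended.index.month).mean()
monthly = monthly - monthly.mean()
seasonal = pd.Series(monthly.loc[idx.month].to_numpy(), index=idx)

# Residual: whatever the trend and seasonal parts do not explain
resid = y - trend - seasonal

print(trend.loc["2001"].round(2))
```

The centered moving average cannot be computed for the first and last six months, so the trend and residual are NaN there.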

Summary

The decomposition plot above clearly shows an upward trend in monthly CO2, along with a stable seasonal cycle.

References

Seabold, Skipper, and Josef Perktold. “Statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.

John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55

Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010)

Wes McKinney. pandas: a Foundational Python Library for Data Analysis and Statistics. Presented at PyHPC 2011.

https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html

https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/trend-analysis
