Apply Docker Compose To Deploy A Jupyter Environment For Processing Meteorological Data
Chonghua Yin
Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics
In the previous notebook, we demonstrated how to create a Jupyter notebook container for meteorological data processing, building our image hierarchically on top of the data science stack image. In fact, Project Jupyter maintains a GitHub repository of well-defined, well-tested, fully functional, ready-to-use Jupyter Docker images with different tool sets already installed. This can save us a lot of time: all we need to do is tell Docker to start a container based on one of those pre-defined images. This is the reason I love Docker and use it in my work every day.
A dependency tree taken from their documentation is shown below:
In this notebook, let's look at how we can use Docker Compose to rapidly deploy a Jupyter environment for processing meteorological/climatic data, even though using Docker Compose for a project this small may look like using a cannon to kill a mosquito.
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
Before moving forward, make sure you have Docker and Docker Compose installed. Installation notes can be found in the official Docker and Docker Compose documentation.
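To confirm both tools are available, you can check their versions from a terminal (the exact version numbers will differ on your machine):

$ docker --version
$ docker-compose --version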
1. Define directory structure
Once Docker and Docker Compose are ready to go, let's define the file structure of our project. Start by creating a couple of directories in your current directory: data, docker, and notebooks. Within the docker directory, create a file named Dockerfile, and then in your current directory, create a file named docker-compose.yml.
The working directory should have the following file structure:
├── data
├── docker-compose.yml
├── docker
│   └── Dockerfile
└── notebooks
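If you prefer to create this layout from the command line, a minimal sketch (the names match the structure above) is:

$ mkdir -p data docker notebooks
$ touch docker/Dockerfile docker-compose.yml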
2. Create Dockerfile
Just as we did before, we will create our own Dockerfile based on the data science stack image. While the bulk of what's needed is provided by the base image, it's useful to define our own Dockerfile to include dependencies that may not be included in the base image. In this case, we're installing a handful of dependencies that are useful for meteorological data processing, such as dask, xarray, cfgrib and metpy.
Add the following contents to ./docker/Dockerfile:
FROM jupyter/datascience-notebook
MAINTAINER Chonghua Yin <royalosyin@gmail.com>
RUN conda install --quiet --yes -c conda-forge dask xarray cfgrib metpy
WORKDIR /notebooks
VOLUME /notebooks
In the above Dockerfile, we set the working directory to /notebooks with the WORKDIR instruction and use VOLUME to tell Docker that the /notebooks directory should be stored on the host file system rather than in the container's file system. This means the notebooks stored in the volume will persist and remain available even after you stop the container and remove it with docker rm.
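Although Compose will build the image for us in the next step, you can also build and run it by hand to see what Compose is about to automate; a minimal sketch (the image tag met-jupyter is arbitrary) is:

$ docker build -t met-jupyter ./docker
$ docker run --rm -p 8888:8888 -v "$(pwd)/notebooks:/notebooks" met-jupyter

The docker-compose.yml we write next encodes exactly these port and volume settings so we never have to retype them.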
3. Create docker-compose.yml
Now that we have our Dockerfile defined, let's add some content to ./docker-compose.yml that will help us get Jupyter kicked off.
Add the following content to ./docker-compose.yml:
version: "3"
services:
jupyter:
build:
context: ./docker
ports:
- "8888:8888"
volumes:
- "./notebooks:/notebooks"
- "./data:/data"
This file defines only one service: jupyter. The jupyter service will be built from the Dockerfile defined above (./docker). We also mount the two directories created above, ./notebooks and ./data, into the jupyter service at /notebooks and /data, respectively.
For more information about the Compose file, see the Compose file reference.
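Before starting anything, you can ask Compose to validate the file and print the resolved configuration, which is a quick way to catch YAML indentation mistakes:

$ docker-compose config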
4. Start docker-compose
If all the above went well, all that should be left to get things spinning is running the following in our working directory:
$ docker-compose up
This should result in the jupyter container being built, and then the jupyter service being started. You should see something like this in your console:
The first time we access the Jupyter web interface, we need to use the tokenized link provided in the console; after that we can visit http://localhost:8888 directly. Click the link in the console - a browser should open to the following view:
From here, you can create a new notebook - the .ipynb file will live in the host machine's filesystem at ./notebooks. You can use the ./data directory to store any data that you might want to access from your notebook. To shut down the container once you're done working, simply hit Ctrl-C in the terminal/command prompt. Your work will all be saved on your actual machine in the paths we set in our docker-compose.yml. And there you have it: a quick and easy way to start using Jupyter notebooks with the magic of Docker.
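If you prefer not to keep a terminal attached, Compose can also run the service in the background and tear it down cleanly later; a minimal sketch:

$ docker-compose up -d
$ docker-compose down

Because the notebooks and data live in bind-mounted host directories, docker-compose down removes the containers but leaves your files untouched.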
5. Test the Jupyter notebook
Here we put the downloaded GFS GRIB file gfs.t06z.pgrb2.0p25.f000 into the ./data folder (mounted at /data in the container) and, as before, use xarray with the cfgrib backend to open the file.
!ls /data
The output should look like:
gfs.t06z.pgrb2.0p25.f000 gfs.t06z.pgrb2.0p25.f000.887f8.idx
Now let's check the surface temperature variable:
%matplotlib inline
import xarray as xr

ds = xr.open_dataset("/data/gfs.t06z.pgrb2.0p25.f000",
                     engine="cfgrib",
                     backend_kwargs={
                         'filter_by_keys': {'typeOfLevel': 'pressureFromGroundLayer'},
                         'errors': 'ignore'
                     })
temperature = ds['t']
temperature.plot(figsize=(16, 7));
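GRIB temperatures are stored in Kelvin, so a natural follow-up is to convert the field to Celsius before plotting. A minimal sketch, assuming the dataset opened above (the exact coordinate names and layer dimensions depend on the GRIB file):

# Convert from Kelvin to degrees Celsius; arithmetic drops attrs, so reset the units
temperature_c = temperature - 273.15
temperature_c.attrs['units'] = 'degC'

# Plot the converted field; xarray picks up the latitude/longitude coordinates
temperature_c.plot(figsize=(16, 7));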
Summary
Docker Compose is particularly useful for data scientists building standalone computational systems composed of Jupyter and one or more data stores. Docker Compose works in all environments: production, staging, development, and testing, as well as CI workflows.
Moreover, using Docker Compose is simply a three-step process:
- Define your app’s environment with a Dockerfile so it can be reproduced anywhere.
- Define the services that make up your app in docker-compose.yml so they can be run together in an isolated environment.
- Run docker-compose up and Compose starts and runs your entire app.
In this notebook, we only gave it a simple try. Even so, we can see how convenient it is to apply Docker Compose to rapidly deploy a Jupyter environment for processing meteorological/climatic data.