Cracking Transit Data — Calgary 2025
Tanu Nanda Prabhu
Technical Writer | Full-Stack Developer (Python, Django, React) | Former Assistant Manager at Excel Promotions | Educator & Content Strategist
How to Decode and Leverage GTFS for Real-Time Transit Insights
Introduction
In the digital age, public transportation systems increasingly leverage technology to provide passengers with real-time data, enabling a more seamless travel experience. Transit systems share this data primarily through the General Transit Feed Specification (GTFS), a standard that provides data on schedules, routes, and real-time updates. However, while this data can be incredibly valuable, accessing and interpreting it can be challenging, especially when combining static and real-time feeds.
This article shows you how to use Calgary Transit’s bus location and route data to extract useful insights. If you work with data, understanding these datasets can help improve transit planning and the rider experience. We’ll cover the essential steps for fetching and processing Calgary Transit’s static and real-time data, including troubleshooting common issues you might encounter. By the end of this guide, you’ll be well-equipped to tap into Calgary’s transit data to solve real-world problems.
GTFS
The General Transit Feed Specification (GTFS) is an open standard that formats public transport schedules and geographic data. GTFS allows public transit agencies to publish their data in a format that various software applications, such as trip planners and mapping tools, can consume. This gives users easy access to travel information on smartphones and other devices.
GTFS includes information such as routes, stops, trips, stop times, service calendars, and fares.
When working with GTFS data, it’s important to understand the source and format of the data you are using. To fully explore and use Calgary Transit’s GTFS feeds, including the static and real-time data, visit their official website.
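To get a feel for the static side, here is a minimal sketch that downloads the static GTFS bundle and loads two of its files with pandas. The URL below is a placeholder, not the real link; substitute the actual static GTFS download link from Calgary’s portal.

import io
import zipfile

import pandas as pd
import requests

# Placeholder URL: replace with the static GTFS download link from the portal
STATIC_GTFS_URL = "https://example.com/calgary-gtfs.zip"

resp = requests.get(STATIC_GTFS_URL, timeout=30)
resp.raise_for_status()

# A static GTFS feed is a zip of CSV-like text files (routes.txt, stops.txt, ...)
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    routes = pd.read_csv(z.open("routes.txt"))
    stops = pd.read_csv(z.open("stops.txt"))

print(routes[["route_id", "route_short_name", "route_long_name"]].head())
print(stops[["stop_id", "stop_name", "stop_lat", "stop_lon"]].head())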
Data Portal
We can use Calgary’s Open Data Portal to get real-time transit vehicle locations. Calgary Transit provides real-time data through the General Transit Feed Specification Realtime (GTFS-RT) format. These feeds offer live information on vehicle positions, trip updates, and service alerts.
Available GTFS-RT Feeds
Calgary Transit publishes three GTFS-RT feeds: Vehicle Positions, Trip Updates, and Service Alerts.
Accessing the Feeds
These feeds are accessible via Calgary’s Open Data Portal:
Vehicle Positions Feed
Handling GTFS-RT Feeds with Python
Install Required Libraries
pip install requests protobuf pandas
The requests library sends HTTP requests in Python, while protobuf works with Protocol Buffers (Protobuf), a method developed by Google for serializing structured data. Long story short, when handling a GTFS-RT feed, requests fetches the data from the API and protobuf parses it. We’ll also use pandas to display the results.
Generating Python code from a Protocol Buffer
We need the Protocol Buffer compiler protoc, a tool provided by Google for working with .proto files, together with the gtfs-realtime.proto Protobuf definition file. It describes the structure of GTFS-RT messages, including FeedMessage, FeedHeader, FeedEntity, and so on. The technical documentation follows below.
!protoc --python_out=. gtfs-realtime.proto
Running this command creates a Python module, gtfs_realtime_pb2.py. If you don’t want the hassle, you can skip this step by manually uploading gtfs_realtime_pb2.py to your drive. Click here to access the file. To upload a file in Google Colab, you can write:
from google.colab import files
files.upload()
Your directory should now look something like this (check with the !ls command):
drive gtfs_realtime_pb2.py gtfs-realtime.proto __pycache__
The whole point of using gtfs_realtime_pb2.py is to provide Python classes and methods that make it easy to work with GTFS-RT Protobuf (Protocol Buffers) encoded data. Without it, manual parsing and interpretation of the binary data would be necessary, which is both error-prone and impractical.
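As an aside, if you’d rather skip protoc entirely, Google also publishes pre-compiled bindings on PyPI as the gtfs-realtime-bindings package; note that its import path differs from the locally compiled module:

# Alternative: pip install gtfs-realtime-bindings
# The pre-compiled module lives under the google.transit namespace
from google.transit import gtfs_realtime_pb2

feed = gtfs_realtime_pb2.FeedMessage()  # same FeedMessage class as before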
Fetch Calgary Transit’s GTFS-RT Feed
Now let’s fetch and parse the data from the feed. Make sure the gtfs_realtime_pb2.py you upload was generated from the same version of the .proto file you’re using; a version mismatch will lead to a TypeError when parsing.
# Import the libraries for HTTP requests, data handling, and feed parsing
import requests
import pandas as pd
import gtfs_realtime_pb2  # The module compiled from gtfs-realtime.proto
def fetch_gtfs_rt_feed(url):
"""
Fetches GTFS real-time data from the given URL.
Args:
url (str): The URL to fetch the GTFS real-time data from.
Returns:
feed: A GTFS-RT FeedMessage object containing the parsed data, or None if an error occurs.
"""
try:
# Send a GET request to the provided URL
response = requests.get(url)
# Check if the response status code is 200 (OK)
if response.status_code == 200:
# Initialize a FeedMessage object from gtfs_realtime_pb2
feed = gtfs_realtime_pb2.FeedMessage()
# Parse the response content into the FeedMessage object
feed.ParseFromString(response.content)
return feed
else:
# Print an error message if the status code is not 200
print(f"Error fetching data: {response.status_code} - {response.reason}")
return None
except Exception as e:
# Print an error message if an exception occurs
print(f"An error occurred: {e}")
return None
# URL for GTFS real-time vehicle positions data
vehicle_positions_url = "https://data.calgary.ca/download/am7c-qe3u/application%2Foctet-stream"
# Fetch the GTFS real-time data from the specified URL
feed = fetch_gtfs_rt_feed(vehicle_positions_url)
# Check if the feed was fetched successfully
if feed:
vehicle_data = [] # Initialize an empty list to store vehicle information
# Loop through the first 5 entities in the feed
for entity in feed.entity[:5]: # [:5] ensures we only process the first 5 entities
if entity.HasField('vehicle'): # Check if the entity contains vehicle data
vehicle = entity.vehicle # Extract the vehicle field
# Create a dictionary with relevant vehicle information
vehicle_info = {
"Vehicle ID": vehicle.vehicle.id, # Vehicle identifier
"Latitude": vehicle.position.latitude, # Latitude position of the vehicle
"Longitude": vehicle.position.longitude # Longitude position of the vehicle
}
# Append the vehicle information dictionary to the list
vehicle_data.append(vehicle_info)
# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(vehicle_data)
# Print the DataFrame to display the data in a tabular format
print(df)
Explanation
First, the requests library fetches the data from the URL. Second, fetch_gtfs_rt_feed retrieves and parses the GTFS real-time feed. Third, the status-code check ensures the data was retrieved successfully. Fourth, the try/except block catches and prints any errors. Finally, the loop extracts vehicle information, such as ID and position, from the first five entities, and Pandas displays the results in a tidy tabular format.
Output
The data you are seeing is raw real-time information comprising each vehicle’s ID and position (latitude and longitude). In some cases, the vehicle ID may relate directly to the bus number (for example, vehicle ID 1280 might correspond to Bus 128); it all depends on how the city encodes the data. Contact Calgary Transit to confirm the mapping.
Mapping the Location
Let’s use the folium library to map the data. Folium makes it easy to visualize data manipulated in Python on an interactive map. Visit the documentation below.
You need to install the folium library
!pip install folium
Folium creates interactive maps, and I’m using it because it’s simple and easy to understand. Let me know if you come across any other libraries that get the job done.
import folium # Importing the folium library to work with interactive maps
def plot_vehicle_on_map(latitude, longitude, vehicle_id):
"""
Plots the vehicle's location on a map using its latitude, longitude, and ID.
Parameters:
latitude (float): The latitude of the vehicle's location.
longitude (float): The longitude of the vehicle's location.
vehicle_id (str): The unique identifier for the vehicle.
Returns:
folium.Map: A Folium map centered on the vehicle's location with a marker.
"""
# Create a map centered at the vehicle's location with a zoom level of 14
vehicle_map = folium.Map(location=[latitude, longitude], zoom_start=14)
# Add a marker to the map at the vehicle's location
folium.Marker(
location=[latitude, longitude], # The latitude and longitude of the marker
popup=f"Vehicle ID: {vehicle_id}", # Popup text to display when the marker is clicked
icon=folium.Icon(color="blue", icon="bus", prefix="fa"), # Custom icon for the marker
).add_to(vehicle_map) # Add the marker to the map
return vehicle_map # Return the map object
# Example data for a vehicle's location and ID
latitude = 50.997478 # Example latitude value
longitude = -114.066544 # Example longitude value
vehicle_id = "8080" # Example vehicle ID
# Generate the map with the example vehicle data
map_output = plot_vehicle_on_map(latitude, longitude, vehicle_id)
map_output # Display the map
Output
Upon executing the above code, you will see the exact location of the bus with ID 8080. Keep in mind this data is dynamic: according to the Calgary Data Portal, the feed refreshes frequently (roughly every half minute), so positions change between fetches.
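Before moving on, here is a natural extension: plot every vehicle on a single map, reusing the df DataFrame built in the fetch step above (a sketch under that assumption; extend the loop beyond the first five entities to see the whole fleet):

# Build one map with a marker per vehicle in the DataFrame from the fetch step
fleet_map = folium.Map(
    location=[df["Latitude"].mean(), df["Longitude"].mean()],  # centre on the fleet
    zoom_start=11,
)
for _, row in df.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        popup=f"Vehicle ID: {row['Vehicle ID']}",
        icon=folium.Icon(color="blue", icon="bus", prefix="fa"),
    ).add_to(fleet_map)
fleet_map  # Renders inline in Colab/Jupyter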
Trip Updates Feed
The GTFS Realtime Trip Updates feed contains real-time updates to scheduled trips, such as delays, changes in stop times, and other dynamic data. In Calgary’s case, the feed provides only Trip ID, Start Time, Start Date, Stop ID, Arrival Time, and Departure Time. The feed is designed to reflect trip delays; for example, a trip scheduled for 8:00 AM that is running 10 minutes late will show the updated times. If bus 8080 is running the trip, the update can also identify it.
import requests
import gtfs_realtime_pb2 # Import the compiled GTFS Realtime protocol buffer
import pandas as pd
from datetime import datetime, timezone
def fetch_gtfs_rt_trip_updates(url):
"""Fetches and parses the GTFS Realtime Trip Updates feed from the given URL."""
try:
# Make a GET request to fetch the data from the specified URL
response = requests.get(url)
if response.status_code == 200:
# Parse the response content into a FeedMessage object
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(response.content)
return feed
else:
# Print an error message if the response status is not OK
print(f"Error fetching data: {response.status_code} - {response.reason}")
return None
except Exception as e:
# Catch and print any exceptions that occur during the request
print(f"An error occurred: {e}")
return None
def extract_trip_updates(feed):
"""Extracts trip update information from the GTFS Realtime feed."""
trip_updates = []
# Loop through each entity in the feed
for entity in feed.entity:
if entity.HasField('trip_update'):
# Extract the trip update data
trip_update = entity.trip_update
trip_id = trip_update.trip.trip_id
start_time = trip_update.trip.start_time
start_date = trip_update.trip.start_date
# Loop through each stop time update in the trip update
for stop_time_update in trip_update.stop_time_update:
stop_id = stop_time_update.stop_id
# Extract arrival and departure times, if available
arrival_time = stop_time_update.arrival.time if stop_time_update.HasField('arrival') else None
departure_time = stop_time_update.departure.time if stop_time_update.HasField('departure') else None
                # Convert Unix timestamps to human-readable UTC format
                arrival_time = datetime.fromtimestamp(arrival_time, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S') if arrival_time else None
                departure_time = datetime.fromtimestamp(departure_time, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S') if departure_time else None
# Add the extracted information to the trip updates list
trip_updates.append({
"Trip ID": trip_id,
"Start Time": start_time,
"Start Date": start_date,
"Stop ID": stop_id,
"Arrival Time": arrival_time,
"Departure Time": departure_time
})
return trip_updates
# URL for GTFS Realtime Trip Updates
trip_updates_url = "https://data.calgary.ca/download/gs4m-mdc2/application%2Foctet-stream"
# Fetch the trip updates feed
feed = fetch_gtfs_rt_trip_updates(trip_updates_url)
if feed:
# Extract the trip updates from the feed
trip_updates = extract_trip_updates(feed)
# Convert the trip updates into a DataFrame for easy manipulation and display
df_trip_updates = pd.DataFrame(trip_updates)
# Display the first 10 rows of the DataFrame
print(df_trip_updates.head(10))
Output
If you’re not receiving start_time and start_date from the GTFS Realtime Trip Updates, it’s likely because those fields are optional and not always provided in the feed; the agency may also withhold some fields deliberately. Contacting the City’s transit service can clarify what is published and why.
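Worth noting: besides absolute timestamps, GTFS-RT’s StopTimeEvent also defines a relative delay field, in seconds. Whether Calgary populates it is not guaranteed, but when present you can read delays directly. A minimal sketch, assuming the same parsed feed object as above:

# Print relative delays (seconds) where the feed provides them
for entity in feed.entity[:5]:
    if entity.HasField('trip_update'):
        for stu in entity.trip_update.stop_time_update:
            # `delay` lives on StopTimeEvent; check the field explicitly,
            # since a missing value would otherwise read as 0
            if stu.HasField('arrival') and stu.arrival.HasField('delay'):
                print(entity.trip_update.trip.trip_id, stu.stop_id,
                      f"arrival delay: {stu.arrival.delay}s")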
Side Note
You can see how different IDs are involved, such as the Trip ID and the Stop ID. If all the data is available, you can find the corresponding bus running each route; a starting sketch follows below. Let me know if you can connect those dots fully. I’d be happy to collaborate and turn this into a working project.
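To sketch one way of connecting them: GTFS-RT vehicle positions carry a TripDescriptor, so each vehicle reports the trip_id it is serving. Assuming you re-fetch the Vehicle Positions feed into a variable I’m calling feed_vehicles, and reuse df_trip_updates from above, a pandas join links the two:

# Hypothetical setup: re-fetch vehicle positions with the earlier helper
feed_vehicles = fetch_gtfs_rt_feed(vehicle_positions_url)  # add a None check in production

vehicle_rows = []
for entity in feed_vehicles.entity:
    if entity.HasField('vehicle'):
        v = entity.vehicle
        vehicle_rows.append({
            "Trip ID": v.trip.trip_id,  # TripDescriptor attached to the vehicle
            "Vehicle ID": v.vehicle.id,
            "Latitude": v.position.latitude,
            "Longitude": v.position.longitude,
        })
df_vehicles = pd.DataFrame(vehicle_rows)

# Inner join on Trip ID: each trip-update row gains the vehicle running it
df_linked = df_trip_updates.merge(df_vehicles, on="Trip ID", how="inner")
print(df_linked.head())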
Service Alerts
Calgary Transit refreshes its real-time data every half minute. To learn more about the GTFS-RT specification and its components (Trip Updates, Service Alerts, and Vehicle Positions), check out the Google Transit API page. Also, see Service Updates. Let’s see what the service alerts look like.
import requests # Library to make HTTP requests
import gtfs_realtime_pb2 # Ensure this proto file is compiled as Python
import pandas as pd # For handling and displaying data in DataFrame format
# Function to fetch the GTFS Realtime Alerts feed
def fetch_gtfs_rt_alerts(url):
"""Fetches and parses the GTFS Realtime Alerts feed from the given URL."""
try:
# Send a request to the URL to get the feed
response = requests.get(url)
# Check if the response is successful (status code 200)
if response.status_code == 200:
# Parse the feed using GTFS Realtime protocol
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(response.content)
return feed # Return the parsed feed
else:
# Print error if the response status is not 200
print(f"Error fetching data: {response.status_code} - {response.reason}")
return None
except Exception as e:
# Catch and print any exception that occurs during the request
print(f"An error occurred: {e}")
return None
# Function to extract alerts from the GTFS Realtime feed
def extract_alerts(feed):
"""Extracts alert information from the GTFS Realtime feed."""
alerts = [] # Initialize an empty list to store alert information
# Loop through each entity in the feed
for entity in feed.entity:
# Check if the entity contains an alert
if entity.HasField('alert'):
alert = entity.alert
# Extract relevant fields from the alert
alert_id = entity.id # Unique ID for the alert
# Extract header text from the alert (if available)
header_text = alert.header_text.translation[0].text if alert.header_text.translation else "No header"
# Extract description text from the alert (if available)
description_text = alert.description_text.translation[0].text if alert.description_text.translation else "No description"
severity_level = alert.severity_level # Severity level of the alert (e.g., low, medium, high)
# Append the extracted alert information to the alerts list
alerts.append({
"Alert ID": alert_id,
"Header": header_text,
"Description": description_text,
"Severity Level": severity_level
})
# Return the list of alerts
return alerts
# URL for GTFS Realtime Alerts (replace with the actual URL)
alerts_url = "https://data.calgary.ca/download/alerts_feed_url" # Example placeholder URL
# Fetch the alerts feed
feed = fetch_gtfs_rt_alerts(alerts_url)
# If feed is fetched successfully, extract the alerts
if feed:
alerts = extract_alerts(feed)
# Convert the list of alerts into a DataFrame for easier viewing
df_alerts = pd.DataFrame(alerts)
# Display the first 5 rows of the alerts DataFrame
print(df_alerts.head(5))
You can read the documentation of the Service Alerts below.
Upon executing the above code snippet, I noticed that the data was not in a readable format: the descriptions were not fully displayed and contained HTML tags. That’s why I used the BeautifulSoup library to clean this up; you can also trim leading and trailing spaces or newlines from the alert text, as the function below does.
from bs4 import BeautifulSoup # Import the BeautifulSoup library to parse and clean HTML
def clean_html(raw_html):
"""Removes HTML tags and returns plain text."""
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(raw_html, 'html.parser')
    # Extract the plain text from the HTML and trim surrounding whitespace
    return soup.get_text().strip()
# Loop through each alert and clean the "Header" and "Description" fields by removing HTML tags
for alert in alerts:
# Apply the clean_html function to the "Header" field
alert["Header"] = clean_html(alert["Header"])
# Apply the clean_html function to the "Description" field
alert["Description"] = clean_html(alert["Description"])
# Convert the cleaned alerts into a pandas DataFrame for easy viewing and manipulation
df_alerts = pd.DataFrame(alerts)
# Display the first 5 rows of the DataFrame to verify the cleaned alerts
print(df_alerts.head(5))
Output
You might be asking yourself, “Can I associate this with the trip_id and vehicle_id?” The answer is yes: each alert carries an informed_entity list identifying the routes, trips, and stops it applies to, so you can link alerts to particular trip_ids and vehicle_ids in your processing. Once again, it’s a moving piece of the puzzle; once the data is fully published to the public, this becomes straightforward.
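For reference, here is what that hook looks like in code. Each alert’s informed_entity is a list of EntitySelector messages with route_id, trip, and stop_id fields; a minimal sketch against the parsed alerts feed from above:

# Pull the entities each alert applies to (routes, trips, stops)
for entity in feed.entity[:5]:
    if entity.HasField('alert'):
        for sel in entity.alert.informed_entity:
            print(
                "alert", entity.id,
                "route:", sel.route_id or "-",
                "trip:", sel.trip.trip_id or "-",
                "stop:", sel.stop_id or "-",
            )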
Conclusion
You’ve made it to the end! This topic is vast and offers significant opportunities for further exploration and development. With the right approach, you could create a new application to help Calgary’s residents avoid inconveniences during bus travel, especially in extreme weather conditions. If you encounter any issues while executing the code, feel free to reach out. Suggestions are always welcome! I hope you enjoyed reading this article, and I look forward to seeing you next time. Happy coding!