Cracking Transit Data — Calgary 2025

How to Decode and Leverage GTFS for Real-Time Transit Insights

Introduction

In the digital age, public transportation systems increasingly leverage technology to provide passengers with real-time data, enabling a more seamless travel experience. Transit systems share this data primarily through the General Transit Feed Specification (GTFS), a standard that provides data on schedules, routes, and real-time updates. However, while this data can be incredibly valuable, accessing and interpreting it can be challenging, especially when combining static and real-time feeds.

This article shows you how to use Calgary Transit’s data about bus locations and routes to get useful information. If you work with data, understanding these datasets can help improve transit planning and make it better for users. We’ll cover the essential steps for fetching and processing Calgary Transit’s static and real-time data, including troubleshooting common issues you might encounter. By the end of this guide, you’ll be well-equipped to tap into Calgary’s transit data to solve real-world problems.

GTFS

The General Transit Feed Specification (GTFS) is an open standard for formatting public transport schedules and the associated geographic data. It lets transit agencies publish their data in a form that software applications, such as trip planners, can consume. This gives users easy access to travel information on smartphones and other devices.


Credits - GTFS

GTFS includes information such as

  • Trip Name
  • Stops
  • Bus Routes
  • Fares
  • Time and more

When working with GTFS data, it’s important to understand the source and format of the data you are using. To explore GTFS feeds in full, both static and real-time, visit the official GTFS website linked below.
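A static GTFS feed is distributed as a zip archive of comma-separated text files such as stops.txt and routes.txt. The sketch below shows one way to read those tables with the standard library. The sample rows and the helper names (build_sample_feed, read_gtfs_table) are made up for illustration and are not Calgary Transit data.

```python
import csv
import io
import zipfile

# Tiny in-memory stand-in for a static GTFS bundle (made-up sample rows).
SAMPLE_FILES = {
    "routes.txt": "route_id,route_short_name,route_long_name,route_type\n"
                  "R1,1,Sample Crosstown,3\n",
    "stops.txt": "stop_id,stop_name,stop_lat,stop_lon\n"
                 "S1,Sample Stop A,51.0447,-114.0719\n"
                 "S2,Sample Stop B,51.0500,-114.0800\n",
}


def build_sample_feed():
    """Return the bytes of a zip archive containing the sample GTFS files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in SAMPLE_FILES.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_gtfs_table(feed_bytes, table_name):
    """Read one GTFS text file from the zip into a list of dict rows."""
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig tolerates the byte-order mark some feeds include
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


feed_bytes = build_sample_feed()
stops = read_gtfs_table(feed_bytes, "stops.txt")
routes = read_gtfs_table(feed_bytes, "routes.txt")
print(len(stops), "stops and", len(routes), "route loaded")
```

With a real feed, you would pass the bytes of the downloaded zip to read_gtfs_table instead of the sample archive.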

What is GTFS - General Transit Feed Specification


Data Portal

We can use Calgary’s Open Data Portal to get real-time transit vehicle locations. Calgary Transit provides real-time data through the General Transit Feed Specification Realtime (GTFS-RT) format. These feeds offer live information on vehicle positions, trip updates, and service alerts.

Available GTFS-RT Feeds

  • Vehicle Positions: Provides real-time locations of transit vehicles.
  • Trip Updates: Offers real-time updates on scheduled trips, including delays and cancellations.
  • Service Alerts: Contains information on disruptions or changes in service.

Accessing the Feeds

These feeds are accessible via Calgary’s Open Data Portal:


Vehicle Positions Feed

Handling GTFS-RT Feeds with Python

Install Required Libraries

pip install requests protobuf        

The requests library sends HTTP requests in Python. The protobuf library works with Protocol Buffers (Protobuf), a method developed by Google for serializing structured data. Long story short, when you handle a GTFS-RT feed, requests fetches the data from the API and protobuf parses it.

Generating Python code from a Protocol Buffer

We need the Protocol Buffer compiler, protoc, a tool provided by Google for working with .proto files. We will use the gtfs-realtime.proto definition file, which describes the structure of GTFS-RT messages, including FeedMessage, FeedHeader, FeedEntity, and so on. The technical documentation is linked below.

Protobuf - General Transit Feed Specification

!protoc --python_out=. gtfs-realtime.proto        

Running this command creates a Python module named gtfs_realtime_pb2.py. If you don’t want the hassle, you can skip this step and manually upload a ready-made gtfs_realtime_pb2.py to your drive instead. To upload a file in Google Colab, use:

from google.colab import files
files.upload()        

Run the !ls command to check. Your directory should look something like this:

drive  gtfs_realtime_pb2.py  gtfs-realtime.proto  __pycache__        

The whole point of using gtfs_realtime_pb2.py is to provide Python classes and methods that make it easy to work with GTFS-RT Protobuf (Protocol Buffers) encoded data. Without it, manual parsing and interpretation of the binary data would be necessary, which is both error-prone and impractical.
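To give a feel for what manual parsing would involve, here is a minimal sketch that decodes a few hand-built bytes using the Protocol Buffers wire format. It assumes the field numbers from gtfs-realtime.proto (header is field 1 of FeedMessage, gtfs_realtime_version is field 1 of FeedHeader) and handles only length-delimited fields. The helper names are hypothetical, and this is not a replacement for the generated module.

```python
def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last byte
            return value, pos
        shift += 7


def read_length_delimited(data, pos):
    """Read a field key plus a length-delimited payload (wire type 2)."""
    key, pos = read_varint(data, pos)
    field_number, wire_type = key >> 3, key & 0x07
    if wire_type != 2:
        raise ValueError(f"expected wire type 2, got {wire_type}")
    length, pos = read_varint(data, pos)
    return field_number, data[pos:pos + length], pos + length


# Hand-built bytes: FeedMessage.header (field 1) wrapping
# FeedHeader.gtfs_realtime_version (field 1) with the string "2.0".
raw = bytes([0x0A, 0x05, 0x0A, 0x03]) + b"2.0"

outer_field, header_bytes, _ = read_length_delimited(raw, 0)
inner_field, version_bytes, _ = read_length_delimited(header_bytes, 0)
version = version_bytes.decode("utf-8")
print(outer_field, inner_field, version)
```

Even this toy case needs bit masking and nested length bookkeeping. A full feed adds other wire types, repeated fields, and nested messages, which is the work gtfs_realtime_pb2.py takes off your hands.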

Fetch Calgary Transit’s GTFS-RT Feed

Now let’s fetch and parse the data from the feed. Consider uploading gtfs_realtime_pb2.py manually, because a version mismatch involving the .proto file can lead to a TypeError.

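# Import the requests library to handle HTTP requests
import requests
# Import the compiled GTFS Realtime protocol buffer module
import gtfs_realtime_pb2
# Import pandas to display the results in tabular form
import pandas as pd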

def fetch_gtfs_rt_feed(url):
    """
    Fetches GTFS real-time data from the given URL.
    Args:
        url (str): The URL to fetch the GTFS real-time data from.
    Returns:
        feed: A GTFS-RT FeedMessage object containing the parsed data, or None if an error occurs.
    """
    try:
        # Send a GET request to the provided URL
        response = requests.get(url)
        
        # Check if the response status code is 200 (OK)
        if response.status_code == 200:
            # Initialize a FeedMessage object from gtfs_realtime_pb2
            feed = gtfs_realtime_pb2.FeedMessage()
            
            # Parse the response content into the FeedMessage object
            feed.ParseFromString(response.content)
            return feed
        else:
            # Print an error message if the status code is not 200
            print(f"Error fetching data: {response.status_code} - {response.reason}")
            return None
    except Exception as e:
        # Print an error message if an exception occurs
        print(f"An error occurred: {e}")
        return None
# URL for GTFS real-time vehicle positions data
vehicle_positions_url = "https://data.calgary.ca/download/am7c-qe3u/application%2Foctet-stream"
# Fetch the GTFS real-time data from the specified URL
feed = fetch_gtfs_rt_feed(vehicle_positions_url)
# Check if the feed was fetched successfully
if feed:
    vehicle_data = []  # Initialize an empty list to store vehicle information
    
    # Loop through the first 5 entities in the feed
    for entity in feed.entity[:5]:  # [:5] ensures we only process the first 5 entities
        if entity.HasField('vehicle'):  # Check if the entity contains vehicle data
            vehicle = entity.vehicle  # Extract the vehicle field
            
            # Create a dictionary with relevant vehicle information
            vehicle_info = {
                "Vehicle ID": vehicle.vehicle.id,  # Vehicle identifier
                "Latitude": vehicle.position.latitude,  # Latitude position of the vehicle
                "Longitude": vehicle.position.longitude  # Longitude position of the vehicle
            }
            
            # Append the vehicle information dictionary to the list
            vehicle_data.append(vehicle_info)
    
    # Create a DataFrame from the list of dictionaries
    df = pd.DataFrame(vehicle_data)
    
    # Print the DataFrame to display the data in a tabular format
    print(df)        

Explanation

The code works in five steps. First, the requests library fetches the data from the URL. Second, the fetch_gtfs_rt_feed function retrieves and parses the GTFS real-time feed. Third, it checks the response status to confirm the data was retrieved successfully. Fourth, error handling catches and prints any exceptions. Finally, the feed is processed to extract the ID and position of each vehicle, where available. To neaten the results, I used the Pandas library to display the data.

Output

Data Frame that shows Vehicle ID, Latitude and Longitude

What you are seeing is raw real-time information: each vehicle’s ID and its position (latitude and longitude). In some cases, the vehicle ID might relate directly to the bus number (for example, Vehicle ID 1280 might be Bus 128). It all depends on how the City encodes the data, so contact the City to confirm the mapping.

Mapping the Location

Let’s use the folium library to map the data. Folium makes it easy to visualize data that has been manipulated in Python. See the documentation below.

Folium — Folium 0.19.3 documentation

First, install the folium library:

!pip install folium        

Folium is used to create interactive maps. The reason I am using it is that it’s simple and easy to understand. Let me know if you come across any other similar libraries that get the job done.

import folium  # Importing the folium library to work with interactive maps

def plot_vehicle_on_map(latitude, longitude, vehicle_id):
    """
    Plots the vehicle's location on a map using its latitude, longitude, and ID.
    
    Parameters:
    latitude (float): The latitude of the vehicle's location.
    longitude (float): The longitude of the vehicle's location.
    vehicle_id (str): The unique identifier for the vehicle.
    
    Returns:
    folium.Map: A Folium map centered on the vehicle's location with a marker.
    """
    # Create a map centered at the vehicle's location with a zoom level of 14
    vehicle_map = folium.Map(location=[latitude, longitude], zoom_start=14)
    
    # Add a marker to the map at the vehicle's location
    folium.Marker(
        location=[latitude, longitude],  # The latitude and longitude of the marker
        popup=f"Vehicle ID: {vehicle_id}",  # Popup text to display when the marker is clicked
        icon=folium.Icon(color="blue", icon="bus", prefix="fa"),  # Custom icon for the marker
    ).add_to(vehicle_map)  # Add the marker to the map
    
    return vehicle_map  # Return the map object
# Example data for a vehicle's location and ID
latitude = 50.997478   # Example latitude value
longitude = -114.066544   # Example longitude value
vehicle_id = "8080"  # Example vehicle ID
# Generate the map with the example vehicle data
map_output = plot_vehicle_on_map(latitude, longitude, vehicle_id)
map_output  # Display the map
        

Output

Running the code above plots the location of the bus with ID 8080, using the example coordinates. Keep in mind that this data is dynamic. According to the Calgary Data Portal, it changes every 30 minutes or even sooner.

Plotting of the Vehicle ID 8080

Trip Updates Feed

The GTFS Realtime Trip Updates feed contains real-time updates to scheduled trips, such as delays and changes in stop times. In this case, however, the Calgary feed only provides Trip ID, Start Time, Start Date, Stop ID, Arrival Time, and Departure Time. The feed is designed to reflect trip delays: a trip scheduled for 8:00 AM that is running 10 minutes late will show the updated times. If the feed identifies the vehicle running the trip, for example bus 8080, the update will include that ID as well.
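Since the feed described above carries arrival times rather than a delay figure, one way to recover the delay is to compare the real-time timestamp with the scheduled time from the static stop_times.txt file. The sketch below assumes a fixed UTC-7 offset and made-up values. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset (Mountain Standard Time) keeps the sketch
# dependency-free. Real code should use zoneinfo.ZoneInfo("America/Edmonton")
# so that daylight saving time is handled.
LOCAL_TZ = timezone(timedelta(hours=-7))


def scheduled_timestamp(start_date, hhmmss, tz=LOCAL_TZ):
    """Convert a GTFS service date (YYYYMMDD) and time (HH:MM:SS) to POSIX seconds.

    GTFS times may exceed 24:00:00 for trips that run past midnight, so the
    time is added to midnight as a timedelta rather than parsed as a clock time.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    midnight = datetime.strptime(start_date, "%Y%m%d").replace(tzinfo=tz)
    moment = midnight + timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(moment.timestamp())


def delay_minutes(scheduled_ts, realtime_ts):
    """Return how many minutes the real-time event trails the schedule."""
    return (realtime_ts - scheduled_ts) / 60


# Made-up example: a stop scheduled for 8:00 AM, predicted 10 minutes late
scheduled = scheduled_timestamp("20250115", "08:00:00")
predicted = scheduled + 10 * 60
print(delay_minutes(scheduled, predicted))
```

A negative result would mean the vehicle is running ahead of schedule.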

import requests
import gtfs_realtime_pb2  # Import the compiled GTFS Realtime protocol buffer
import pandas as pd
from datetime import datetime

def fetch_gtfs_rt_trip_updates(url):
    """Fetches and parses the GTFS Realtime Trip Updates feed from the given URL."""
    try:
        # Make a GET request to fetch the data from the specified URL
        response = requests.get(url)
        if response.status_code == 200:
            # Parse the response content into a FeedMessage object
            feed = gtfs_realtime_pb2.FeedMessage()
            feed.ParseFromString(response.content)
            return feed
        else:
            # Print an error message if the response status is not OK
            print(f"Error fetching data: {response.status_code} - {response.reason}")
            return None
    except Exception as e:
        # Catch and print any exceptions that occur during the request
        print(f"An error occurred: {e}")
        return None
def extract_trip_updates(feed):
    """Extracts trip update information from the GTFS Realtime feed."""
    trip_updates = []
    
    # Loop through each entity in the feed
    for entity in feed.entity:
        if entity.HasField('trip_update'):
            # Extract the trip update data
            trip_update = entity.trip_update
            trip_id = trip_update.trip.trip_id
            start_time = trip_update.trip.start_time
            start_date = trip_update.trip.start_date
            
            # Loop through each stop time update in the trip update
            for stop_time_update in trip_update.stop_time_update:
                stop_id = stop_time_update.stop_id
                # Extract arrival and departure times, if available
                arrival_time = stop_time_update.arrival.time if stop_time_update.HasField('arrival') else None
                departure_time = stop_time_update.departure.time if stop_time_update.HasField('departure') else None
                
                # Convert POSIX timestamps to human-readable UTC strings
                # (pd.to_datetime avoids datetime.utcfromtimestamp, deprecated since Python 3.12)
                arrival_time = pd.to_datetime(arrival_time, unit='s').strftime('%Y-%m-%d %H:%M:%S') if arrival_time else None
                departure_time = pd.to_datetime(departure_time, unit='s').strftime('%Y-%m-%d %H:%M:%S') if departure_time else None
                
                # Add the extracted information to the trip updates list
                trip_updates.append({
                    "Trip ID": trip_id,
                    "Start Time": start_time,
                    "Start Date": start_date,
                    "Stop ID": stop_id,
                    "Arrival Time": arrival_time,
                    "Departure Time": departure_time
                })
    return trip_updates
# URL for GTFS Realtime Trip Updates
trip_updates_url = "https://data.calgary.ca/download/gs4m-mdc2/application%2Foctet-stream"  # Replace with the actual Trip Updates URL
# Fetch the trip updates feed
feed = fetch_gtfs_rt_trip_updates(trip_updates_url)
if feed:
    # Extract the trip updates from the feed
    trip_updates = extract_trip_updates(feed)
    
    # Convert the trip updates into a DataFrame for easy manipulation and display
    df_trip_updates = pd.DataFrame(trip_updates)
    
    # Display the first 10 rows of the DataFrame
    print(df_trip_updates.head(10))        

Output

Trip Real Time Updates

If start_time and start_date come back empty, it is probably because those fields are optional in GTFS Realtime and are not always populated in the feed. This might be for security reasons as well. Contacting the City’s transit service may get you this information.

Calgary Transit Realtime Trip Updates GTFS-R


Side Note

Notice that there are different IDs involved, such as the Trip ID and the Stop ID. If all the data is available, you can find the bus that is running each of those trips. Let me know if you can connect those dots. I would be happy to collaborate with you and make this a working project.
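One way to connect those dots is to join the two feeds on the trip ID. The sketch below uses made-up rows shaped like the tables built earlier, and it assumes the vehicle positions feed populates the optional trip descriptor (entity.vehicle.trip.trip_id). Whether Calgary’s feed does so is something to verify. The attach_vehicles helper is hypothetical.

```python
# Made-up vehicle rows. "Trip ID" would come from entity.vehicle.trip.trip_id,
# which is optional in GTFS-RT and may be empty.
vehicle_rows = [
    {"Vehicle ID": "8080", "Trip ID": "T100"},
    {"Vehicle ID": "8123", "Trip ID": ""},
]

# Made-up trip update rows, one per stop time update.
trip_update_rows = [
    {"Trip ID": "T100", "Stop ID": "S1", "Arrival Time": "2025-01-15 15:10:00"},
    {"Trip ID": "T100", "Stop ID": "S2", "Arrival Time": "2025-01-15 15:14:00"},
    {"Trip ID": "T200", "Stop ID": "S9", "Arrival Time": "2025-01-15 15:20:00"},
]


def attach_vehicles(trip_updates, vehicles):
    """Add a "Vehicle ID" key to each trip update row, or None if no match."""
    # Index vehicles by trip ID, skipping vehicles that report no trip
    by_trip = {v["Trip ID"]: v["Vehicle ID"] for v in vehicles if v["Trip ID"]}
    return [{**row, "Vehicle ID": by_trip.get(row["Trip ID"])} for row in trip_updates]


joined = attach_vehicles(trip_update_rows, vehicle_rows)
for row in joined:
    print(row["Trip ID"], row["Stop ID"], row["Vehicle ID"])
```

With DataFrames, a left merge on the "Trip ID" column would give the same result.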


Service Alerts

Calgary Transit refreshes its real-time data every half minute. To learn more about the GTFS-RT specification and its components (Trip Updates, Service Alerts, and Vehicle Positions), see the Google Transit API page and the Service Updates page. Let’s see what the service alerts look like.

import requests  # Library to make HTTP requests
import gtfs_realtime_pb2  # Ensure this proto file is compiled as Python
import pandas as pd  # For handling and displaying data in DataFrame format

# Function to fetch the GTFS Realtime Alerts feed
def fetch_gtfs_rt_alerts(url):
    """Fetches and parses the GTFS Realtime Alerts feed from the given URL."""
    try:
        # Send a request to the URL to get the feed
        response = requests.get(url)
        
        # Check if the response is successful (status code 200)
        if response.status_code == 200:
            # Parse the feed using GTFS Realtime protocol
            feed = gtfs_realtime_pb2.FeedMessage()
            feed.ParseFromString(response.content)
            return feed  # Return the parsed feed
        else:
            # Print error if the response status is not 200
            print(f"Error fetching data: {response.status_code} - {response.reason}")
            return None
    except Exception as e:
        # Catch and print any exception that occurs during the request
        print(f"An error occurred: {e}")
        return None

# Function to extract alerts from the GTFS Realtime feed
def extract_alerts(feed):
    """Extracts alert information from the GTFS Realtime feed."""
    alerts = []  # Initialize an empty list to store alert information
    
    # Loop through each entity in the feed
    for entity in feed.entity:
        # Check if the entity contains an alert
        if entity.HasField('alert'):
            alert = entity.alert
            # Extract relevant fields from the alert
            alert_id = entity.id  # Unique ID for the alert
            # Extract header text from the alert (if available)
            header_text = alert.header_text.translation[0].text if alert.header_text.translation else "No header"
            # Extract description text from the alert (if available)
            description_text = alert.description_text.translation[0].text if alert.description_text.translation else "No description"
            # Severity level as an enum integer (UNKNOWN_SEVERITY, INFO, WARNING, SEVERE)
            severity_level = alert.severity_level
            
            # Append the extracted alert information to the alerts list
            alerts.append({
                "Alert ID": alert_id,
                "Header": header_text,
                "Description": description_text,
                "Severity Level": severity_level
            })
    
    # Return the list of alerts
    return alerts

# URL for GTFS Realtime Alerts (replace with the actual URL)
alerts_url = "https://data.calgary.ca/download/alerts_feed_url"  # Example placeholder URL

# Fetch the alerts feed
feed = fetch_gtfs_rt_alerts(alerts_url)

# If feed is fetched successfully, extract the alerts
if feed:
    alerts = extract_alerts(feed)
    
    # Convert the list of alerts into a DataFrame for easier viewing
    df_alerts = pd.DataFrame(alerts)
    
    # Display the first 5 rows of the alerts DataFrame
    print(df_alerts.head(5))
        

You can read the documentation of the Service Alerts below.

Calgary Transit Realtime Service Alerts GTFS-RT

When I ran the snippet above, the output was not very readable: the descriptions were not fully displayed and contained HTML tags. That is why I used the BeautifulSoup library (installed with pip install beautifulsoup4) to clean it up. You can also trim leading and trailing spaces or newlines from the alert text.

from bs4 import BeautifulSoup  # Import the BeautifulSoup library to parse and clean HTML

def clean_html(raw_html):
    """Removes HTML tags and returns plain text."""
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(raw_html, 'html.parser')
    # Extract the plain text, trimming whitespace around each text fragment
    return soup.get_text(separator=" ", strip=True)
# Loop through each alert and clean the "Header" and "Description" fields by removing HTML tags
for alert in alerts:
    # Apply the clean_html function to the "Header" field
    alert["Header"] = clean_html(alert["Header"])
    # Apply the clean_html function to the "Description" field
    alert["Description"] = clean_html(alert["Description"])
# Convert the cleaned alerts into a pandas DataFrame for easy viewing and manipulation
df_alerts = pd.DataFrame(alerts)
# Display the first 5 rows of the DataFrame to verify the cleaned alerts
print(df_alerts.head(5))        

Output

You might be asking yourself, “Can I associate this with the trip_id and vehicle_id?” The answer is yes: by incorporating that information into your data processing, you can link each alert to a particular trip_id and vehicle_id. Once again, it’s a moving piece of the puzzle, and it becomes possible only once the complete data is publicly available.
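In GTFS Realtime, each alert carries a repeated informed_entity field whose selectors can name a route, a trip, or a stop. The selector has no vehicle field, so a vehicle would have to be reached through the trip ID. The sketch below collects those IDs. The sample objects are stand-ins that mimic the attribute layout of a parsed alert, and the affected_entities helper is hypothetical. Whether Calgary’s feed fills in these selectors is an assumption to check.

```python
from types import SimpleNamespace as NS


def affected_entities(alert):
    """Collect the route, trip, and stop IDs that an alert applies to.

    Unset string fields read as "" in protobuf, so empty values are skipped.
    """
    routes, trips, stops = set(), set(), set()
    for selector in alert.informed_entity:
        if selector.route_id:
            routes.add(selector.route_id)
        if selector.trip.trip_id:
            trips.add(selector.trip.trip_id)
        if selector.stop_id:
            stops.add(selector.stop_id)
    return {"routes": sorted(routes), "trips": sorted(trips), "stops": sorted(stops)}


# Stand-in objects with made-up IDs. With a real feed, pass entity.alert instead.
sample_alert = NS(informed_entity=[
    NS(route_id="R1", stop_id="", trip=NS(trip_id="")),
    NS(route_id="", stop_id="S2", trip=NS(trip_id="T100")),
])

linked = affected_entities(sample_alert)
print(linked)
```

The trip IDs collected this way can then be matched against the trip updates and vehicle positions feeds.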

Service Alerts

Conclusion

You’ve made it to the end! This topic is vast and offers significant opportunities for further exploration and development. With the right approach, you could create a new application to help Calgary’s residents avoid inconveniences during bus travel, especially in extreme weather conditions. If you encounter any issues while executing the code, feel free to reach out. Suggestions are always welcome! I hope you enjoyed reading this article, and I look forward to seeing you next time. Happy coding!

