
Debugging RTOS Systems Without Losing Your Mind

Hussein Elsherbini

Embedded Software Engineer

Published: January 20, 2025

Ever found yourself staring at an oscilloscope at 3 AM, questioning every life decision that led you to where you are now? Yeah, me too. Let me tell you about the time my self-balancing robot decided it had had enough of my shenanigans and filed for divorce.

The Scene of the Crime

There I was, watching my creation maintain its balance with all the grace of the robot gods, when suddenly one of the motor drivers decided it had accomplished enough in its career and quietly resigned. No two-week notice, no farewell party: it just ghosted me mid-operation. Not ideal when your robot's entire existence depends on, you know, both motor drivers working.

The Traditional Debugging Approach (Or: How to Make Things Worse)

If you've been in this field long enough, you know the drill: scatter printf statements like breadcrumbs, set breakpoints like landmines, and pray to whatever deity oversees your day-to-day life. Spoiler alert: none of them help when you're debugging RTOS timing issues, because a printf over a UART can take milliseconds, enough to change the very timing you are chasing.

Setting a breakpoint in an RTOS is like hitting pause during brain surgery: technically possible, but probably not your best move. The CPU stops, but gravity doesn't. Your carefully timed 5 ms control loop suddenly stretches to 2 seconds, and your robot demonstrates its newfound passion for break dancing.

Enter SystemView: The Black Box Flight Recorder for RTOS

This is where SEGGER SystemView enters the picture, acting as a sophisticated surveillance system for your RTOS. It's like having a high-speed camera recording every task switch, interrupt, and questionable decision your system makes, with a minimal footprint. Finally, a way to debug without inducing more bugs; revolutionary, I know.

Snapshot of SystemView in action on my robot!

Under the Hood: Real-Time Transfer (RTT)

Let's dive into the technical brilliance of SEGGER's Real-Time Transfer (RTT). Understanding how RTT works is worthwhile, because its architecture directly explains its impressive performance.

The Core Architecture

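At its foundation, RTT uses a memory-mapped approach for debugging communication. A control block structure lives in the target's RAM; the debug probe (for example a J-Link) locates it by searching RAM for its ID string, then reads and writes the buffers it describes through the debug port (SWD/JTAG) while the CPU keeps running. Here is a simplified version of the structure: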

typedef struct {
    char acID[16];            // "SEGGER RTT": the ID string the host searches RAM for
    int  MaxNumUpBuffers;     // Number of up-buffers (target to host)
    int  MaxNumDownBuffers;   // Number of down-buffers (host to target)
    // The up-buffer and down-buffer descriptors follow
} SEGGER_RTT_CB;
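Because the control block can sit anywhere in RAM, the host side has to find it before it can read any data. The sketch below mimics that search over a plain byte array standing in for a RAM dump. The function name `find_rtt_cb`, the 4-byte alignment step and the test data are illustrative assumptions, not SEGGER's host code; real tools typically also let you give the control block address or a search range explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: return the offset of the RTT control block inside a
// RAM image, or -1 if the "SEGGER RTT" identifier is not found.
// The comparison includes the terminating NUL, so a longer string that
// merely starts with "SEGGER RTT" is not matched.
long find_rtt_cb(const uint8_t *ram, size_t len)
{
    static const char id[] = "SEGGER RTT";  // sizeof(id) == 11, includes NUL
    if (len < sizeof(id)) {
        return -1;
    }
    // Assume the control block is word aligned (it contains ints), so step by 4
    for (size_t off = 0; off + sizeof(id) <= len; off += 4) {
        if (memcmp(ram + off, id, sizeof(id)) == 0) {
            return (long)off;
        }
    }
    return -1;
}
```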

The Secret Sauce!

RTT achieves its remarkable performance through several key architectural decisions:

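1. Lock-Free Ring Buffer Implementation: Each ring buffer has separate read and write offsets owned by different parties. For an up-buffer the target owns the write offset and the host owns the read offset (down-buffers reverse this), so target and host never need a lock between them: each side only ever writes the offset it owns. On the target side, SEGGER's implementation still wraps each write in a short critical section (SEGGER_RTT_LOCK()) so that several tasks or ISRs can share a buffer safely.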

2. Memory-Mapped Communication: Instead of using peripheral interfaces like UART, RTT operates purely through memory operations. Consider these approximate per-byte figures (they depend on baud rate, SWO clock and core clock respectively):

   - Memory write operation: ~2-3 CPU cycles
   - UART byte transmission at 115200 baud: ~87 μs per byte
   - SWO transmission: ~4 μs per byte
   - RTT: ~0.2 μs per byte

3. Zero Interrupt Overhead: Unlike an interrupt-driven UART console, RTT doesn't use interrupts for data transfer. The target's only cost is copying the bytes into the buffer; the debug probe drains them in the background.
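To make the ownership rule concrete, here is a simplified single-producer/single-consumer ring buffer in the spirit of an RTT up-buffer: the producer (target) only ever advances `wr_off`, the consumer (host) only ever advances `rd_off`, and one slot is kept empty so that "full" and "empty" can be told apart. All names and the 16-byte size are hypothetical; the real `SEGGER_RTT_BUFFER_UP` carries more fields, and on real hardware you would also need memory barriers so the payload is visible before the offset update. A write that does not fit is dropped as a whole instead of stalling the caller, similar in spirit to RTT's non-blocking skip mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 16u  // hypothetical size; one slot always stays empty

typedef struct {
    uint8_t buf[RB_SIZE];
    volatile uint32_t wr_off;  // written only by the producer (target)
    volatile uint32_t rd_off;  // written only by the consumer (host)
} ring_t;

// Free space seen by the producer. Keeping one slot empty means
// wr_off == rd_off always means "empty", never "full".
static uint32_t rb_free(const ring_t *rb)
{
    uint32_t rd = rb->rd_off;
    uint32_t wr = rb->wr_off;
    return (rd > wr) ? (rd - wr - 1u) : (RB_SIZE - (wr - rd) - 1u);
}

// Producer: copy all `len` bytes or none (non-blocking, skip-on-full).
size_t rb_write(ring_t *rb, const void *data, size_t len)
{
    if (len > rb_free(rb)) {
        return 0;  // drop the whole message rather than block
    }
    const uint8_t *p = data;
    uint32_t wr = rb->wr_off;
    size_t first = RB_SIZE - wr;  // room before the end of the array
    if (first > len) {
        first = len;
    }
    memcpy(&rb->buf[wr], p, first);               // up to the end of the array
    memcpy(&rb->buf[0], p + first, len - first);  // wrapped remainder (may be 0)
    rb->wr_off = (uint32_t)((wr + len) % RB_SIZE); // publish only after the copy
    return len;
}

// Consumer: drain up to `max` bytes; only rd_off is modified.
size_t rb_read(ring_t *rb, void *out, size_t max)
{
    uint8_t *p = out;
    uint32_t rd = rb->rd_off;
    uint32_t wr = rb->wr_off;  // snapshot of the producer's offset
    size_t n = 0;
    while (n < max && rd != wr) {
        p[n++] = rb->buf[rd];
        rd = (rd + 1u) % RB_SIZE;
    }
    rb->rd_off = rd;
    return n;
}
```

Because each offset has exactly one writer, neither side ever has to wait for the other; the only failure mode is a full buffer, which costs a dropped message rather than a stalled control loop.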

Transfer Speed Characteristics

Scaling those per-byte figures up to a whole message shows how quickly the gap grows. Sending a 32-byte debug message takes roughly:

- UART (115200 baud): ~2.8 ms
- SWO: ~128 μs
- RTT: ~6 μs
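The UART number is easy to check by hand: with the common 8N1 framing, each byte costs 10 bit times (start bit, 8 data bits, stop bit), so 32 bytes at 115200 baud take 320 / 115200 s, about 2.78 ms. A small helper, hypothetical and assuming 8N1 with no gaps between bytes, makes it easy to redo the arithmetic for other baud rates or message sizes:

```c
#include <stddef.h>

// Time on the wire, in microseconds, to send `bytes` bytes over a UART
// configured for 8N1 (10 bits per byte), assuming back-to-back bytes.
double uart_wire_time_us(size_t bytes, unsigned long baud)
{
    const double bits_per_byte = 10.0;  // 1 start + 8 data + 1 stop
    return (double)bytes * bits_per_byte * 1e6 / (double)baud;
}
```

For example, `uart_wire_time_us(1, 115200)` is about 86.8 μs, and `uart_wire_time_us(32, 115200)` about 2778 μs.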


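This performance means you can leave RTT instrumentation in your production code with very little impact on system behavior. The target still pays the small cost of each write whether or not a debugger is attached, but in RTT's default non-blocking mode, data that does not fit in the buffer is simply dropped rather than stalling the caller, so nothing ends up waiting on a host that isn't there.

The combination of a lock-free design, background memory access by the debug probe, and efficient buffer management makes RTT an ideal solution for real-time system debugging, where traditional methods would significantly disturb the system's timing.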

Integrating SystemView in your project

Let me walk you through how I integrated SystemView into my project. It turned out to be a lifesaver, especially since my robot seemed determined to perfect its face-planting skills:

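    // Place this in your initialization code, before the scheduler starts.
    // Enable the trace subsystem (DWT and ITM units) on the Arm Cortex-M core
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    // Reset and start the DWT cycle counter, typically used by SystemView for timestamps
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
    SEGGER_RTT_Init();       // Initialize the RTT control block
    SEGGER_SYSVIEW_Conf();   // Configure SystemView (from the SystemView config file)
    SEGGER_SYSVIEW_Start();  // Start recording now instead of waiting for the host to start it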
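Once the cycle counter is running, you can also read it directly to time a code section, which is handy for cross-checking what SystemView shows. Because `DWT->CYCCNT` is a free-running 32-bit counter, subtracting two readings with unsigned arithmetic gives the right answer even across one wrap-around. The sketch keeps the hardware access in a comment so it runs anywhere; `cycles_elapsed`, `cycles_to_us` and `control_loop_step` are hypothetical names.

```c
#include <stdint.h>

// Elapsed cycles between two 32-bit counter readings. Unsigned subtraction
// is modulo 2^32, so a single wrap between start and end is handled.
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

// Convert a cycle count to microseconds for a given core clock in Hz
static inline double cycles_to_us(uint32_t cycles, uint32_t core_clock_hz)
{
    return (double)cycles * 1e6 / (double)core_clock_hz;
}

/* Typical use on target (needs the CMSIS device headers):
       uint32_t t0 = DWT->CYCCNT;
       control_loop_step();                 // hypothetical code under test
       uint32_t dt = cycles_elapsed(t0, DWT->CYCCNT);
       double us = cycles_to_us(dt, SystemCoreClock);
*/
```

At 168 MHz, for instance, the counter wraps roughly every 25.6 seconds, so this is fine for timing a control loop but not for long intervals.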

Here is how I monitored critical ISR entries and exits:

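void ADC_IRQHandler(void)
{
    SEGGER_SYSVIEW_RecordEnterISR();  // Mark ISR entry in the SystemView timeline

    if (system_sensors.port_config->Instance->SR & ADC_SR_EOC) {
        // Clear the EOC flag (STM32F4-style ADC: SR flags are cleared by writing 0;
        // reading DR also clears EOC). Reading the sample and notifying the
        // logging task are application specific and omitted here.
        system_sensors.port_config->Instance->SR = ~ADC_SR_EOC;
    }

    SEGGER_SYSVIEW_RecordExitISR();   // Mark ISR exit
}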

And here is a way you can use a dedicated task to send application-specific data back to SystemView:

void vLogTask(void *pvParameters)
{
    (void)pvParameters;

    xLogTaskHandle = xTaskGetCurrentTaskHandle();
    balance_log_data_t log_data = {0};
    BalanceLogger_t logger = {0};
    uint32_t notificationValue;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t ADC_TIMEOUT_TICKS = pdMS_TO_TICKS(20);

    // Initialize our logging system
    Logger_Init(&logger, &log_data);

    while (1) {
        // Start an ADC conversion in software (SWSTART), then wait up to
        // 20 ms for the ADC ISR to notify this task
        system_sensors.port_config->Instance->CR2 |= ADC_CR2_SWSTART;
        notificationValue = ulTaskNotifyTake(pdTRUE, ADC_TIMEOUT_TICKS);

        if (notificationValue == NOTIFY_LOGGER_TASK_ADC) {
            // Check for new balance data without blocking
            (void)xQueueReceive(xLogQueue, &log_data, 0);

            // If all telemetry data is available, send it over RTT to SystemView
            // (telemetryDataReady and data[] are defined elsewhere in the project)
            if (telemetryDataReady) {
                for (int i = 0; i < LOGGER_DATA_COUNT; i++) {
                    logger.dataSamples[i].pValue.pFloat = &data[i];
                    SEGGER_SYSVIEW_SampleData(&logger.dataSamples[i]);
                }
            }
        } else {
            DEBUG_WARN("ADC timeout at %u", (unsigned)xTaskGetTickCount());
        }

        // Wake at a fixed period, independent of how long the loop body took
        xTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(LOG_TASK_PERIOD_MS));
    }
}
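The logging task paces itself with `xTaskDelayUntil` instead of a relative delay such as `vTaskDelay`. The difference: a relative delay starts counting only after the loop body has finished, so the body's run time is added to every period, while delay-until wakes at fixed multiples of the period and, if one iteration overruns, skips the wait so the loop falls back onto its schedule. The tick-level simulation below illustrates this without calling FreeRTOS; the function name, the 10-tick period and the body times in the example are made up, and tick-counter overflow is ignored.

```c
#include <stddef.h>
#include <stdint.h>

// Simulate `n` loop iterations whose bodies take body[i] ticks each and
// record the tick at which each iteration starts in start_ticks[i].
// relative != 0: wait `period` ticks after the body (like vTaskDelay).
// relative == 0: wait until previous wake time + period (like xTaskDelayUntil);
//                if the body overran, that deadline is already in the past and
//                the loop continues immediately without blocking.
void simulate_loop(const uint32_t *body, size_t n, uint32_t period,
                   int relative, uint32_t *start_ticks)
{
    uint32_t now = 0;
    uint32_t last_wake = 0;
    for (size_t i = 0; i < n; i++) {
        start_ticks[i] = now;
        now += body[i];              // run the loop body
        if (relative) {
            now += period;           // delay measured from the end of the body
        } else {
            last_wake += period;     // next absolute deadline
            if (last_wake > now) {
                now = last_wake;     // block until the deadline
            }
        }
    }
}
```

With a 10-tick period and a 3-tick body, the relative version starts iterations at ticks 0, 13, 26, 39, while the delay-until version starts them at 0, 10, 20, 30, which is what you want for evenly spaced samples.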

What SystemView Revealed (Besides my poor life choices...)

The SystemView recordings were enlightening, to say the least:

Task Interactions: See exactly when each task runs, for how long, and what interrupted it.
Interrupt Behavior: Watch your interrupts in action - their frequency, duration, and (most importantly) their impact on your tasks.
Resource Usage: Track CPU load, stack usage, and even custom events.
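CPU load in a trace like this is essentially the share of a time window not spent in the idle task, so time spent in other tasks and in interrupts counts as busy. As a rough sketch of that calculation (the function name and the figures in the example are hypothetical, and SystemView's own computation may differ in detail):

```c
#include <stdint.h>

// CPU load in percent over a time window, given how long the idle task ran
// inside that window. Both values must use the same unit (ticks, cycles, us).
// Returns -1.0 for invalid input.
double cpu_load_percent(uint64_t idle_time, uint64_t window_time)
{
    if (window_time == 0 || idle_time > window_time) {
        return -1.0;
    }
    return 100.0 * (double)(window_time - idle_time) / (double)window_time;
}
```

For example, if the idle task ran for 750 μs of a 1000 μs window, `cpu_load_percent(750, 1000)` returns 25.0.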

Adding AI to the Mix (Because Why Not Make It More Complicated?)

Because manually analyzing SystemView data is about as enjoyable as debugging race conditions, I wrote a Python script to help spot impending doom:

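import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def find_suspicious_behavior(data_path):
    # Load the logged telemetry (one row per sample, including a 'timestamp' column)
    df = pd.read_csv(data_path)

    # Feature engineering: create new features to capture relationships
    df['current_ratio'] = df['motor_a_current'] / df['motor_b_current']
    df['power_consumption'] = df['battery_voltage'] * (df['motor_a_current'] + df['motor_b_current'])
    # Interaction term (not physical angular momentum, which is inertia * angular velocity)
    df['angular_momentum'] = df['robot_angular_velocity'] * df['angle_measurement']

    # Select features for anomaly detection
    features = [
        'motor_a_current', 'motor_b_current', 'battery_voltage',
        'robot_angular_velocity', 'angle_measurement', 'pid_output',
        'current_ratio', 'power_consumption', 'angular_momentum'
    ]

    # A zero motor current makes current_ratio infinite (or NaN for 0/0), and
    # IsolationForest does not accept infinite values. Fit on the finite rows
    # only, and treat the non-finite rows as suspicious in their own right.
    finite = np.isfinite(df[features]).all(axis=1)
    X = df.loc[finite, features]

    # Train the Isolation Forest model (contamination=0.1: expect ~10% outliers)
    detector = IsolationForest(contamination=0.1, random_state=42)
    labels = detector.fit_predict(X)

    # Return timestamps of anomalies (fit_predict marks outliers with -1)
    suspicious = X.index[labels == -1].union(df.index[~finite])
    return df.loc[suspicious, 'timestamp']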

The idea was to extract critical data from the robot, such as motor currents, battery voltage, angle measurement, angular velocity and PID output, and derive a few features that make outliers easier to spot:

Current Ratio: The ratio of motor_a_current to motor_b_current can indicate imbalances between the motors.
Power Consumption: Combines motor currents and battery voltage to estimate total power usage.
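Angular Momentum: Multiplies angular velocity by the angle measurement to capture rotational dynamics. (Despite the name, this is an interaction term rather than physical angular momentum, which would be moment of inertia times angular velocity.)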
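If you would rather have these derived features in the SystemView recording itself, they could also be computed on the target before logging. A minimal C sketch, assuming currents in amperes and voltage in volts; the struct, the function name and the 1 mA threshold are hypothetical. Note the guard on the ratio: a dead motor driver pushes its current toward zero, which is exactly the situation you want to flag, so the sketch reports an explicit NaN marker there instead of dividing by (almost) zero.

```c
#include <math.h>

// Hypothetical container for the derived features described above
typedef struct {
    float current_ratio;       // motor A current / motor B current
    float power_w;             // battery voltage * (I_a + I_b)
    float angle_rate_product;  // angular velocity * angle ("angular momentum" feature)
} features_t;

features_t compute_features(float i_a, float i_b, float v_batt,
                            float omega, float angle)
{
    const float eps = 1e-3f;  // below 1 mA, treat motor B current as zero
    features_t f;
    // NaN marks "ratio undefined" instead of producing a huge or infinite value
    f.current_ratio = (fabsf(i_b) > eps) ? (i_a / i_b) : NAN;
    f.power_w = v_batt * (i_a + i_b);
    f.angle_rate_product = omega * angle;
    return f;
}
```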

These data samples can also be fed to an AI model that can make more complex connections. For example, it could tell me: "Hey, your PID output swings very rapidly whenever the angle measurements spike; consider a low-pass filter on your angle measurements."
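A low-pass filter like the one in that example can be as simple as a first-order IIR filter (an exponential moving average). The sketch below is hypothetical: `alpha` trades smoothing against lag, and for a sample period dt and cutoff frequency fc a common choice is alpha = dt / (dt + 1 / (2π·fc)). The 5 ms period in the example matches the control loop mentioned earlier; the 20 Hz cutoff is an arbitrary assumption.

```c
// First-order low-pass filter (exponential moving average)
typedef struct {
    float alpha;   // 0 < alpha <= 1; smaller means smoother but more lag
    float state;   // last filtered value
    int   primed;  // 0 until the first sample has been seen
} lpf_t;

// Compute alpha from sample period dt (seconds) and cutoff frequency fc (Hz)
float lpf_alpha(float dt, float fc)
{
    const float pi = 3.14159265f;
    float rc = 1.0f / (2.0f * pi * fc);  // equivalent RC time constant
    return dt / (rc + dt);
}

// Feed one raw sample, get the filtered value back
float lpf_update(lpf_t *f, float x)
{
    if (!f->primed) {
        f->state = x;  // start from the first sample to avoid a startup ramp
        f->primed = 1;
    } else {
        f->state += f->alpha * (x - f->state);
    }
    return f->state;
}
```

With dt = 0.005 s and fc = 20 Hz, `lpf_alpha` gives roughly 0.386. Keep in mind that every bit of filtering adds phase lag inside the balance loop, so the cutoff has to stay well above the dynamics you are trying to control.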

What's Next?

I'm working on a machine learning model that can analyze SystemView recordings and tune my PID gains automatically. One question that has crossed my mind: can a machine learning model outperform a PID control loop?

Takeaway

Debugging RTOS systems doesn't have to be a mystical art practiced only by the chosen few. With tools like SystemView and a systematic approach, we can peek inside our systems while they're running and actually understand what's going on.

Remember: your RTOS is just trying its best to manage all the tasks you've thrown at it. Sometimes it just needs a little help figuring out what went wrong. Kind of like all of us on a Monday morning, right?

P.S. If you're wondering about that motor driver I mentioned earlier: it turns out current spikes caused some of its internal MOSFETs to fail. If you have made it this far, here is a picture of my robot; you're welcome. Check out the Git repository here.
