Debugging RTOS Systems Without Losing Your Mind
Ever found yourself staring at an oscilloscope at 3 AM, questioning every life decision that led you to where you are now? Yeah, me too. Let me tell you about the time my self-balancing robot decided it had had enough of my shenanigans and filed for divorce.
The Scene of the Crime
There I was, watching my creation maintain its balance with all the grace of the robot gods, when suddenly one of the motor drivers decided it had accomplished enough in its career and quietly resigned. No two-week notice, no farewell party - just straight-up ghosted me mid-operation. Not ideal when your robot's entire existence depends on, you know, both motor drivers working.
The Traditional Debugging Approach (Or: How to Make Things Worse)
If you've been in this field long enough, you know the drill: scatter printf statements like breadcrumbs, set breakpoints like landmines, and pray to whatever deity oversees your day-to-day operations. Spoiler alert: none of them answer when you're debugging RTOS timing issues.
Setting a breakpoint in an RTOS is like hitting pause during brain surgery - technically possible, but probably not your best move. Your carefully timed 5ms control loop suddenly stretches to 2 seconds, and your robot demonstrates its newfound passion for break dancing.
Enter SystemView: The Black Box Flight Recorder for RTOS
This is where SEGGER SystemView enters the picture, acting like a sophisticated surveillance system for your RTOS. It's like having a high-speed camera recording every task switch, interrupt, and questionable decision your system makes, all with a minimal footprint. Finally, a way to debug without inducing more bugs - revolutionary, I know.
Under the Hood: Real-Time Transfer (RTT)
Let's dive into the technical brilliance of Segger's Real-Time Transfer (RTT). Understanding how RTT works is crucial because its architecture directly explains why it achieves such impressive performance.
The Core Architecture
At its foundation, RTT uses a memory-mapped approach for debugging communication. The system is built around a control block structure that resides in the target's RAM. Here's the basic structure:
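typedef struct {
    char acID[16];           // Initialized to "SEGGER RTT" so the host can locate the block in RAM
    int  MaxNumUpBuffers;    // Number of up-buffers (target to host)
    int  MaxNumDownBuffers;  // Number of down-buffers (host to target)
    SEGGER_RTT_BUFFER_UP   aUp[SEGGER_RTT_MAX_NUM_UP_BUFFERS];      // Up-buffer descriptors
    SEGGER_RTT_BUFFER_DOWN aDown[SEGGER_RTT_MAX_NUM_DOWN_BUFFERS];  // Down-buffer descriptors
} SEGGER_RTT_CB;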
The Secret Sauce!
RTT achieves its remarkable performance through several key architectural decisions:
1. Lock-Free Ring Buffer Implementation: Each ring buffer has separate read and write offsets that are owned by different entities. For an up-buffer the target owns the write offset and the host owns the read offset; for a down-buffer the roles are reversed. Since each offset has exactly one writer, no locks are needed - just memory writes.
2. Memory-Mapped Communication: Instead of using peripheral interfaces like UART, RTT operates purely through memory operations. Consider these performance numbers:
   - Memory write operation: ~2-3 CPU cycles
   - UART byte transmission at 115200 baud: ~87 μs per byte
   - SWO transmission: ~4 μs per byte
   - RTT: ~0.2 μs per byte
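3. Zero Interrupt Overhead: Unlike traditional debug interfaces, RTT doesn't use interrupts for data transfer. The target simply writes each byte into the RAM buffer, and the debug probe reads it out in the background, so there is no peripheral or interrupt handling cost on the target.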
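To make the lock-free idea concrete, here is a minimal Python model of a single-producer, single-consumer ring buffer. It is only a sketch of the technique, not SEGGER's implementation: the class name `UpBuffer` and its fields are made up for illustration, and real firmware would do this in C, with memory barriers where the architecture needs them.

```python
class UpBuffer:
    """Illustrative single-producer, single-consumer ring buffer (hypothetical model)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.wr_off = 0  # advanced only by the target (producer)
        self.rd_off = 0  # advanced only by the host (consumer)

    def free_space(self):
        # One slot always stays empty so that wr_off == rd_off means "empty"
        return (self.rd_off - self.wr_off - 1) % self.size

    def write(self, data):
        """Target side: copy the bytes in, then publish them by moving wr_off."""
        if len(data) > self.free_space():
            return 0  # non-blocking: drop the message rather than wait for the host
        off = self.wr_off
        for b in data:
            self.buf[off] = b
            off = (off + 1) % self.size
        self.wr_off = off  # a single store makes the whole message visible
        return len(data)

    def read(self):
        """Host side: drain everything published so far, then move rd_off."""
        out = bytearray()
        off = self.rd_off
        end = self.wr_off  # snapshot of the producer's offset
        while off != end:
            out.append(self.buf[off])
            off = (off + 1) % self.size
        self.rd_off = off
        return bytes(out)
```

The point of the design is that `write` never touches `rd_off` and `read` never touches `wr_off`, so neither side ever has to take a lock or wait for the other.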
Transfer Speed Characteristics
In real-world applications, RTT consistently outperforms traditional debugging methods:
Sending a 32-byte debug message:
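As a rough estimate, the per-byte figures quoted above can simply be scaled to a 32-byte message. The sketch below does that arithmetic; the SWO and RTT values are the article's approximate numbers rather than measurements, and the UART value assumes 8N1 framing (10 bits on the wire per byte). The function and dictionary names are illustrative.

```python
def uart_byte_time_us(baud, bits_per_frame=10):
    """Wire time for one byte; 8N1 framing is 1 start + 8 data + 1 stop bit."""
    return bits_per_frame / baud * 1e6


# Approximate per-byte costs in microseconds
PER_BYTE_US = {
    "UART @ 115200 baud": uart_byte_time_us(115200),
    "SWO": 4.0,  # figure quoted in the article
    "RTT": 0.2,  # figure quoted in the article
}


def message_time_us(n_bytes):
    """Estimated time to send n_bytes over each interface, in microseconds."""
    return {name: n_bytes * t for name, t in PER_BYTE_US.items()}


for name, t in message_time_us(32).items():
    print(f"{name}: {t:.1f} us")
```

Under these assumptions the UART message takes roughly 2.8 ms, which is more than half of a 5 ms control loop period, while SWO needs about 128 μs and RTT about 6.4 μs.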
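This performance means you can leave RTT instrumentation in your production code with very little impact on system behavior. The target keeps writing into its RAM buffer either way, but the data only leaves the chip when a debug probe is connected and reading it, and in the default non-blocking mode the target never waits for the host.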
The combination of lock-free design, direct memory access, and efficient buffer management makes RTT an ideal solution for real-time system debugging, where traditional methods would significantly impact system behavior.
Integrating SystemView in your project
Let me walk you through how I integrated SystemView into my project. It turned out to be a lifesaver, especially since my robot seemed determined to perfect its face-planting skills:
// Enable trace (TRCENA) and the DWT cycle counter, which SystemView uses for timestamps.
// Place this code in your initialization section.
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

SEGGER_RTT_Init();
SEGGER_SYSVIEW_Conf();
SEGGER_SYSVIEW_Start();
Here is how I monitored critical ISR entries and exits:
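void ADC_IRQHandler(void)
{
    SEGGER_SYSVIEW_RecordEnterISR();  // Mark ISR entry in the trace
    if (system_sensors.port_config->Instance->SR & ADC_SR_EOC) {
        // Placeholder: handle the end-of-conversion event and clear the EOC flag here
        __asm("nop");
    }
    SEGGER_SYSVIEW_RecordExitISR();   // Mark ISR exit in the trace
}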
And here is a way you can use a dedicated task to send application-specific data back to SystemView:
void vLogTask(void *pvParameters) {
    (void)pvParameters;  // Unused
    xLogTaskHandle = xTaskGetCurrentTaskHandle();
    balance_log_data_t log_data = {0};
    BalanceLogger_t logger = {0};
    uint32_t notificationValue;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t ADC_TIMEOUT_TICKS = pdMS_TO_TICKS(20);

    // Initialize our logging system
    Logger_Init(&logger, &log_data);

    while (1) {
        // Start an ADC conversion with a software trigger
        system_sensors.port_config->Instance->CR2 |= ADC_CR2_SWSTART;

        // Wait up to 20 ms for the ADC-complete notification
        notificationValue = ulTaskNotifyTake(pdTRUE, ADC_TIMEOUT_TICKS);

        if (notificationValue == NOTIFY_LOGGER_TASK_ADC) {
            // Check for new balance data without blocking
            if (xQueueReceive(xLogQueue, &log_data, 0) == pdTRUE) {
                __asm("nop");  // Placeholder
            }

            // If all telemetry data is available, send it over RTT to SystemView.
            // data[] and telemetryDataReady are defined elsewhere in the application.
            if (telemetryDataReady) {
                for (int i = 0; i < LOGGER_DATA_COUNT; i++) {
                    logger.dataSamples[i].pValue.pFloat = &data[i];
                    SEGGER_SYSVIEW_SampleData(&logger.dataSamples[i]);
                }
            }
        }
        else {
            DEBUG_WARN("ADC timeout at %u", (unsigned)xTaskGetTickCount());
        }

        xTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(LOG_TASK_PERIOD_MS));
    }
}
What SystemView Revealed (Besides My Poor Life Choices...)
The SystemView recordings were enlightening, to say the least.
Adding AI to the Mix (Because Why Not Make It More Complicated?)
Because manually analyzing SystemView data is about as enjoyable as debugging race conditions, I wrote a Python script to help spot impending doom:
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def find_suspicious_behavior(data_path):
    # Load the exported telemetry (one row per sample)
    df = pd.read_csv(data_path)

    # Feature engineering: create new features to capture relationships.
    # A zero motor_b_current would give an infinite ratio, so treat it as missing.
    df['current_ratio'] = df['motor_a_current'] / df['motor_b_current'].replace(0, np.nan)
    df['power_consumption'] = (df['motor_a_current'] + df['motor_b_current']) * df['battery_voltage']
    # Interaction term between angular velocity and angle (not a physical angular momentum)
    df['angular_momentum'] = df['robot_angular_velocity'] * df['angle_measurement']

    # Select features for anomaly detection
    features = [
        'motor_a_current', 'motor_b_current', 'battery_voltage',
        'robot_angular_velocity', 'angle_measurement', 'pid_output',
        'current_ratio', 'power_consumption', 'angular_momentum'
    ]

    # IsolationForest rejects NaN values, so drop incomplete rows first
    df = df.dropna(subset=features)

    # Train the Isolation Forest model; fit_predict returns -1 for anomalies, 1 for normal samples
    detector = IsolationForest(contamination=0.1, random_state=42)
    anomalies = detector.fit_predict(df[features])

    # Return timestamps of anomalies
    return df.loc[anomalies == -1, 'timestamp']
The idea was to extract some critical data from the robot, such as motor voltages, currents, angle measurement, angular velocity and the PID output, and use it to spot outliers.
These data samples can also be fed to an AI model that can make more complex connections. For example, it could tell me: "Hey, your PID output seems to swing very rapidly when the angle measurements spike. You should consider a low-pass filter on your angle measurements."
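For reference, the kind of low-pass filter hinted at here can be as simple as a first-order (exponential) filter. The sketch below is a generic illustration, not the filter used on the robot: the function names are made up, and the 10 Hz cutoff in the usage note is a placeholder, with the 5 ms sample period taken from the control loop mentioned earlier.

```python
import math


def lowpass_alpha(cutoff_hz, dt_s):
    """Smoothing factor of a first-order RC low-pass for a given cutoff and sample period."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    return dt_s / (rc + dt_s)


def lowpass(samples, alpha):
    """Apply y[n] = y[n-1] + alpha * (x[n] - y[n-1]), seeded with the first sample."""
    filtered = []
    y = None
    for x in samples:
        y = x if y is None else y + alpha * (x - y)
        filtered.append(y)
    return filtered


# Hypothetical usage: 10 Hz cutoff on angle samples taken every 5 ms
alpha = lowpass_alpha(cutoff_hz=10.0, dt_s=0.005)
smoothed = lowpass([0.0, 0.0, 10.0, 0.0, 0.0], alpha)
```

A smaller `alpha` smooths spikes more aggressively but adds lag, which matters in a balancing loop, so the cutoff would need tuning against the real data. On the robot itself the same two-line update would be written in C inside the control loop.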
What's Next?
I'm working on a machine learning model that can analyze SystemView recordings and tune my PID gains automatically. One question keeps crossing my mind: can a machine learning model outperform a PID control loop?
Takeaway
Debugging RTOS systems doesn't have to be a mystical art practiced only by the chosen few. With tools like SystemView and a systematic approach, we can peek inside our systems while they're running and actually understand what's going on.
Remember: your RTOS is just trying its best to manage all the tasks you've thrown at it. Sometimes it just needs a little help figuring out what went wrong. Kind of like all of us on a Monday morning, right?
P.S. If you're wondering about that motor driver I mentioned earlier - turns out current spikes caused some internal MOSFETs to fail. If you have made it this far, here is a picture of my robot. You are welcome. Check out the git here.