Freedom From Interference: Watchdog Manager Safety Mechanism (I)
Jan 29th, 2023, Issue no.36, ISO 26262
This series is dedicated to automotive functional safety beginners, system engineers, software engineers, or anyone who wants to learn the automotive functional safety standard ISO 26262 from ZERO. Disclaimer: this series reflects the author's view of ISO 26262 and not the view of any company, institution, or organization.
Introduction
In complex software architectures, it is common for functions of mixed criticality to coexist and interact: ASIL D with ASIL A, or ASIL x with QM.
When we systematically analyze any two sets of software components (SWCs) for potential temporal, spatial, or communication interference using a Freedom From Interference (FFI) analysis, temporal interference is one failure mode we come across, and it has to be addressed with a watchdog timer. In AUTOSAR, the Watchdog Manager (WdgM) handles that failure mode.
For example, the Pre-Processing SWC and its Monitoring Function shall be analyzed to make sure they are free from interference, in particular temporal interference.
Watchdog Manager
The Watchdog Manager is a basic software module at the service layer of the basic software architecture of AUTOSAR.
In the AUTOSAR layered architecture, the Watchdog Manager sits at the system service layer and is independent of the watchdog drivers. As a result, it can act as a server that offers multiple services to different clients: SWCs, BSW modules, or Complex Device Drivers (CDDs).
Imagine that your dynamic software architecture is structured into five tasks (T1, T2, T3, T4, T5). Three of them, T1, T2, and T3, are periodic, and you want to make sure that not a single job of any of these tasks is missed. What do we need?
We need an alive supervision mechanism, which the Watchdog Manager can provide. It supervises whether the task is alive (still being executed) within a configured period.
The remaining two tasks, T4 and T5, are aperiodic. For example, a touch on the infotainment cluster raises an external interrupt, and handling that event has a deadline. Here we need a deadline supervision mechanism to make sure execution stays within a bounded time window between tmin and tmax.
Let us zoom into a task such as T1. It contains multiple runnables (functions), and we would like to verify at run time that they execute in the desired sequence. For this, we configure the WdgM for logical supervision.
In the following snippet, if ButtonFlag is changed to FALSE by an error, the CPU will execute the wrong branch and produce a wrong output without us being aware of it. Control flow monitoring using the logical supervision of the WdgM detects whether the expected sequence is followed in order. If it is not, the WdgM can reset the ECU or the MCU to recover from the erroneous event. One key behavior of the WdgM is that, if the global supervision status is not OK, it either resets the ECU or stops setting the triggering condition, so that the WDG driver no longer triggers the watchdog and the watchdog generates a reset.
if (ButtonFlag == TRUE)
{
    DisplayChargingStation();
}
else
{
    DisplayCurrentRoute();
}
Alive Supervision
Periodic Supervised Entities (runnables, for example) have constraints on the number of times they are executed within a given time span. By means of Alive Supervision, the Watchdog Manager periodically checks whether the Checkpoints of a Supervised Entity have been reached within the given limits. In other words, the Watchdog Manager checks that a Supervised Entity runs neither too frequently nor too rarely.
Supervised entities (runnables) use checkpoints. When a checkpoint is reached during execution, the supervised entity notifies the Watchdog Manager that this checkpoint has been reached, that is, executed. A runnable can contain a single checkpoint or several, depending on the design; this is configurable.
Since AUTOSAR version 4, the WdgM is no longer responsible for triggering the WDG driver directly. Instead, it sets a triggering condition in software to TRUE or FALSE based on the global supervision status, using the function WdgIf_SetTriggerCondition().
Why is the triggering condition based on the global supervision status?
Because if task T1 is running properly while task T2 is stuck, and each task could kick the watchdog on its own:
T1 would kick the watchdog and prevent it from resetting.
T2 would not kick the watchdog because it is stuck, but no reset would occur because T1 already prevents it.
Therefore, we need the global status of all tasks to make an aggregate decision on whether to kick the watchdog. That is why the WdgM uses WdgIf_SetTriggerCondition() to update the triggering condition based on the global supervision status.
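The aggregation argument above can be sketched in a few lines of C. This is only an illustration: the LocalStatus type, the trigger_condition variable, and both function names are hypothetical stand-ins, not the AUTOSAR WdgM or WdgIf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local supervision status of one supervised entity. */
typedef enum { LOCAL_OK, LOCAL_FAILED } LocalStatus;

/* Stand-in for the triggering condition that the watchdog driver would read. */
bool trigger_condition = false;

/* The global status is OK only if every supervised entity is OK. */
bool global_status_ok(const LocalStatus *local, size_t count)
{
    assert(local != NULL);
    for (size_t i = 0; i < count; i++) {
        if (local[i] != LOCAL_OK) {
            return false; /* one stuck task is enough to withhold the trigger */
        }
    }
    return true;
}

/* Called cyclically: derive the triggering condition from the global status. */
void update_trigger_condition(const LocalStatus *local, size_t count)
{
    trigger_condition = global_status_ok(local, count);
}
```

With T1 reported as LOCAL_OK and T2 as LOCAL_FAILED, update_trigger_condition() leaves trigger_condition at false, so a healthy T1 can no longer mask a stuck T2.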
void SE1Runnable_5ms()
{
    Rte_Call_WdgMCheckpointReached(SE1_ID, CP_ID_1);
    // code to be executed every 5 ms
}
void SE2Runnable_10ms()
{
    Rte_Call_WdgMCheckpointReached(SE2_ID, CP_ID_2);
    // code to be executed every 10 ms
}
void SE3Runnable_20ms()
{
    Rte_Call_WdgMCheckpointReached(SE3_ID, CP_ID_3);
    // code to be executed every 20 ms
}
Note:
For each supervised entity, the watchdog configuration includes two margin parameters, because the watchdog timeout cannot be configured exactly on the edge of the SE's period. For SE3Runnable_20ms(), for example, the accepted window configured in the WdgM would run from:
20 ms - min_config_time to 20 ms + max_config_time
The margins are needed because of task jitter.
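One way to picture the margins is as a tolerance on how many checkpoint indications are counted in a longer reference window. The sketch below assumes that count-based view; the AliveConfig structure and its field names are hypothetical, not actual WdgM configuration parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alive supervision settings for one checkpoint. */
typedef struct {
    uint16_t expected_indications; /* expected checkpoint hits per reference window */
    uint16_t min_margin;           /* tolerated shortfall (runnable ran late) */
    uint16_t max_margin;           /* tolerated excess (runnable ran early) */
} AliveConfig;

/* Evaluate the hits counted during one reference window against the margins. */
bool alive_count_ok(const AliveConfig *cfg, uint16_t counted)
{
    assert(cfg != NULL);
    /* Clamp the lower bound at zero so a large margin cannot underflow. */
    uint16_t lower = (cfg->min_margin >= cfg->expected_indications)
                         ? 0u
                         : (uint16_t)(cfg->expected_indications - cfg->min_margin);
    uint32_t upper = (uint32_t)cfg->expected_indications + cfg->max_margin;
    return (counted >= lower) && (counted <= upper);
}
```

For a 20 ms runnable supervised over a 100 ms window, expected_indications would be 5; with both margins set to 1, counts of 4 to 6 pass, while 3 or 7 are reported as an alive supervision failure.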
Deadline Supervision
Aperiodic Supervised Entities have individual constraints on the timing between two Checkpoints. By means of Deadline Supervision, the Watchdog Manager checks the timing of transitions between two Checkpoints of a Supervised Entity. In other words, the Watchdog Manager checks whether some steps in a Supervised Entity take a time that lies within the configured minimum and maximum. This supervised deadline is therefore bounded on both sides: min_WdgMdeadline and max_WdgMdeadline.
void InitializationADAS_ECU(void)
{
    Rte_Call_WdgMCheckpointReached(SE1_ID, CP_ID_1); // Report Checkpoint 1 reached
    // Init perception
    // Init planning
    // Init actuation
    Rte_Call_WdgMCheckpointReached(SE1_ID, CP_ID_2); // Report Checkpoint 2 reached
}
As shown above, the WdgM measures the time between the first and the last checkpoint; therefore, aperiodic SEs need at least two checkpoints.
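The time measurement between the two checkpoints can be sketched as follows. This is a minimal illustration under stated assumptions: timestamps come from a free-running 32-bit microsecond counter, and the DeadlineConfig and DeadlineState types and both functions are hypothetical names, not part of the WdgM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deadline window for one start/end checkpoint pair. */
typedef struct {
    uint32_t deadline_min_us;
    uint32_t deadline_max_us;
} DeadlineConfig;

/* Run-time state: when the start checkpoint was last reported. */
typedef struct {
    uint32_t start_us;
    bool     started;
} DeadlineState;

/* Report the start checkpoint. */
void deadline_start(DeadlineState *st, uint32_t now_us)
{
    assert(st != NULL);
    st->start_us = now_us;
    st->started = true;
}

/* Report the end checkpoint; returns true if the elapsed time is in the window. */
bool deadline_end(const DeadlineConfig *cfg, DeadlineState *st, uint32_t now_us)
{
    assert(cfg != NULL && st != NULL);
    if (!st->started) {
        return false; /* end checkpoint reported without a start */
    }
    st->started = false;
    /* Unsigned subtraction stays correct across one counter wrap-around. */
    uint32_t elapsed = now_us - st->start_us;
    return (elapsed >= cfg->deadline_min_us) && (elapsed <= cfg->deadline_max_us);
}
```

In this sketch, finishing too early is a violation just like finishing too late, which matches the idea of a window bounded by a minimum and a maximum.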
Logical Supervision
Logical supervision is a fundamental technique for checking the correct execution (control flow) of embedded system software, and hence for achieving the robustness property listed in ISO 26262-6, 8.4.5:
8.4.5 Design principles for software unit design and implementation at the source code level as listed in Table 6 shall be applied to achieve the following properties:
a) correct order of execution of subprograms and functions within the software units, based on the software architectural design;
b) consistency of the interfaces between the software units;
c) correctness of data flow and control flow between and within the software units;
d) simplicity;
e) readability and comprehensibility;
f) robustness;
EXAMPLE Methods to prevent implausible values, execution errors, division by zero, and errors in the data flow and control flow.
g) suitability for software modification; and
h) verifiability.
Logical supervision focuses on control flow errors, which cause a divergence from the valid (i.e. coded/compiled) program sequence. An incorrect control flow occurs if one or more program instructions are processed either in the incorrect sequence or are not even processed at all. Control flow errors can lead to data corruption. For the control flow graph this implies that every time the Supervised Entity reports a new Checkpoint, it must be verified that there is a transition configured between the previous Checkpoint and the reported one.
The following code snippet is annotated with the checkpoints (CP0-0 to CP0-6) that form the nodes of its control flow graph:
CP0-0   i = 0;
CP0-1   while (i < n) {
CP0-2       if (a[i] < b[i])
CP0-3           a[i] = b[i];
            else
CP0-4           a[i] = 0;
CP0-5       i++;
        }
CP0-6   ;
The expected logical supervision:
Expected transitions: CP0-0 --> CP0-1 --> CP0-2 --> CP0-3 --> CP0-5 or CP0-0 --> CP0-1 --> CP0-2 --> CP0-4 --> CP0-5
Therefore, if the reported sequence were
CP0-0 --> CP0-2 --> CP0-1 --> CP0-3 --> CP0-5, it would indicate a control flow error.
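The transition check can be sketched as a lookup in a table of configured transitions. The enum, the table, and sequence_ok() are hypothetical names for illustration; the loop-back (CP0-5 to CP0-1) and loop-exit (CP0-1 to CP0-6) entries are my reading of the snippet rather than something the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Checkpoints of the annotated loop, named after the labels in the snippet. */
typedef enum { CP0_0, CP0_1, CP0_2, CP0_3, CP0_4, CP0_5, CP0_6, CP_COUNT } Checkpoint;

/* allowed[from][to] is true when a transition from -> to is configured. */
static const bool allowed[CP_COUNT][CP_COUNT] = {
    [CP0_0] = { [CP0_1] = true },
    [CP0_1] = { [CP0_2] = true, [CP0_6] = true }, /* enter loop body or exit loop */
    [CP0_2] = { [CP0_3] = true, [CP0_4] = true }, /* if branch or else branch */
    [CP0_3] = { [CP0_5] = true },
    [CP0_4] = { [CP0_5] = true },
    [CP0_5] = { [CP0_1] = true },                 /* back to the loop condition */
};

/* Check a reported sequence: it must start at CP0-0 and use only configured
 * transitions between consecutive checkpoints. */
bool sequence_ok(const Checkpoint *seq, size_t len)
{
    assert(seq != NULL);
    if (len == 0 || seq[0] != CP0_0) {
        return false;
    }
    for (size_t i = 1; i < len; i++) {
        if (!allowed[seq[i - 1]][seq[i]]) {
            return false; /* no configured transition: control flow error */
        }
    }
    return true;
}
```

A real implementation would check each checkpoint as it is reported instead of a whole recorded sequence, but the per-transition lookup is the same idea.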
Conclusion
The WdgM builds the local supervision status of each supervised entity and, based on these local statuses, calculates the global supervision status. The WdgM is a mechanism that detects temporal violations and control flow violations and reacts according to the configured handling method: resetting the ECU or the MCU, or inhibiting a certain function. The next article will give a more detailed explanation of alive, deadline, and logical supervision.