End To End Protection E2E

Introduction 

Behind its visible parts, a modern car is an electronic network of up to 100 electronic control units (ECUs) connected via several bus systems. A significant part of the vehicle's functionality is distributed among several ECUs. For example, the software that controls the indicator lights is distributed over up to eight ECUs in high-end vehicles. Furthermore, some future functionality will not be realized by a loose set of side-by-side ECUs but will require a large number of interrelationships between them.

Usually, a function within a car is designed as a communication chain from a sensor (e.g. a light switch) to an actuator (e.g. the light). Faults may occur in such a chain and lead, sooner or later, to failures. One risk is a loss of data integrity within the chain, which can lead to unpredictable, unwanted behaviour of the actuator. Data integrity therefore has to be ensured from the sensor all the way to the actuator. Depending on the kind of failure, a set of software mechanisms exists that helps either to prevent or to detect it. Some of these mechanisms were introduced in the AUTOSAR standard.

The common communication errors 

Errors in dependable communication can be classified as follows:

1- Data Loss: Part of the data or the whole data is lost during transmission. The origin can be any of a number of faults, such as an EMI impulse or a partially or permanently damaged wire.

2- Repetition: The same data is received in successive messages. As with data loss, the fault can have various origins. In this case, a software defect at the sender can also cause repeated identical data.

3- Timeout/Time Delay: A timeout occurs if data is not received within an expected time slot. A timeout error can only occur in a system with defined timing requirements, meaning that the sender and the receiver share a common understanding of “time”, e.g. the sender sends data every 20 ms and the receiver expects data every 20 ms.

4- Incorrect Sequence: The data arrives at the receiver in a different order than it was sent. This error is typical for highly distributed systems, in which there is a nonzero probability that the order of the data gets mixed up. This can happen with buffered communication or with communication routed via several ECUs (e.g. gateways).

5- Insertion of an Unintended Message: An additional message, or part of one, is added to the communication stream. This error is very unlikely and has its origin in hardware faults of the in-vehicle bus systems, such as CAN or FlexRay.

6- Data Corruption: The information integrity of the transmitted data is violated. In most cases the origin is a random hardware fault, e.g. a bit flip caused by electromagnetic interference (EMI).

7- Addressing Error: Data is sent to the wrong destination and treated as correct data at the receiver side. The reason can be a random hardware fault or a systematic software fault. Usually, a system protects against this by assigning unique IDs to the individual data elements or to the senders and receivers.

8- Constant “Over-”Transmission: Due to a fault at the hardware level, the same or different messages are sent again and again, leading to a bus overload. This frequent retransmission blocks the bus, so other safety-relevant data may be prevented from being sent.

9- Masquerading Error: This goes a step further than the addressing error. A unique DataId mechanism exists in the system, but the data is “disguised” and therefore accepted although its origin is not the one it pretends to be. In summary, the receiver side accepts data that is not from the intended sender but pretends to be. The cause can be a corruption of the DataId that leads to a false acceptance of individual data. A security-relevant origin would be that something changes the DataId with intent.

The communication error detection mechanisms 

Many mechanisms exist for detecting individual errors. Short descriptions follow.

Hardware Redundancy: Sufficient to detect most of the errors. This is part of the system design and is achieved by providing two or more independent hardware communication channels.  

Time Redundancy: The same information is transmitted twice via two different messages in different time slots. 

Checksum: An algorithm computes a checksum over a data block. The checksum is transmitted together with the data, recalculated at the receiver side, and compared with the received value.
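
As a minimal sketch of the checksum idea, the following Python snippet appends a simple additive checksum (sum of all bytes modulo 256) to a payload and verifies it on reception. The function names and the choice of an additive checksum are assumptions made for illustration; real systems may use a different algorithm.

```python
def checksum8(data: bytes) -> int:
    """Additive checksum: sum of all bytes modulo 256 (illustrative choice)."""
    return sum(data) % 256


def send(payload: bytes) -> bytes:
    """Sender side: append the checksum to the payload."""
    return payload + bytes([checksum8(payload)])


def receive(frame: bytes):
    """Receiver side: recalculate and compare; return None on mismatch."""
    payload, received = frame[:-1], frame[-1]
    return payload if checksum8(payload) == received else None


frame = send(b"\x10\x20\x30")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit in transit

print(receive(frame))      # payload is returned unchanged
print(receive(corrupted))  # None: the corruption is detected
```

A plain additive checksum does not notice reordered bytes, which is one reason stronger signatures such as a CRC are often preferred.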

Sequence Counter/Number: The sender adds a Sequence Counter to the transmitted data. This Sequence Counter is then evaluated at the receiver that has stored the last valid received Sequence Counter.  
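
The receiver-side evaluation can be sketched as follows. The 4-bit counter width, the `max_delta` tolerance, and the result names are assumptions for illustration and are not taken from a specific standard.

```python
COUNTER_MODULO = 16  # assumption: 4-bit counter that wraps from 15 to 0


def classify(last_valid: int, received: int, max_delta: int = 2) -> str:
    """Compare the received counter with the last valid one."""
    delta = (received - last_valid) % COUNTER_MODULO
    if delta == 0:
        return "repetition"      # same counter again
    if delta == 1:
        return "ok"              # exactly the next message
    if delta <= max_delta:
        return "ok_some_lost"    # a tolerated number of messages was lost
    return "wrong_sequence"      # too many lost, or messages out of order


print(classify(14, 15))  # ok
print(classify(15, 0))   # ok (counter wrapped around)
print(classify(5, 5))    # repetition
print(classify(5, 7))    # ok_some_lost
print(classify(5, 3))    # wrong_sequence
```

With a counter alone, a message that arrives late cannot be told apart from a large gap, so both cases end up in the same category here.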

Message-ID/Data ID: Each unique message in the distributed system has its own Data Id. Checking the Data Id makes it possible to detect, for example, addressing errors.

CRC (Cyclic Redundancy Check): The whole data block is treated as the dividend of a polynomial division by a generator polynomial. The remainder forms a signature that is sent with the data and recalculated on the receiver side.

Including State of sender/receiver in CRC calculation: The sender and receiver have the same number of corresponding states. These states are numbered so that the value of the sender state corresponds to the value of the receiver state. If the CRC check passes without error, the consistency of the states is confirmed indirectly.
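
The following sketch shows a bitwise CRC-8 and how a value known to both sides can be included in the calculation without being transmitted. The polynomial 0x1D with start value and final XOR of 0xFF, the use of a one-byte Data ID as the shared value, and the function names `protect` and `check` are all assumptions chosen for illustration.

```python
def crc8(data: bytes, poly: int = 0x1D, init: int = 0xFF, xor_out: int = 0xFF) -> int:
    """Bitwise CRC-8, most significant bit first (parameters are assumed)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ xor_out


def protect(payload: bytes, data_id: int) -> bytes:
    """Sender: the Data ID goes into the CRC but is not put on the bus."""
    return payload + bytes([crc8(bytes([data_id]) + payload)])


def check(frame: bytes, expected_data_id: int) -> bool:
    """Receiver: recalculate the CRC with the Data ID it expects."""
    payload, received_crc = frame[:-1], frame[-1]
    return crc8(bytes([expected_data_id]) + payload) == received_crc


frame = protect(b"\x01\x02\x03", data_id=0x2A)
print(check(frame, expected_data_id=0x2A))  # True: data and ID match
print(check(frame, expected_data_id=0x2B))  # False: receiver expects another ID
```

Because the shared value only enters the calculation, a mismatch between sender and receiver shows up as a failed CRC check even though the payload itself arrived intact.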

Parity bit: One additional bit is added to the data stream so that the total number of “1” bits (or, depending on the design, “0” bits) is odd or even. Which digital value is counted and whether odd or even parity is used depends on the design.
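
A small sketch of even parity over the “1” bits, assuming the data is held in a Python integer. The function names are illustrative.

```python
def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(value).count("1") % 2


def check_even_parity(value: int, parity_bit: int) -> bool:
    """Receiver check: data bits plus parity bit must hold an even number of 1s."""
    return (bin(value).count("1") + parity_bit) % 2 == 0


data = 0b1011001                      # four 1 bits
parity = even_parity_bit(data)        # 0, the count is already even

print(check_even_parity(data, parity))              # True
print(check_even_parity(data ^ 0b0000001, parity))  # False: one bit flipped
print(check_even_parity(data ^ 0b0000011, parity))  # True: two flips go unnoticed
```

The last line shows the known limitation of a single parity bit: any even number of bit flips leaves the parity unchanged.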

EDC (Error Detection Codes)/ECC (Error Correction Codes): Error detection codes detect corruption errors; error correction codes additionally allow these errors to be corrected. Several algorithms exist for such codes, and they can be characterized by their Hamming distance. A code with minimum Hamming distance d can detect up to d-1 bit errors and correct up to ⌊(d-1)/2⌋ bit errors. For example, a Hamming code with distance 3 detects up to two bit errors or corrects one.
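
As one concrete instance of an error correction code, the sketch below implements the classic Hamming(7,4) code, which has a minimum Hamming distance of 3 and can therefore correct any single bit error in a 7-bit code word. The bit layout (parity bits at positions 1, 2 and 4) and the function names are choices made for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit code word."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, corrected position); position 0 means no error seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome gives the 1-based error position
    if position:
        c[position - 1] ^= 1          # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                          # corrupt position 5 in transit
print(hamming74_decode(word))         # ([1, 0, 1, 1], 5)
```

If two bits are flipped, this decoder still reports a nonzero syndrome but "corrects" the wrong bit, which is why a distance-3 code can be used either to correct one error or to detect two, not both at once.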

Timeout by a priori knowledge: Detection of delays using the measurement of time on the receiver side with the knowledge of expected timing.
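
A minimal sketch of such a receiver-side timeout monitor is shown below. The class name, the tolerance parameter, and the decision not to report a timeout before the first reception are assumptions for illustration; time is passed in explicitly so the example stays deterministic.

```python
class TimeoutMonitor:
    """Detects missing messages from the known transmission period."""

    def __init__(self, period_ms: int, tolerance_ms: int):
        self.deadline_ms = period_ms + tolerance_ms
        self.last_rx_ms = None

    def on_receive(self, now_ms: int) -> None:
        """Call whenever a valid message arrives."""
        self.last_rx_ms = now_ms

    def timed_out(self, now_ms: int) -> bool:
        """True if the expected message is overdue."""
        if self.last_rx_ms is None:
            return False  # assumption: no timeout before the first reception
        return now_ms - self.last_rx_ms > self.deadline_ms


monitor = TimeoutMonitor(period_ms=20, tolerance_ms=5)
monitor.on_receive(now_ms=100)
print(monitor.timed_out(now_ms=120))  # False: still within the expected slot
print(monitor.timed_out(now_ms=130))  # True: message is overdue
```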

Timestamp: The Time Stamp works only in a system that has a globally defined time base that is synchronized in the system. A Time Stamp is explicitly transmitted and checked on the receiver side.  

Plausibility and acceptance checks: These usually check whether the received value lies within upper and lower boundaries. The boundaries can be static, if only certain ranges are accepted, or dynamic, if the plausibility of the data is verified. With dynamic verification, the first derivative of the data is checked against the boundaries for the technically plausible positive and negative gradient of the data.
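
The static and dynamic checks can be combined as in the sketch below. The signal (a vehicle speed in km/h), the limits, and the function name are hypothetical values picked only to make the example concrete.

```python
def plausible(value, previous, dt_s, lower, upper, min_gradient, max_gradient):
    """Static range check followed by a gradient (first derivative) check."""
    if not (lower <= value <= upper):
        return False                      # outside the static boundaries
    if previous is not None:
        gradient = (value - previous) / dt_s
        if not (min_gradient <= gradient <= max_gradient):
            return False                  # changes faster than physically plausible
    return True


# Hypothetical limits: 0..250 km/h, between -40 and +15 km/h per second.
limits = dict(lower=0.0, upper=250.0, min_gradient=-40.0, max_gradient=15.0)

print(plausible(50.2, 50.0, 0.02, **limits))   # True: about +10 km/h per second
print(plausible(52.0, 50.0, 0.02, **limits))   # False: about +100 km/h per second
print(plausible(300.0, None, 0.02, **limits))  # False: outside the static range
```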

Information Redundancy: The same information is included twice in one message.

Cryptographic techniques to detect unauthorized manipulation: The whole data is encrypted by an algorithm at the sender side and decrypted at the receiver side with the corresponding algorithm to detect violations of the data.
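
The paragraph above describes encrypting the whole data. The sketch below deliberately uses a different but related technique, a message authentication code (HMAC-SHA256 from the Python standard library), because it shows manipulation detection in a few lines. The shared key and the function names are hypothetical.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-shared-key"  # hypothetical key known to sender and receiver


def make_tag(payload: bytes) -> bytes:
    """Sender: compute an authentication tag over the payload."""
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()


def verify(payload: bytes, received_tag: bytes) -> bool:
    """Receiver: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(payload), received_tag)


payload = b"\x01\x02\x03"
tag = make_tag(payload)

print(verify(payload, tag))          # True: data is unchanged
print(verify(b"\x01\x02\x04", tag))  # False: data was manipulated
```

Unlike a CRC, the tag cannot be recomputed by someone who changes the data without knowing the key.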


Identification procedure: The sender and receiver work with identification keys to check if the destination and source of the data are valid. Usually, this is achieved by the bidirectional exchange of identification messages.  

Retry mechanisms: In case of data loss or data corruption, the receiver sends a retry message to the sender after an unsuccessful reception. This mechanism uses bidirectional communication and therefore contributes to a higher busload.


Acknowledgement: This is very similar to the retry mechanism, but in this case the sender repeats the message until it gets an acknowledgement message from the receiver.
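
The acknowledgement scheme can be sketched as a bounded retransmission loop. The `channel` callable stands in for the bus and returns True when an acknowledgement came back; it, the attempt limit, and the function name are assumptions for illustration.

```python
def send_with_ack(message: bytes, channel, max_attempts: int = 3):
    """Resend until acknowledged; return the attempt count or None on failure."""
    for attempt in range(1, max_attempts + 1):
        if channel(message):      # True means the receiver acknowledged
            return attempt
    return None                   # give up after max_attempts


# Simulated bus: the first two transmissions are lost, the third is acknowledged.
outcomes = iter([False, False, True])
attempts = send_with_ack(b"\x01", lambda msg: next(outcomes))
print(attempts)  # 3
```

Bounding the number of attempts matters on a shared bus, since unlimited retransmission would itself produce the constant over-transmission error described earlier.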



The AUTOSAR standard adds three main concepts. Two of them are specified in the COM module, and the third is realized by an End-to-End protection library. End-to-End protection (E2E) is a mechanism used between communication nodes to detect certain communication errors; it is the subject of the next part.

