Guidelines for using software libraries in safety applications
I know this is a huge topic, given the variety of software libraries and the different ways of dealing with them across programming languages and operating systems. To cut a long story short, the purpose of this article is to give a broad overview of:
- how safety standards (IEC 61508 and ISO26262) address the issue of libraries
- what types of libraries exist and, based on that, how they can affect the safety and real-time performance of the host system
- how libraries basically "work" (how they are hooked up to the application, how and when memory is reserved for them, and what risks all of this can pose to the integrity of the entire system)
- some general statements about what is to be considered when using a library in a safety-critical application, with some real-world examples (e.g. the "safety concept" for the POSIX API, or for the standard C library)
Some experienced programmers reading this would say it only scratches the surface. As I said, it is not meant as a deep dive, especially given how varied the topic is. What is essential is to identify when and how a software library could jeopardize the execution flow or the memory management of the entire system, and to provide guidelines on how to prevent that.
Before starting, it is necessary to briefly highlight the increased use of software libraries across many safety applications, driven by their high level of integration and by a highly distributed business ecosystem in which many third-party suppliers specialize in particular applications.
A software library is basically a collection of data and functions dealing with a particular task and solving a particular problem. It underpins both the re-use of pre-written code and implementation hiding. Third-party companies can specialize in developing dedicated software functions that are hooked up to existing software applications via a defined set of interfaces, or API, while preserving the IP value of their internals or algorithms.
How, then, do safety standards address this issue of pre-written software re-used in safety applications? IEC 61508, unfortunately, not in much depth. It rather considers it in the broader context of "qualified" or trustworthy re-usable software components. There are essentially two clauses (described in part 7 - Overview of techniques and measures) dealing with that: C.4.5 Library of trusted/verified software modules and components and C.2.10 Use of trusted/verified software modules and components. Both of them are referenced in part 3, in Table A.4 - Detailed software design. As the names imply, the former calls for empirical evidence as a means of increasing trust in the library, while the latter goes a bit further and suggests some ways in which this can be achieved. Those are useful guidelines, but in my view they do not quite get to the heart of the matter, namely how libraries are integrated and contracted in practice today. They require:
- unchanged, or stable, specifications of the library
- proof of usage for different clients or even different industries
- evidence of operating history: generally more than one year of service, but preferably statistical evidence and, along with that, evidence of no safety-related failures
- documented procedures for detecting, registering and removing faults during its development
Those criteria can hardly be applied in today's world, where it is almost impossible for a library developer to track the usage of their product, scattered across dozens of client applications. Logging that information is practically infeasible, if not completely irrelevant. In this sense ISO 26262 comes closer to the practical issues, moving away from the statistical proven-in-use argument and focusing more on the interface, that is, on what the contract with the library supplier shall look like.
Before getting to the nitty-gritty, it is worth saying that ISO 26262 considers two aspects of external software to be qualified for usage in safety applications. The difference is clarified in part 10, Clause 8 - Safety Element out of Context Development:
Given the highly dynamic and distributed automotive ecosystem, where OEMs and Tier-1 suppliers increasingly use software libraries initially designed for PC desktop applications, never with "safety in mind", it is rather the second aspect, the qualification of existing software that was not developed for safety, which applies. For the qualification of such software for use in safety applications, the standard prescribes the following (in part 8 - Clause 12):
- a requirements specification and the proof that the library meets it (listing, of course, the assumptions which may apply)
- resources needed (response times and memory footprint)
- requirements on the runtime environment, with a specification of its configuration
- a description of its API
- a description of library dependencies with other software components
- error or exception handling description
- reaction to anomalous operating conditions (for instance, re-entrant calls into library code)
How many types of libraries are there? Basically two: statically linked (the library functions are resolved when the program is built, and their code becomes part of the executable) and dynamically linked (the library code is only resolved and accessed at run-time, so there is no need to rebuild the main program in order to invoke that library).
Apart from "binding" the library into the program, there is also the subtle issue of loading it into memory. Depending on when this occurs, one can again speak of static loading (at program start) and dynamic loading (on demand, when the library is called). To better grasp these differences, a good read is the Stack Overflow thread here or, for Linux systems, here. Let me now take each of the ISO 26262 recommendations above and say how it applies to each library type.
Requirements specification - at first sight this is a common-sense issue and applies to all types of libraries. It shall document the library API, along with its "specialities" (I'll come to that shortly below) and the library behaviour. A series of "environment assumptions" shall also be included, such as the system configuration needed, the required hardware processing power and the like. Evidence of the code coverage achieved when testing the library shall also be made available (ISO 26262 stresses this). This aspect, though very important since it may reveal most of a library's non-deterministic behaviour and corner cases, shall nevertheless be interpreted with care. It depends heavily on the programming language the library is written in and on the application it is meant for. It is not always possible to test each state of an object or each way a function can be overloaded (for OO languages), nor is it possible to test each and every result a complex math function can deliver.
Resources needed by the library - the required memory footprint shall be indicated, along with benchmark results of the library's performance on some common hardware platforms (specifying their configuration). This varies with the library type. For statically linked libraries, the library code and its static data are placed in the executable image at build time, so their size is known before the program runs; at run-time each library function call only adds a stack frame. Consequently no precautions need to be taken specifically for the library; the ones concerning system memory protection in general apply. Each library function call proceeds in the same way as a call to an internal program function, with the parameters saved on the stack, the instruction pointer updated and so on. Temporary data used by the library is also kept on the stack, and IF the library uses heap memory, this shall be specified. If the library performs complex math computations or other hardware-intensive operations, benchmark evidence of its performance (as said above) shall be made available, so that the integrator knows when to expect the returned results.
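To make the point about heap usage more tangible, here is a minimal sketch of one way an integrator could bound the memory a library is allowed to consume: a fixed pool, sized at build time, from which the library's allocations are served. All names (lib_pool_t, lib_pool_alloc, LIB_POOL_SIZE) are hypothetical and the 1024-byte budget is an arbitrary assumption; the sketch also assumes the library lets the integrator supply its allocator, which not every library does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical budget: the most memory the library may ever obtain. */
#define LIB_POOL_SIZE  1024u
#define LIB_POOL_ALIGN _Alignof(max_align_t)

static_assert(LIB_POOL_SIZE % LIB_POOL_ALIGN == 0u,
              "pool size must be a multiple of the alignment");

typedef struct {
    union {
        max_align_t   force_align;           /* aligns the storage for any type */
        unsigned char bytes[LIB_POOL_SIZE];  /* memory reserved at build time */
    } storage;
    size_t used;                             /* bytes handed out so far */
} lib_pool_t;

void lib_pool_init(lib_pool_t *pool)
{
    pool->used = 0u;
}

/* Hands out a block from the pool, or NULL once the budget is exhausted.
 * Blocks are never freed individually: the worst case is simply the pool size. */
void *lib_pool_alloc(lib_pool_t *pool, size_t size)
{
    if (size == 0u || size > LIB_POOL_SIZE) {
        return NULL;
    }
    /* Round up so that every returned block stays suitably aligned. */
    size_t rounded = (size + LIB_POOL_ALIGN - 1u) / LIB_POOL_ALIGN * LIB_POOL_ALIGN;
    if (rounded > LIB_POOL_SIZE - pool->used) {
        return NULL;                         /* over budget: fail visibly */
    }
    void *block = &pool->storage.bytes[pool->used];
    pool->used += rounded;
    return block;
}

size_t lib_pool_remaining(const lib_pool_t *pool)
{
    return LIB_POOL_SIZE - pool->used;
}
```

The design choice here is that exceeding the documented footprint becomes an explicit, checkable error instead of a silent growth of the heap.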
With the dynamically linked ones it gets a bit more complicated, since safety engineers regard everything labelled "dynamic" with suspicion. These libraries rely mainly on operating system support to locate and manage them. This relies on a symbol table, meaning a data structure containing the memory addresses of the functions' entry points, which is used to locate library functions at run-time. One needs to note the difference between "knowing" about a library (binding, or linking) and allocating memory for it, and this is where static and dynamic loading come into play.
Dynamically linked libraries which are statically loaded are completely managed by the operating system. It knows about all the libraries to be used and loads them at program start-up. It usually creates a single memory section per library, so that the library code is shared among many processes; this is why such libraries are also called shared objects. Here memory management and code re-entrancy are the safety issues at stake, both of them to be handled by the host operating system.
The fact that the total amount of memory used by an application is not known at compile time, but only at run-time, exposes the system to a great degree of non-determinism and should be handled with care. I think that if the memory segment to be allocated is fixed (and the overall system resources permit it) and if its boundaries are somehow canary-protected, one can get along with that even for safety. The good news, also from a security point of view, is that the library code will always be executed in user mode (so no direct control of HW resources) and that control is centralized in the hands of the OS. The same goes for re-entrancy: if the OS has locking mechanisms to prevent its perils, then everything can be kept under control.
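The canary idea mentioned above can be sketched as follows. This is only an illustration under my own assumptions: the names (guarded_region_t, region_intact), the 64-byte payload and the guard value are all hypothetical, and a real system would more likely rely on MPU/MMU protection, with canaries as an additional software check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REGION_SIZE 64u           /* hypothetical fixed segment size */
#define CANARY      0xDEADC0DEu   /* arbitrary guard pattern */

static_assert(REGION_SIZE % sizeof(uint32_t) == 0u,
              "payload size must not introduce padding before the tail guard");

/* A fixed memory segment fenced by a guard word on each side. */
typedef struct {
    uint32_t      head_canary;
    unsigned char payload[REGION_SIZE];
    uint32_t      tail_canary;
} guarded_region_t;

/* Clears the payload and arms both guard words. */
void region_init(guarded_region_t *region)
{
    region->head_canary = CANARY;
    memset(region->payload, 0, sizeof region->payload);
    region->tail_canary = CANARY;
}

/* Returns true while both guards still hold the expected pattern.
 * A write that ran past either end of the payload is expected to have
 * clobbered the adjacent guard, which makes this check return false. */
bool region_intact(const guarded_region_t *region)
{
    return region->head_canary == CANARY && region->tail_canary == CANARY;
}
```

Such a check would typically be called periodically, or right after each library call that writes into the segment, so that a detected corruption can trigger the safe-state reaction. Note that a canary only detects an overrun after the fact; it does not prevent it.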
For libraries that are both dynamically linked and dynamically loaded, matters are much more complicated, and such libraries are usually not recommended for safety applications. On top of all the sources of non-determinism mentioned previously, we need to add the fact that each application has to manage, on its own, the memory allocation for the libraries it needs. We get rid of the "re-entrancy" issue, since separate memory is allocated by each process each time the library is loaded, but the safety "checkpoints" move from the OS into the applications.
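To illustrate what it means for the "checkpoints" to move into the application, below is a minimal sketch of dynamic loading on a POSIX system using dlopen, dlsym and dlclose from dlfcn.h. The function name call_from_library, the status codes and the assumed symbol signature double f(double) are my own illustrative choices. On some toolchains the program also has to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical status codes, one per step that can fail. */
typedef enum {
    LOAD_OK             =  0,
    LOAD_BAD_ARG        = -1,
    LOAD_OPEN_FAILED    = -2,
    LOAD_SYMBOL_MISSING = -3,
    LOAD_CLOSE_FAILED   = -4
} load_status_t;

/* Assumed signature of the library function being looked up. */
typedef double (*unary_fn)(double);

static_assert(sizeof(void *) == sizeof(unary_fn),
              "the dlsym result must fit into a function pointer");

load_status_t call_from_library(const char *lib_path, const char *symbol,
                                double arg, double *result)
{
    if (lib_path == NULL || symbol == NULL || result == NULL) {
        return LOAD_BAD_ARG;
    }
    /* RTLD_NOW resolves all symbols immediately, so unresolved references
     * surface here and not later, in the middle of a call. */
    void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return LOAD_OPEN_FAILED;
    }
    void *address = dlsym(handle, symbol);
    if (address == NULL) {
        (void)dlclose(handle);
        return LOAD_SYMBOL_MISSING;
    }
    /* ISO C leaves this conversion undefined; POSIX requires it to work. */
    unary_fn function = (unary_fn)address;
    *result = function(arg);
    return (dlclose(handle) == 0) ? LOAD_OK : LOAD_CLOSE_FAILED;
}
```

Every step (locating the file, resolving the symbol, trusting that the symbol really has the assumed signature, unloading) is now the application's responsibility, and the signature in particular cannot be checked at all at run-time. This is a large part of why this loading style is hard to argue for in a safety case.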
Component dependencies are basically critical only for dynamic libraries, since for static ones the dependencies are already resolved at build time and the program won't link if they cannot be resolved. In the case of dynamically linked libraries, it is not known until run-time which components of the host system they will access (call). On Linux-like systems this is solved via a package management system.
Library API specification concerns memory and type safety. Via those APIs the host application can basically hand over the keys to its internal memory to the library. You have probably heard of buffer overflow issues, which are also a security exploit, arising from mishandling the parameters and return values of library functions. This is closely connected with the programming language used, since languages vary in the way they handle data types and check their bounds. The languages most used in automotive, C and C++, do not natively support type and bounds checks, but there have been revisions and technical reports meant to patch that. It is the library supplier's duty to inform the client about the specific APIs exposed to such issues. Functions performing string operations, type conversions or type arithmetic have to be documented, and protective measures have to be specified for them.
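As an example of such a protective measure, here is a minimal sketch of a bounds-checked wrapper around a string copy. The name checked_copy and its status codes are hypothetical; the sketch assumes the source string is NUL-terminated and that the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    COPY_OK       =  0,
    COPY_BAD_ARG  = -1,
    COPY_TOO_LONG = -2
} copy_status_t;

/* Copies src into dest only if it fits, terminator included.
 * On COPY_TOO_LONG the destination is left as an empty string, so the
 * caller never sees a truncated or unterminated result. */
copy_status_t checked_copy(char *dest, size_t dest_size, const char *src)
{
    if (dest == NULL || src == NULL || dest_size == 0u) {
        return COPY_BAD_ARG;
    }
    /* Search for the terminator within the first dest_size bytes only. */
    const char *end = memchr(src, '\0', dest_size);
    if (end == NULL) {
        dest[0] = '\0';
        return COPY_TOO_LONG;
    }
    size_t length = (size_t)(end - src);
    memcpy(dest, src, length + 1u);      /* +1 copies the terminator too */
    assert(dest[length] == '\0');        /* postcondition: always terminated */
    return COPY_OK;
}
```

The point is that the destination size is part of the interface and that the failure mode is explicit: the caller gets an error code, not a silently truncated string or an overrun.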
A good and very illustrative example in this sense is POSIX, the standard API for UNIX-like systems. Since it is an open API, a safety concept has been published for it, specifying which functions are unsafe and which are only "conditionally" safe. POSIX functions are annotated with keywords "warning" the user about those which may present various "risks", such as being called in a multi-threaded environment, using dynamic memory (somewhat of an a priori unsafe feature) or requiring system calls (and therefore requesting privileges higher than user mode). ARINC 653, an API specification for safety-critical aircraft applications, comes even closer to that.
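A classic case of such a "conditionally" safe pair is strtok versus strtok_r. strtok keeps its parsing position in hidden static state, so two interleaved users corrupt each other, while the POSIX function strtok_r makes the caller own that state. The sketch below uses strtok_r; the helper name count_tokens is my own.

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX declarations (strtok_r) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the tokens in text, which is modified in place (delimiters are
 * overwritten with '\0'), exactly as strtok would do. */
size_t count_tokens(char *text, const char *delimiters)
{
    assert(text != NULL && delimiters != NULL);

    size_t count = 0u;
    char *state = NULL;   /* caller-owned parse position, no hidden static */

    for (char *token = strtok_r(text, delimiters, &state);
         token != NULL;
         token = strtok_r(NULL, delimiters, &state)) {
        count++;
    }
    return count;
}
```

Because the parse state lives in a local variable, two threads (or two nested loops) can tokenize different strings at the same time without interfering. This is the kind of property the safety annotations are meant to make visible to the integrator.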
The main point here is that the industry, foreseeing the more intense use of distributed software development and of third-party library suppliers, should come up with a standard, as avionics did with the one mentioned above. In automotive, the AUTOSAR consortium has tackled this to some extent. For classic AUTOSAR there are already suppliers on the market offering a so-called Safe-RTE (Run-time Environment), though this does not usually cover libraries, and for adaptive AUTOSAR there are ongoing efforts to make the middleware API safe.
I have tried to wrap up some of the guidelines from the two safety standards and to tie them to the way libraries work and provide their services (functions) to the system, without going deeply into either one. I hope it was intuitive enough and helpful for both sides, library supplier as well as integrator, and that it will smooth the application of ISO 26262 in this context.
5 years ago: Very interesting; the concept also applies to Tier-2 companies that are seeking to peddle their software to Tier-1s and OEMs. The SEooC approach is my favourite ;-)
5 years ago: Great read, shared!
Interesting read Bogdan. Gives me a better view on the issues the engineers have to deal with.
5 years ago: Good summary. One of the reasons for 61508 not going into much detail in this context could be the fundamental difference between the standards (low-volume industrial process control vs. high-volume mass production, as stated in https://itq.ch/pdf/safety/Compare%2061508-26262.pdf ). But I am sure the standards will evolve further with growing technology needs, as they did in the past (61508 Ed. 2 added some details on reuse, but there is a long way to go!)