The magic touch? How to safety-qualify open-source software

With software-defined vehicles we want speed of development. Obviously, the fastest development you can possibly have is not having to develop anything at all: re-use what's already there. And hey, open-source software exists in high quality and comes for free. Yet we keep hearing that this is not possible in automotive because of functional safety.

Is that so? Is it really impossible to use open-source software for safety-related functions?

Before we get started, let's be clear: safety is not a quality level. You don't sprinkle magical "safety salt" on a software component and suddenly it becomes a "safety component" fit for safety-related functions. "Safe software components" are software components that have software safety requirements.

These can either be actual functional requirements, e.g. a watchdog component that must trigger a transition into a fail-safe state upon a missed checkpoint, or a requirement on freedom from interference, i.e. the software does not itself contribute to a safety-related function, but must not influence one in a negative way.

(Footnote: Yes, I am aware that safety is shown in a system context; nonetheless, you do have safety requirements on individual components so that they contribute to a system-level safety concept.)
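The watchdog case mentioned above can be sketched in a few lines. This is a minimal illustration, not automotive code: the class, the timeout value, and the fail-safe callback are all hypothetical, and a real implementation would run the monitor in an independent execution context (e.g. a hardware watchdog or a separate partition).

```python
import time

class CheckpointWatchdog:
    """Minimal checkpoint watchdog: the monitored task must report a
    checkpoint within timeout_s, otherwise the fail-safe reaction fires."""

    def __init__(self, timeout_s, fail_safe):
        self.timeout_s = timeout_s
        self.fail_safe = fail_safe  # safety reaction, e.g. cut actuator power
        self.last_checkpoint = time.monotonic()

    def checkpoint(self):
        # Called by the monitored task on every completed cycle.
        self.last_checkpoint = time.monotonic()

    def poll(self):
        # Called periodically from an independent context (timer/partition).
        if time.monotonic() - self.last_checkpoint > self.timeout_s:
            self.fail_safe()
            return False
        return True

events = []
wd = CheckpointWatchdog(timeout_s=0.05, fail_safe=lambda: events.append("fail-safe"))
wd.checkpoint()
assert wd.poll()        # checkpoint was recent -> healthy
time.sleep(0.06)        # simulate a missed checkpoint
assert not wd.poll()    # deadline exceeded -> fail-safe triggered
assert events == ["fail-safe"]
```

Note that only the watchdog and the reaction carry the safety requirement here; the monitored task itself does not.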

Consequently, the first question when we want to use open-source software in a safety-related context must be: "What is your safety requirement?" ISO 26262 then prescribes a structured development process and, depending on the targeted safety level, runtime mechanisms to handle different errors. Both serve to demonstrate that the safety requirements are fulfilled.

The conflict with open-source

The conflict with open source arises on multiple fronts.

First of all, to my knowledge, no open-source project today follows an ISO 26262 compliant development process. Although a number of OSS projects fulfill a large part of the quality-related requirements (e.g. test coverage or the systematic use of static and dynamic code analysis, with SQLite being a positive example), safety artifacts like a technical safety concept (TSC) are not created, and manual process steps such as formal inspections are not followed.

Secondly, the governance model of many OSS projects can be problematic for safety certification. For most open-source projects, the liability and the responsibility for appropriate use lie with the user. In many cases, release and maintenance are commercialized as a paid service. A comparable setup would be required for qualified safety releases; however, the required safety artifacts are not developed by the community.

Third, depending on the size of the project and of its community, the frequency of changes may be a hindrance to safety qualification. Keeping pace with the open-source project while updating the required safety artifacts can simply be impossible.

What options do we have?

So what are possible paths that we can take?

Retrofit safety

One option is to retrofit safety for an existing open-source project. Take the source code of the open-source project, define your safety requirements (that are hopefully already fulfilled by the project), create all the required safety artifacts and perform the required qualification. Upstream required code changes to the project so that you keep in sync with the community. Of course you do need to make dedicated releases and check any changes from the community to validate whether they impact your technical safety concept or the assumptions on the environment.

Such a process can work for projects with a low number of contributions, so that analyzing changes and updating your safety artifacts remains tractable. A governance model with a central owner approving pull requests also helps. Further, this can only work for small projects, or for projects where only small parts of the software need to be certified because freedom from interference with the rest can be shown.

Fork

What if you cannot effectively control the governance of this project? Option number 2 is to fork. The maintenance of your project will be more difficult as the software versions will diverge. However, for one-off systems where you build a product and only want frozen version support without participating in the continued development of the community, this may work. Still, even with just frozen version support this is a major challenge. If you have to deal with cyber security management and need to address Common Vulnerabilities and Exposures (CVEs), as is the case for any connected device, you are left to your own devices for tracking CVEs for your specific fork. In any case where you benefit greatly by keeping in sync with the community, this is not a tractable model.
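To make the fork's CVE burden concrete, here is a toy sketch of what "tracking CVEs yourself" means in practice: matching your fork's pinned component versions against advisory data you pull from a feed. The package names, versions, and advisory IDs are all invented for illustration.

```python
# Hypothetical pinned versions of components in your fork.
PINNED = {"libfoo": (1, 4, 2), "libbar": (2, 0, 1)}

# Hypothetical advisories, e.g. exported from a CVE/OSV feed.
ADVISORIES = [
    {"id": "CVE-2024-0001", "package": "libfoo", "fixed_in": (1, 4, 5)},
    {"id": "CVE-2024-0002", "package": "libbaz", "fixed_in": (3, 1, 0)},
]

def open_cves(pinned, advisories):
    """Return advisories affecting a pinned version (anything below fixed_in)."""
    return [a["id"] for a in advisories
            if a["package"] in pinned and pinned[a["package"]] < a["fixed_in"]]

assert open_cves(PINNED, ADVISORIES) == ["CVE-2024-0001"]
```

The hard part the sketch hides is exactly the fork problem: upstream advisories reference upstream versions, so you must map every advisory onto your diverged code base by hand.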

Introduce a safety checker

A third option may be not to safety-qualify the open-source software at all. As we typically see with watchdogs or CRC checks, it may be sufficient to check the correct operation of the software. In that case, only the monitor and the safety reaction have safety requirements, while the open-source software can remain as is. You do get stronger requirements on the surrounding system architecture, though: you will need to show freedom from interference between your safety monitor and the open-source software. You may also need to adapt your monitoring functionality frequently when the open-source project introduces changes, especially if the project does not put emphasis on backwards compatibility. However, this approach can handle larger code bases, higher change frequencies, and more arbitrary governance models than the retrofitting model. Its applicability strongly depends on the provided functionality, though.
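As a minimal sketch of this pattern, consider end-to-end CRC protection (in the spirit of AUTOSAR E2E style communication protection) around an unqualified transport: only the framing and the check carry safety requirements, while whatever open-source stack moves the bytes in between stays unqualified. Function names and the fault reaction are hypothetical.

```python
import zlib

def e2e_protect(payload):
    # Sender side (qualified): append a CRC32 over the payload.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def e2e_check(frame, fail_safe):
    # Receiver-side safety monitor (qualified): verify the CRC and trigger
    # the safety reaction on mismatch. The transport in between is not
    # qualified; corruption there is detected here.
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received:
        fail_safe()
        return None
    return payload

faults = []
frame = e2e_protect(b"steering-angle:12.5")
assert e2e_check(frame, lambda: faults.append("fault")) == b"steering-angle:12.5"

corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]  # simulate interference
assert e2e_check(corrupted, lambda: faults.append("fault")) is None
assert faults == ["fault"]
```

A real system would additionally protect against lost, delayed, and repeated messages (e.g. with sequence counters and timeouts), not just corruption.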

Automate the qualification itself

A fourth approach that is frequently discussed is automated safety qualification. For an open-source project that you have qualified according to the first approach (retrofitting safety), can you automate the re-qualification process to a large degree? For changes that do not impact your technical safety concept, safety architecture, or safety manual (e.g. assumptions on the environment), this largely boils down to re-running qualification steps that can be automated. The open question, however, is whether you can easily determine that a pull request has no impact on your safety concept. That very much depends on the size of the project and on whether freedom-from-interference guarantees can be given. In my view, this approach is doomed to fail for projects with high complexity and high change frequencies.
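Such an automated pipeline needs, at minimum, a change-impact gate. A sketch, assuming the simplest possible classification by file path: the patterns are hypothetical and would in practice be derived from the safety architecture (which files implement safety requirements, which interfaces the freedom-from-interference argument relies on).

```python
from fnmatch import fnmatch

# Hypothetical patterns marking safety-relevant parts of the tree.
SAFETY_RELEVANT = ["src/safety/*", "src/scheduler/*.c", "docs/safety_manual/*"]

def classify_change(changed_files):
    """Return 'manual-review' if any changed file may impact the safety
    concept, otherwise 'auto-requalify' (re-run the automated suite)."""
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in SAFETY_RELEVANT):
            return "manual-review"
    return "auto-requalify"

assert classify_change(["src/util/log.c", "README.md"]) == "auto-requalify"
assert classify_change(["src/safety/watchdog.c"]) == "manual-review"
```

The sketch also illustrates the weakness of the approach: a path-based gate cannot see semantic impact, e.g. a change in `src/util/` that alters timing behavior a safety requirement depends on.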

The bottom line

It all boils down to your safety requirements, the complexity of the open-source project, its change frequency, and its governance model. A smaller code base and less complexity help. A lower change frequency helps. Stricter forms of governance help. Automating the process requirements, as in the fourth approach, helps too.

In any case, you cannot neglect the environment in which you intend to use the software. How often do you intend to release new qualified versions? Can you show freedom from interference from other system parts?

Stefan Hermann

Senior Systemengineer - the art of system engineering

1y

Dear Moritz, what is the relation between SW test coverage and SW fault probability? BR, Stefan

Stefan Hermann

Senior Systemengineer - the art of system engineering

1y

Dear Moritz, what does "software-defined vehicles" mean? BR, Stefan

Stefano Marzani

Enabling the Software-Defined Revolution

1y

Cool post and cool comments! :+1:

Mike Allocco, Emeritus Fellow ISSS

System Safety Engineering and Management of Complex Systems; Risk Management Advisor...Complex System Risks

1y

Good luck... You need to connect the dots. After all the confusion with complexity and AI/ML, IoT, EV, AV, UAS, eVTOL, advanced automation, hyper-automation, agile processes, advanced digital and system complexity, advanced air mobility, open systems, deep-tech, and advanced technology, 5G assumptions, metadata, super sensors, synthetic realities, quantum computing, and hypes… we need to: Maintain control over all automation and advanced technology; Keep the human in the loop; Actually understand system assurance: human, hardware, software, firmware, logic and the human and environmental integrations, and apply system safety, software safety, cyber safety, cyber security, system reliability, logistics, availability, human factors, human reliability, quality, survivability, etc.; Design systems to accommodate humans; Systems will fail, inadvertently operate, and increase system risk with complexity; Humans will fail; Design systems to fail safe; Design systems to enable human monitoring; Design systems to enable early detection, isolation, correction and recovery… Mitigate the automation hallucination risk…

Florian Gilcher

Brought Rust to Functional Safety (along with an incredible team)

1y

As it so happens, we yesterday released a qualified version of the Rust compiler called "Ferrocene". https://github.com/ferrocene/ferrocene The associated material and documents can be found here: https://public-docs.ferrocene.dev/main/index.html It was a very achievable endeavor. We encountered many of the points you mention though, and I think it's a combination of factors. I think you _can_ qualify/certify every piece of open-source software (in theory); it does get way easier and more achievable though if your upstream already has procedures that informally fit into yours.
