AN APTITUDE IN QUALITY FOR EMBEDDED SOFTWARE AND SYSTEMS
Walid Negm
Engineering amazing things | Nothing ventured, nothing gained - GenAI, Automotive Software, Cloud-Native & Open Source
Companies are intensely focused on intelligent products that generate revenue from software and services and differentiate the brand.¹ Smart healthcare wearables flag critical conditions, helping diagnose and trigger an intervention. Connected cars with driving assistance carry over a hundred million lines of code² that can take over dynamic driving tasks, from collision avoidance and lane keeping to autopilot under certain conditions. Every industry, from manufacturing and medical technology to consumer electronics and automotive, is shifting to these software- and data-driven value propositions.
Underlying intelligent products are embedded systems that combine computer hardware (silicon chips and processors), sensors and software. While the global market for embedded systems is projected to exceed USD 165 billion by 2030³, the broader economic value created runs into the trillions of dollars⁴. In such embedded systems everything must work together with high fidelity. For a smart home device, an OEM's quality metrics might be reliability, energy efficiency or a better customer experience. Quality is also about compliance with strict product safety standards and government regulations. For example, boot time, the time the software takes to come up, is often measured in milliseconds in hard real-time systems, so even the smallest faults that affect response times can be catastrophic. The last thing anyone needs is a welding robot crashing into a human operator or incorrectly calibrated sensors in a vehicle.
To keep market share and meet shorter time-to-market pressures, quality of embedded systems should be a life cycle practice. Organizations need to approach quality as a methodology, standards, processes and tools that span design, production and in-service operations to achieve whatever metrics they choose, be that safety, reliability or bug-free code. Today, early in product development, a battery of verification activities provides proof that intermediate software functions and hardware components meet their requirements. Later in the cycle, validation activities show whether the whole product meets expectations for customer usage, safety and intended real-world behavior. However, as device complexity rises (see Figure 1), so does the cost of these activities, including interpreting regulations, testing, analysis and inspection, which can collectively exceed 60% of the development cost for high-risk products.⁵
There are several best practices that shape the role of software and hardware developers in testing and how companies achieve higher-quality embedded software and systems:
Let's dig in...
1. Cloud-native embedded software, on and off the cloud
When a product relies on hardware, specific attention is needed to ensure compliance, performance and reliability of everything combined, be that software, application processors, mechanical or electronic systems. For example, a legal manufacturer of medical devices is obliged to demonstrate that it meets a litany of standards, such as ISO 13485, IEC 62304 and ISO 14971 (the list goes on), to ensure patients get effective and safe treatments. The high degree of regulation in medical devices stems from their safety-critical functions and has traditionally meant shying away from cloud-native technologies that pick up on issues and bugs earlier in the software development life cycle (SDLC).
However, the quality of products disproportionately depends on software, which means bugs and faults will exist and must be rooted out early. As an example, in the automotive sector more than 50% of recalls involving embedded systems were due to software defects⁶. One root cause of the limited use of modern software engineering toolchains in embedded systems is that development often happens on local IT infrastructure, where real hardware such as microcontrollers and processors is loaded with custom bare-metal firmware, device drivers, real-time operating systems and application code. A hardware-product approach then follows a linear process, with a fixed set of requirements and a fixed product roadmap, often taking years before the start of production.
Nevertheless, cloud-native architectures are finding their way into embedded system development, along with the benefits of continuous integration and delivery. Before that can happen successfully, hardware and software designs must be decoupled. While some embedded software is highly dependent on hardware interfaces and processor architectures, there are often ways to disaggregate the software⁷ so that a change in a chip has minimal impact on the software stack. The thrust is to proceed with parallel development tracks for hardware and software, each of which can be released continuously.
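To make the decoupling concrete, here is a minimal sketch in C, assuming an illustrative temperature-sensor interface (temp_sensor_hal and the other names are hypothetical, not a vendor API). The application logic depends only on the interface, so a host or cloud build links a simulated implementation while the target build links the real driver.

```
/* Illustrative sketch only: the names below (temp_sensor_hal, host_sim_sensor,
 * app_check_overheat) are hypothetical, not a specific vendor API. */
#include <stdbool.h>
#include <stdio.h>

/* Hardware abstraction layer: the application depends only on this interface. */
typedef struct {
    bool (*init)(void);
    int  (*read_millicelsius)(void);
} temp_sensor_hal;

/* Application logic is hardware-agnostic and can be unit tested anywhere. */
bool app_check_overheat(const temp_sensor_hal *hal, int limit_mc) {
    return hal->read_millicelsius() > limit_mc;
}

/* Host/simulation implementation used in cloud CI; a target build would instead
 * link a driver that talks to the real sensor over I2C or SPI. */
static bool sim_init(void) { return true; }
static int  sim_read(void) { return 105000; /* injected value for testing */ }
static const temp_sensor_hal host_sim_sensor = { sim_init, sim_read };

int main(void) {
    host_sim_sensor.init();
    printf("overheat: %d\n", app_check_overheat(&host_sim_sensor, 100000));
    return 0;
}
```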
With hardware abstracted away, cloud-native embedded software development frees device makers to continuously test, update and manage software releases. The need to recompile the kernel and the board-specific real-time operating system (RTOS) distribution, and to rebuild the code every time something changes, is reduced. To make this real, one approach is to use cloud-based virtual machines built on the Arm architecture, which is ubiquitous in devices such as consumer electronics, industrial equipment and automotive ECUs. A developer no longer needs to cross-compile code because it is portable from the cloud development environment, say Google's Tau T2A⁸ instances, across to the physical devices. Such a cloud-based approach⁹ improves workflows for embedded code, including unit, integration and system testing. Arm-hosted virtual machines are now available on Azure and AWS.
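As a small illustration of that workflow, the following host-native unit test (the function and values are illustrative, not a specific test framework) exercises pure application logic with no hardware dependency; the same source can be compiled and run natively on an Arm cloud VM in CI and reused on the target later.

```
/* Minimal host-native unit test sketch; names are illustrative. Because the
 * logic has no hardware dependencies, the same test builds and runs natively
 * on an Arm cloud VM in CI as well as on the device. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pure application logic under test: clamp a raw 12-bit ADC reading and scale
 * it to millivolts for a 3300 mV reference. */
static uint32_t adc_to_millivolts(uint32_t raw) {
    if (raw > 4095u) raw = 4095u;            /* saturate out-of-range input */
    return (raw * 3300u) / 4095u;
}

int main(void) {
    assert(adc_to_millivolts(0) == 0);
    assert(adc_to_millivolts(4095) == 3300);
    assert(adc_to_millivolts(9999) == 3300); /* saturation path */
    assert(adc_to_millivolts(2048) == 1650); /* mid-scale value */
    printf("all ADC scaling tests passed\n");
    return 0;
}
```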
Another approach to a cloud-native SDLC is to use a “full-system” simulator¹⁰ where the hardware is software. A virtual model of a processor architecture can be made available that closely conforms to the final target hardware, including the full software stack of BIOS, firmware, operating system and applications. For example, a virtual electronic control unit (vECU) model for a vehicle can be emulated so that not only software development but also integration and testing can happen early, without having to wait for the final prototype. More code coverage can be achieved, and faults can be injected at the virtual hardware level to see how the behavior of software functions changes.
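A toy sketch of the idea, with hypothetical names rather than any particular simulator's API: a virtual register read stands in for the hardware, and the harness injects a stuck-at fault to check that the software's plausibility check reacts.

```
/* Minimal fault-injection sketch against a virtual sensor register; the names
 * (virt_reg_read_wheel_speed, fault_stuck_high) are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool fault_stuck_high = false;          /* toggled by the test harness */

/* Virtual hardware model: returns the wheel-speed register, optionally faulty. */
static uint16_t virt_reg_read_wheel_speed(void) {
    if (fault_stuck_high) return 0xFFFF;       /* simulate a stuck-at fault   */
    return 120;                                /* nominal value               */
}

/* Software function under test: plausibility check on the sensor reading. */
static bool wheel_speed_plausible(uint16_t raw) {
    return raw <= 3000;                        /* anything above is implausible */
}

int main(void) {
    uint16_t v = virt_reg_read_wheel_speed();
    printf("nominal read %u plausible=%d\n", v, wheel_speed_plausible(v));

    fault_stuck_high = true;                   /* inject fault at virtual HW level */
    v = virt_reg_read_wheel_speed();
    printf("faulty read %u plausible=%d\n", v, wheel_speed_plausible(v));
    return 0;
}
```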
Finally, run-time embedded virtualization platforms and reference architectures create sufficient isolation between workloads (apps, code and models) that they can run on the same processor, across multiple cores or on multiple processors. Virtualization, containers (e.g., Docker, OCI) and other cloud-native technologies are now more accessible at the edge and for IoT devices, not only reducing the cost of development but also elevating quality.
2. Virtual verification and validation to speed up certification
Achieving product certification in a timely manner and at a reasonable cost is difficult because a manufacturer has a huge number of test cases to cover for compliance and must ultimately conduct real-world physical testing. To keep pace, quality engineering is becoming more about “verification by analysis,” where simulations such as physics-based responses, synthetic and virtual scenario generation, and hardware emulation evaluate the quality of embedded software, algorithms and digital models of the production hardware.
Simulations are used to trick the part of the system that is “under test,” whether it is a piece of executable code, an algorithm, a mechatronic module or an actuator, into showing evidence of compliance. For example, a collision avoidance algorithm is presented with millions of scenario variations of pedestrians crossing a street in various weather conditions, along with a simulation of the steering actuator. How many scenarios are needed depends on target coverage and on rules covering speed, braking behavior, lane-marking adherence and traffic regulations. Such software-in-the-loop (SIL) testing provides physics-based responses, reconstructed real-world images and synthetic traffic environments that resemble the real thing. Mathematical interpretation can be added to simulations to align with amended regulations such as UN R157, which now allows automated lane keeping at speeds up to 130 km/h, up from 60 km/h.
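The following toy sketch illustrates the scenario-sweep idea; the braking logic, thresholds and parameter grid are purely illustrative, not a real ADAS algorithm or any particular scenario-generation tool.

```
/* Toy software-in-the-loop sketch: sweep a scenario grid (speed, pedestrian
 * distance, road friction) against a simplified emergency-braking decision
 * function and count unsafe decisions. Everything here is illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define GRAVITY  9.81
#define MARGIN_M 2.0   /* extra safety margin in meters */

/* Decision logic under test: brake when the physical stopping distance plus a
 * margin reaches the remaining distance to the pedestrian. */
static bool aeb_should_brake(double speed_mps, double dist_m, double mu) {
    double stopping = (speed_mps * speed_mps) / (2.0 * mu * GRAVITY);
    return stopping + MARGIN_M >= dist_m;
}

int main(void) {
    int total = 0, unsafe = 0;
    for (double v = 5.0; v <= 35.0; v += 5.0)             /* speed, m/s       */
        for (double d = 5.0; d <= 80.0; d += 5.0)         /* distance, m      */
            for (double mu = 0.3; mu <= 0.9; mu += 0.3) { /* road friction    */
                total++;
                double stopping = (v * v) / (2.0 * mu * GRAVITY);
                /* Unsafe scenario: the logic does not brake even though the
                 * car can no longer stop in time. */
                if (!aeb_should_brake(v, d, mu) && stopping >= d) unsafe++;
            }
    printf("scenarios: %d, unsafe decisions: %d\n", total, unsafe);
    return 0;
}
```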
Of course, it is also necessary to test software, firmware, control systems and processors in the context of realistic hardware. Such an environment is directly in the loop and physically sitting on a test bench, but not necessarily in the final configuration. When embedded software is evaluated on these hardware-in-the-loop (HIL) systems, it is fed highly precise signals that replicate real-time constraints, electrical activity, radio network standards, network interfaces and other physical phenomena.
End-to-end compliance and certification can be further streamlined and sped up with digital twin platforms that capture real-world data and bring a holistic suite of simulations to discrete, composite and contextual virtual models, with a feedback loop of information and insights from the as-designed, as-built and as-operated environments.
At the end of the day, there will be a need to run tests on the actual target hardware (someone must fly the plane or drive the car) with the embedded software intended for the final release. For example, lightning and surge testing is crucial in the energy infrastructure industry before deploying into those harsh environments. Vibration and power supply testing in the automotive industry is another example. For handheld consumer electronic devices, electrostatic discharge testing is critical to device reliability. Beyond those examples, RF testing on site in complex RF environments, such as manufacturing or industrial locations, ensures that wireless devices deployed into those settings operate as intended.
3. Integrated test automation and artificial intelligence
There is an expectation that technology, whether automation or artificial intelligence, delivers efficiencies over time: increasing testing capacity by scaling the number of tests, proactively identifying defects, and reducing the time and effort spent on manual testing.
When it comes to embedded and real-time devices, augmenting the manual tester is a crucial pathway to increase productivity and bring down the spend on software bug fixing. For example, a test generation model can be used to search for the most relevant corner conditions to ensure continued stability (the software does not crash, run out of memory or other resources, or overheat). Or test case prioritization can be used to find the most effective order of test cases to achieve better software quality in emergency releases. Closer to the developer experience, the ability to detect defects in code before it is handed off can significantly reduce the burden. Together, these and other intelligent testing use cases are being integrated into the CI/CD pipeline with an automation platform that cuts across test management, issue management, version control and test orchestration.
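As a small illustration of test case prioritization, the sketch below orders a suite by a simple heuristic (historical failure rate per second of runtime) so the most informative tests run first in an emergency release; the scoring rule, test names and data are illustrative only.

```
/* Simple test-case prioritization sketch; heuristic and data are illustrative. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    double fail_rate;     /* fraction of recent runs that failed */
    double runtime_s;     /* average execution time in seconds   */
} test_case;

static double score(const test_case *t) {
    return t->fail_rate / t->runtime_s;  /* expected failures found per second */
}

static int by_score_desc(const void *a, const void *b) {
    double sa = score((const test_case *)a), sb = score((const test_case *)b);
    return (sa < sb) - (sa > sb);        /* higher score sorts first */
}

int main(void) {
    test_case suite[] = {
        { "can_bus_timeout",  0.20, 12.0 },
        { "boot_time_budget", 0.02,  3.0 },
        { "ota_rollback",     0.30, 60.0 },
        { "watchdog_restart", 0.10,  5.0 },
    };
    size_t n = sizeof suite / sizeof suite[0];
    qsort(suite, n, sizeof suite[0], by_score_desc);
    for (size_t i = 0; i < n; i++)
        printf("%zu. %s (score %.4f)\n", i + 1, suite[i].name, score(&suite[i]));
    return 0;
}
```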
In the realm of mechanical and electronic components, life cycle testing is critical to the quality assurance plan. Robotic automation with mechanical actuators is used to perform repeated operations on a device (moving a hinge, pressing buttons, dropping the unit, and so on). These operations identify weaknesses in the product that can then be improved, for example cracked housings, broken solder joints or connectors that break off. Devices also need to be cycled through temperature extremes and harsh environments (e.g., salt fog, humidity, UV exposure) to make sure the product holds up under a variety of conditions and still operates with high reliability.
Finally, there is a need for more sophisticated run-time event monitoring and situational awareness within embedded systems, able to detect faulty behavior and the onset of quality and safety signals. Keeping tabs on run-time quality metrics matters even more when product variants and configurations are released to different geographies, or when customers combine different optional features and software upgrades.
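A minimal sketch of such a run-time monitor, with illustrative thresholds and a faked latency source standing in for a hardware timer, might track control-loop latency against a deadline and flag the onset of degradation:

```
/* Minimal run-time monitoring sketch; thresholds and the fake latency source
 * are illustrative. A real system would read a hardware timer per cycle. */
#include <stdint.h>
#include <stdio.h>

#define DEADLINE_US   1000u   /* hard real-time budget per cycle */
#define ALERT_WINDOW  100u    /* cycles per observation window   */
#define ALERT_LIMIT   3u      /* tolerated overruns per window   */

typedef struct { uint32_t cycles, overruns; } latency_monitor;

static void monitor_record(latency_monitor *m, uint32_t latency_us) {
    if (latency_us > DEADLINE_US) m->overruns++;
    if (++m->cycles == ALERT_WINDOW) {
        if (m->overruns > ALERT_LIMIT)
            printf("quality signal: %u deadline overruns in last %u cycles\n",
                   m->overruns, ALERT_WINDOW);
        m->cycles = 0;
        m->overruns = 0;
    }
}

int main(void) {
    latency_monitor m = {0, 0};
    for (uint32_t i = 0; i < 300; i++) {
        /* Fake latency: mostly 800 us, with a burst of overruns mid-run. */
        uint32_t latency = (i > 150 && i < 160) ? 1400u : 800u;
        monitor_record(&m, latency);
    }
    return 0;
}
```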
4. Trust, security and quality: Computing devices and AI systems
Product security and quality are intertwined, and one should not be sacrificed for the other. For example, the medical device and automotive sectors include security requirements as part of their quality management systems. Eventually every connected product and IoT device will come under threat, whether from existing vulnerabilities or from bad actors looking to steal personal or proprietary data, corrupt over-the-air updates in a vehicle or take over consumer devices. The good news is the array of built-in security capabilities available for embedded systems, from code reviews and hardware cryptographic accelerators to hardware trust anchors, secure boot, secure storage and intrusion detection.
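As one small illustration of how such capabilities look in code, a secure-boot-style check might compare the digest of the flashed image against a reference anchored in read-only storage before handing over control; in the sketch below the digest computation is stubbed (a real device would use SHA-256 or a hardware accelerator) and all names are hypothetical.

```
/* Minimal secure-boot-style verification sketch; digests are stand-ins and the
 * names are illustrative, not a particular bootloader's API. */
#include <stdint.h>
#include <stdio.h>

#define DIGEST_LEN 32

/* Constant-time comparison to avoid leaking how many bytes matched. */
static int digest_equal(const uint8_t *a, const uint8_t *b) {
    uint8_t diff = 0;
    for (int i = 0; i < DIGEST_LEN; i++) diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}

int main(void) {
    /* Stand-ins for the digest anchored in secure storage and the digest
     * computed over the flashed application image at boot time. */
    uint8_t reference[DIGEST_LEN] = {0xAB, 0xCD};
    uint8_t computed[DIGEST_LEN]  = {0xAB, 0xCD};

    if (digest_equal(reference, computed))
        printf("image verified, jumping to application\n");
    else
        printf("verification failed, staying in recovery bootloader\n");
    return 0;
}
```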
While security-by-design is already happening, it must be deeply synchronized with the quality life cycle so that security requirements are part of the verification and validation activities. For example, security requirements handed down by regulators should be traced and mapped to safety-critical functions on devices, and security controls should be tested to ensure authenticated transactions, data integrity and privacy.
A risk-based approach, mandated in most industry sectors, calls for vulnerability assessments on products; linking those assessments to quality ensures that weaknesses in application areas are effectively remediated within embedded systems. That could mean discovering and fixing lax security controls against network snooping, unauthorized access by viruses or Trojans, modification of device images, or product cloning. In addition, penetration testing¹¹ to exploit devices should be incorporated into the quality methodology to understand the attack surface and then adjust the real-world security posture of the system. Finally, given the frequency of over-the-air updates, there is a need for an embedded systems “cyber range” to explore malware behavior on real hardware.
The trustworthiness of electronic systems, control algorithms and embedded software is imperative, and there is now an equally dynamic imperative to evaluate and quantify the trustworthiness of AI systems that operate with some level of uncertainty. There are many examples, especially in regulated industries, where decision models based on deep neural networks fail to arrive at the right inference, which can be seen as negligence or an inability to perform as intended. Companies need a quality engineering framework for machine learning products in which they proactively identify situations of failure or misuse and determine the implications.
Conclusion
At the heart of the shift to software and services are embedded systems that combine hardware, software and sensors for the next generation of interactions closer to the point of consumption. To keep market share and meet shorter time-to-market pressures, quality of embedded systems should be a life cycle practice. Organizations need to approach quality engineering of intelligent products and their associated services as a methodology, standards, processes and tools that span design, production and in-service operations to achieve whatever metrics they choose, be that safety, reliability or bug-free code. To elevate quality in the value proposition, organizations need to pursue several best practices: cloud-native embedded software on and off the cloud, virtual verification and validation to speed up certification, integrated test automation and intelligent testing, and trust of computing devices and AI systems.
Contributors:
References: