Safe Embedded Electronics for Mission-Critical Applications
MEN designs and builds reliable, safe computers for highly available and redundant safety-critical systems. These functionally safe computers support redundant, diverse and deterministic architectures and are certifiable to EN 50129 and DO-254, up to SIL 4 and DAL-A.
The Need for Safe Computing
Failures of safety-critical electronic systems can result in loss of life, substantial financial damage or severe harm to the environment. Safe computer systems are typically used in avionics and railway applications requiring particularly high reliability. The same applies to the medical market, and industrial automation demands more and more functional safety as the technology becomes readily available. One of the key design elements of a safety-critical system is redundancy. The complex architecture of such systems usually requires equally complex software, resulting in very time-consuming and expensive development.
Safe computers from MEN operate with different kinds of on-board redundancy in hardware and in software resulting in fail-safe, fail-silent or fail-operational solutions. They help customers shorten their time to market by providing market-specific certification packages in combination with safe real-time operating system support.
Mission-critical applications require safe computer architectures with predictable failure behavior, both to avoid the financial damage caused by non-operational equipment and to prevent harm to people or the environment.
Different Requirements in Different Markets
Considerations about mission-critical computer architectures are complex: they include safety-critical characteristics, reliability questions, error behavior modes, Safety Integrity Levels (up to SIL 3 or SIL 4) and the major IEC and EN standards, e.g., EN 50128 / EN 50129 for railways or DO-254 for avionics (up to DAL-A or DAL-B). A mission-critical system depends on safe hardware, a safe operating system and safe application software; even the tools that are used must be qualified. And last but not least, a dedicated development, validation, production and qualification process is required. While the architecture concepts for different markets are rather similar, the way of thinking about and developing a computer system for a mission-critical application differs considerably from market to market.
Tools and Methods to Achieve Safety
Measures to achieve safe hardware include planning tools with version management, the V-model for development, risk management, requirement tracing, obsolescence management, product qualification, HASS and HALT. Risk analysis methods describe how safety can be calculated – from the well-known MTBF and MTTR values to the failure rate λ (lambda), FMEA and BITE identification. On this basis, a safe system architecture can combine different structures of redundant sub-units, enhanced by diversity and balancing safety against availability.
MEN has gathered vast experience with various architectures used to implement functional safety. It became our goal to make safe computers modular and available "off the shelf", and to make them certifiable.
Proven Techniques in Functional Safety
- Fail-Safe Behavior. In case of a serious failure, the system enters a defined safe state. A fail-silent system additionally shuts down completely.
- Redundancy. Multiplying critical components, such as the CPU, increases the function's reliability.
- Clustering. This does not increase a subsystem's safety, but it raises availability. A backup system applies redundancy at a higher level, with the aim of keeping the overall system operational even in case of a failure.
- Radiation Resistance. Cosmic radiation can cause memory errors in airborne applications. Special design can prevent effects like Single Event Upsets (SEU) in FPGA and memory components.
- Supervisors. Board management and supervision in safe computers need to go beyond the usual CPU functions. A reliable CPU should have a dedicated monitor at its side rather than supervise itself.
- Diversity. If redundant components are identical, a common cause can make them fail. This is why a system must support dissimilarities both in hardware and in software, e.g., diversely built up I/O or different operating systems on redundant processors.
- Determinism. The need for predictable behavior forbids a number of mechanisms, like interrupts, common in non-critical applications. Design engineers need particular expertise in this respect.
- Event Logging. While this is not a necessary safety function, it helps trace faults in critical systems after an incident. The chances of eliminating the error cause in the future through suitable precautions are then much higher.
- Qualification. Functional safety demands the highest level in quality, with vendors ideally certified to standards like IRIS or EN/AS 9100, which are more comprehensive than ISO 9001.
- Standardization. Safety-critical applications are dictated by industry standards like EN 50129 or RTCA DO-254, which define the safety aspects for railway or avionics components, including integrity levels like SIL or DAL.
- Certification. Apart from the manufacturer's know-how in standards, the subsystem finally needs to be certified for SIL or DAL compliance by a body such as the German TÜV.