Software development is still, by far, the squishiest segment of the engineering discipline.
This is not because software engineers lack discipline. Certainly some of the most methodical and disciplined individuals I have ever met were in the software engineering profession. The problem is in the discipline itself – software is the most complex component of almost every modern embedded system. As a component becomes more complex, our ability to conceptualize its operation and to design an organized methodology for its development and verification is dramatically reduced.
Aviation is one of the most conservative areas of engineering. The acute awareness that even small engineering decisions and procedures carry great weight in the potential loss of human life has pushed the community into a cautious rigor that would drive engineers accustomed to more liberal fields to distraction. This extreme caution, and the regulation that goes along with it, has kept aviation decades behind the technology curve in many areas such as materials, powerplant design, and, most notably, electronics.
When the loose and liberal landscape of software development eventually reached the “document and test every nut and bolt” conservatism of the aviation world, there was bound to be confusion. Over the course of a few decades, that confusion settled into a standard for software development called DO-178B, “Software Considerations in Airborne Systems and Equipment Certification.” The standard was developed by the Radio Technical Commission for Aeronautics (RTCA) and the European Organization for Civil Aviation Equipment (EUROCAE). The US Federal Aviation Administration (FAA) later recognized it as an acceptable means of developing and certifying software for use in avionics.
The process is based on the understanding that not all software is equal when it comes to its effect on aircraft safety. DO-178B specifies a safety assessment process that considers the effects of a software-related failure in terms of its impact on the aircraft, the crew, and the passengers. Based on this assessment, the software is assigned a level, A through E.
Level A – “Catastrophic” means that a failure in the system may cause a crash.
Level B – “Hazardous” means that a failure has “a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers.”
Level C – “Major” means that a failure “is significant, but has a lesser impact than a Hazardous failure (for example, leads to passenger discomfort rather than injuries).”
Level D – “Minor” means that a failure is noticeable, but has a lesser impact than a Major failure (for example, causing passenger inconvenience or a routine flight plan change).
Level E – “No Effect” means that a failure has no impact on safety, aircraft operation, or crew workload.
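The level assignment is what drives the rest of the process: the higher the level, the more verification objectives apply. As a rough sketch, the mapping could be modeled as a simple lookup table. (The objective counts below are the figures commonly cited for DO-178B; treat the exact numbers as an assumption to be checked against the standard itself.)

```python
# Hypothetical sketch of the DO-178B level assignments. The failure
# condition names come from the standard; the objective counts
# (66/65/57/28/0) are the commonly cited DO-178B totals and should be
# verified against the Annex A tables before being relied on.
DO178B_LEVELS = {
    "A": {"condition": "Catastrophic", "objectives": 66},
    "B": {"condition": "Hazardous",    "objectives": 65},
    "C": {"condition": "Major",        "objectives": 57},
    "D": {"condition": "Minor",        "objectives": 28},
    "E": {"condition": "No Effect",    "objectives": 0},
}

def objectives_for(level: str) -> int:
    """Number of objectives that must be satisfied for a software level."""
    return DO178B_LEVELS[level.upper()]["objectives"]
```

A Level E assignment, for instance, carries no objectives at all, which is why establishing the level early in the safety assessment matters so much for planning the certification effort.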
Each of these levels carries a corresponding number of development and verification objectives that must be satisfied. Some of those objectives must be accomplished “with independence,” meaning that responsibility for development and verification must be separated so that the same engineers are not responsible for both developing and testing the same code.
Interestingly, unlike many aspects of aviation-related engineering, and in a world where things such as the proper rotational direction of safety wire are typically specified, DO-178B is objective-based, not process-based. The standard says you must have processes, and those processes must meet certain objectives, must have well-defined entry and exit criteria, and must be documented. It does not, however, specify what those processes must be. The definition of the actual software development and testing process is left as an exercise for the reader. Caveat Engineer. (Let the programmer beware.)
The standard, although complex, is straightforward enough that a capable development team, creating a software system from the ground up, would have a reasonable chance of digesting the documents and producing compliant software. Notice, however, that we said “from the ground up.” As anyone in software engineering knows, “from the ground up” is a fictitious situation that probably only existed when the first line of code was written by the first software engineer – in approximately the year 1212. (NOTE: Don’t come back posting a thousand comments that this date is wrong. We looked it up on Wikipedia… or, maybe not.)
At the very least, if we’re developing an airborne application, we will probably need an operating system. As you might have guessed by now, that operating system is software too, and it must be certified under DO-178B right along with your application. Does that seem like a problem? Here’s a hint: the operating system running on the machine where you’re reading this article almost certainly doesn’t cut it. Luckily, the people who make operating systems for embedded use have thought of this already and provide support in varying degrees. RTOS offerings including Wind River’s VxWorks, LynuxWorks’ LynxOS-178, Green Hills’ INTEGRITY-178B, Micrium’s uC/OS-II, and Mentor Graphics’ Nucleus have all been certified under DO-178B and are promoted by their suppliers as “certifiable.” (We too have been called “certifiable,” but we’re pretty sure that is in a different context.)
A “certifiable” RTOS still has to be proven in your system. Most of these vendors supply some sort of certification package that includes things like “certification evidence” – the documentation required to support whatever level of certification (A, B, C, or D) your system is after. There are also a number of consultancies that specialize in helping you get your system certified, and many of them have experience certifying systems built on the OS components listed above.
LynuxWorks has taken the process one step further, pursuing the relatively new Reusable Software Component (RSC) acceptance procedures outlined by the FAA. These procedures are designed to allow certain certified components to be re-used without re-certification. This means that some software components (like most of a typical RTOS) can be certified in a way that is portable. LynuxWorks’ LynxOS-178 was the first OS to receive RSC acceptance from the FAA.
The RSC approach is particularly useful for developing portable, safety-critical applications. Re-using certified software components also enhances overall safety by reducing the proliferation of large numbers of similar one-off components, concentrating the verification effort on shared, portable, re-usable software.
Of course, the need for certification of the parts of your system you didn’t develop doesn’t stop with the OS. Compilers (and linkers and loaders and…) are also in the critical path for software functionality. A compiler bug can easily create a malfunction by generating incorrect code, even when the application developer does everything right. For the most common development languages, such as C, C++, and Ada, compilers and compiler verification tools are available to demonstrate that the object code the compiler emits actually does what your source code intends.
The FAA cares less about your debugger and your IDE in general. Since those tools do not modify or generate source or object code, they are not directly in the critical path for software reliability. You are pretty much free to use your favorite IDE and debugging techniques.
So, next time you’re in a meeting with those marketing folks, and one of them pops out something like “we could take this into the aerospace market as well,” you’ll be ready. Grab a whiteboard marker. Walk them through the basics of DO-178B. It’ll make you sound smart. It’ll almost certainly get you an early adjournment for lunch, and, unless your company is already in the aviation software business, it’ll probably keep you away from the frustrations of flying software.