Electronic design automation has its own secret little cold fusion. An innovation that everyone quietly hopes is possible but publicly disavows. A development that would make life beautiful, dogs and cats live happily together, and money grow on trees. This missing link is “behavioral synthesis,” the direct compilation of untimed algorithmic descriptions into practical hardware architectures. Once this is possible, digital hardware designers, the micro-architectural mavens who create much of the magic in today’s ASIC and FPGA designs, will no longer be necessary. All of their relevant expertise, tricks, and techniques will be encapsulated in a powerful software application that will dash out optimized datapaths on Monday morning without a sip of caffeine, and crank out perfectly designed hardware all week without asking for a raise or a better 401(k) package. Any competent software engineer will be able to fish a function out of their latest C++ application and recompile it for hardware implementation with a 1000X performance boost.
While this prospect may sound both thrilling and terrifying to experts in VHDL- and Verilog-based hardware design, there has never been reason for serious concern. The technical challenges posed by this problem have created in EDA something akin to Fermat’s Last Theorem, with a similar number of false announcements of success. In fact, there have been so many premature, false, and exaggerated claims that the term “behavioral synthesis” has become too maligned for marketers to touch. New products that use some behavioral synthesis technology are simply described as “automatically creating RTL” or “from algorithm to architecture”.
The standard for success in behavioral synthesis has remained the same for the past fifteen years. A practical system will be able to automatically generate hardware, from an algorithmic description, that rivals the quality of hand-written RTL created by a competent designer. As with the problem posed by Fermat, the answer is not a single, simple, elegant solution, but rather a complex compilation of work in many areas, each attempting to emulate the best creative tactics of leading hardware architects.
The Catapult C system announced by Mentor Graphics this week may move us one more step toward that goal. According to Mentor, Catapult C creates “optimized ASIC/FPGA hardware from untimed C++.” The two keywords here that could represent significant progress are “optimized” and “untimed”. Most approaches to C and C++ hardware generation to date have relied on pseudo-timed input, with specialized libraries that adapt C and C++ to hardware design by adding scheduling constraints and other hardware-specific information to the source description. Mentor’s approach, by working from completely untimed algorithmic descriptions, gives the compiler the maximum flexibility in creating a hardware architecture that is optimized for the design goals of the project. It also means that C or C++ targeted at hardware is more like the generic code that a software developer would normally write.
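To make “untimed” concrete, here is a minimal sketch of the kind of source such a tool might accept: a four-tap FIR filter in plain C++. The function name `fir`, the tap count, and the data widths are illustrative assumptions, not taken from Mentor’s documentation, and no Catapult-specific library or directive is shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: a 4-tap FIR filter written as ordinary, untimed C++.
// Nothing here mentions clocks, registers, or cycle boundaries.
constexpr std::size_t kTaps = 4;
static_assert(kTaps > 0, "the filter needs at least one tap");

std::int32_t fir(const std::array<std::int16_t, kTaps>& coeffs,
                 std::array<std::int16_t, kTaps>& delay_line,
                 std::int16_t sample) {
    // Shift the delay line by one position and insert the new sample.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = sample;

    // Multiply-accumulate over all taps. The source does not say whether
    // this loop uses one shared multiplier over several cycles or kTaps
    // multipliers in a single cycle; that choice is left to the tool.
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += static_cast<std::int32_t>(coeffs[i]) * delay_line[i];
    }
    return acc;
}
```

With coefficients {1, 2, 3, 4} and a cleared delay line, feeding in the samples 10 and then 20 returns 10 and then 40. The point of the sketch is what is absent: the code carries no scheduling information, so every timing decision remains open to the compiler.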
According to early adopters of Catapult C, the product is capable of creating results that rival and sometimes beat hand-coded RTL. “Our ability to achieve a 31 percent reduction in gate count, which correlates closely to silicon real estate and power consumption, speaks for itself,” said Peter Nord, Project Leader EDA and Methodology Coordination, Ericsson Mobile Platforms.
“We were impressed by the results. The fact that we could synthesize our untimed, system-level C/C++ source code with minimal modification played an important role in the success of this project. It provided a precise path from our system-level models all the way to RTL, which allowed us to meet our required design goals in significantly less time,” said Rudolf Krumenacker, vice president, system-on-chip design, Siemens ICN.
While quality of results is always important in a synthesis tool, algorithmic or behavioral synthesis raises the stakes. A bad result with an RTL synthesis tool might be 20-50% from optimum, but in behavioral synthesis it is not uncommon to get results that miss the mark by 10-100X. The impact on design size and performance of the architectural-level decisions made during behavioral synthesis is far greater than the narrow range of possibility offered by register-level optimization methods. For this reason, it is important to be able to control results to meet the area or performance needs of a particular application. Catapult C provides the designer with a convenient interface that shows the impact of architectural decisions on both performance and chip area and plots various solutions on a graph for easy comparison.
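The width of that range is easy to see with a toy model. The sketch below enumerates architectures for a loop of independent multiplies and keeps only the candidates worth plotting on an area-versus-latency graph. The cost model (cycles equal to operations divided by multipliers, rounded up; area proportional to the multiplier count) and all names are assumptions made for illustration; this is not how Catapult C estimates area or performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate architecture under a deliberately simple, hypothetical
// cost model. Real tools use characterized library data instead.
struct Candidate {
    unsigned multipliers;  // parallel multipliers allocated to the loop
    unsigned cycles;       // cycles needed to finish every multiply
    unsigned area;         // area in arbitrary units
};

// Enumerate candidates for a loop of `ops` independent multiplies.
// Assumed model: cycles = ceil(ops / multipliers),
//                area   = multipliers * unit_area.
std::vector<Candidate> explore(unsigned ops, unsigned max_multipliers,
                               unsigned unit_area) {
    assert(ops > 0 && max_multipliers > 0);
    std::vector<Candidate> result;
    for (unsigned m = 1; m <= max_multipliers; ++m) {
        unsigned cycles = (ops + m - 1) / m;  // integer ceiling division
        result.push_back({m, cycles, m * unit_area});
    }
    return result;
}

// Keep only candidates that no other candidate matches or beats on both
// axes while strictly beating on at least one (the Pareto front).
std::vector<Candidate> pareto(const std::vector<Candidate>& all) {
    std::vector<Candidate> front;
    for (const Candidate& c : all) {
        bool dominated = false;
        for (const Candidate& o : all) {
            bool no_worse = o.cycles <= c.cycles && o.area <= c.area;
            bool better = o.cycles < c.cycles || o.area < c.area;
            if (no_worse && better) {
                dominated = true;
                break;
            }
        }
        if (!dominated) front.push_back(c);
    }
    return front;
}
```

For eight multiplies and up to eight multipliers, `explore(8, 8, 100)` yields eight candidates and `pareto` keeps five of them, using 1, 2, 3, 4, and 8 multipliers. Even in this tiny case the surviving points span an 8X range in both cycles and area, which is the kind of spread that register-level optimization cannot recover.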
One of the key problems in algorithmic synthesis is creation of the interface to the behavioral module. The I/O interface to the module imposes constraints that limit the architectural options available to the tool. For this reason, many academic attempts at behavioral-level design that seemed promising fell short in practical use. While a tool could often produce near-optimal hardware architectures for a particular algorithm, the process broke down when real-world I/O constraints were applied at the boundaries. Mentor has attacked this problem with a patent-pending interface synthesis technology that creates a wrapper around the algorithmic code, bridging the gap between the algorithm and any external hardware. On the outside, the wrapper manages the interface with many popular standards such as the AMBA bus. On the inside, the wrapper constrains the Catapult C synthesis system to generate hardware architectures that are optimized for the limitations and timing of the chosen interface. Interface synthesis also allows the designer to switch from one external interface to another and generate optimized hardware for each without modifying the algorithmic C source code.
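The separation described here has a loose software analogy, sketched below: one algorithm function and two interchangeable wrappers that move data in and out of it. Everything in the sketch is hypothetical, including the names `scale`, `StreamWrapper`, and `MemoryWrapper`. It models only the idea that the algorithm stays untouched while the boundary changes, and says nothing about how Mentor’s interface synthesis is implemented or how it handles bus timing.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// The algorithm: scales a block of samples by a gain. It knows nothing
// about how the data arrives or where the results go.
std::vector<int> scale(const std::vector<int>& in, int gain) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(x * gain);
    return out;
}

// Hypothetical wrapper 1: a streaming boundary modeled with FIFOs.
struct StreamWrapper {
    std::queue<int> in_fifo;
    std::queue<int> out_fifo;

    void run(std::size_t n, int gain) {
        assert(in_fifo.size() >= n);
        std::vector<int> block;
        for (std::size_t i = 0; i < n; ++i) {
            block.push_back(in_fifo.front());
            in_fifo.pop();
        }
        for (int y : scale(block, gain)) out_fifo.push(y);
    }
};

// Hypothetical wrapper 2: a memory-mapped boundary modeled as one flat
// buffer with source and destination offsets.
struct MemoryWrapper {
    std::vector<int> mem;

    void run(std::size_t src, std::size_t dst, std::size_t n, int gain) {
        assert(src + n <= mem.size() && dst + n <= mem.size());
        std::vector<int> block;
        for (std::size_t i = 0; i < n; ++i) block.push_back(mem[src + i]);
        std::vector<int> result = scale(block, gain);
        for (std::size_t i = 0; i < n; ++i) mem[dst + i] = result[i];
    }
};
```

Both wrappers call the same `scale` function unchanged. In hardware the stakes are higher than in this model, because the chosen boundary also dictates when data is available, and therefore which internal architectures make sense.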
Catapult C uses a library builder tool to collect characterization data for the target silicon implementation fabric and the chosen RTL synthesis tool. This allows Catapult C to create RTL code that is highly optimized for the particular targets, reducing the need for end-of-cycle timing debug.
While Catapult C may represent a significant step forward in algorithmic compilation technology, hardware engineers can still rest easy in their jobs. Catapult C presents an interface that is easily accessible to the hardware architect, but is probably still somewhat confusing to the software developer. While concepts such as loop pipelining, dataflow dependencies, parallelism, latency, and throughput may seem foreign to most sequential-thinking C programmers, RTL designers will find themselves at home with this tool. For them, adopting it is analogous to switching from an axe to a chainsaw for RTL development. Using the interactive feedback, automation, and control facilities of Catapult C, it is easy to see that the company’s claims of significant productivity boosts for RTL designers are well founded.
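For readers on the software side, the arithmetic behind loop pipelining is short enough to sketch. The functions below use the standard textbook cycle counts for a loop whose body takes `depth` cycles: run back to back, n iterations take n times depth cycles; pipelined with a new iteration starting every `ii` cycles (the initiation interval), they take depth plus (n - 1) times ii. The function names are illustrative assumptions, and the model ignores stalls and memory conflicts.

```cpp
#include <cassert>

// Cycles for n iterations executed strictly one after another, where each
// iteration occupies the datapath for `depth` cycles.
unsigned long long sequential_cycles(unsigned long long n,
                                     unsigned long long depth) {
    return n * depth;
}

// Cycles for n iterations when the loop is pipelined: the first result
// appears after `depth` cycles, and each later iteration starts `ii`
// cycles after the previous one. Idealized model with no stalls.
unsigned long long pipelined_cycles(unsigned long long n,
                                    unsigned long long depth,
                                    unsigned long long ii) {
    assert(n > 0 && ii > 0 && ii <= depth);
    return depth + (n - 1) * ii;
}
```

For 100 iterations of a four-cycle loop body, the sequential version takes 400 cycles, while pipelining with an initiation interval of 1 takes 103 and an interval of 2 takes 202. Latency per iteration is unchanged at four cycles; what improves is throughput, which is the distinction a hardware architect makes instinctively and a sequential programmer usually has to learn.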
The road to a proof of Fermat’s Last Theorem was long and winding. Rather than one breakthrough development, it was like the gradual construction of a bridge where the completion of each span brought the ends closer together. So it is likely to be with behavioral synthesis. Catapult C represents one more significant span in the bridge that will someday link the design automation process seamlessly from algorithm to hardware and facilitate levels of productivity never before seen in hardware design.