In a traditional FPGA design flow, crafting the hardware architecture and writing VHDL or Verilog for RTL synthesis requires considerable effort. The code must follow a synthesis standard, meet timing, implement the interface specification, and function correctly. Given enough time, a design team is capable of meeting all these constraints. However, time is one thing that is always in short supply. Deadlines imposed by time-to-market pressures often force designers to compromise, settling for ‘good enough’ by re-using blocks and IP that are overdesigned for their application.
In the past few years, tools and methodologies that support algorithmic synthesis have risen to help designers build and verify hardware more efficiently, giving them better control over optimization of their design architecture. The starting point of this flow is a subset of pure C++ that includes a bit-accurate class library. The code is analyzed, architecturally constrained, and scheduled to create synthesizable HDL.
For algorithm-intensive designs, this approach to creating verified RTL is an order of magnitude faster than manual methods. If a designer currently builds and verifies 1,000 gates per day, the same designer using algorithmic synthesis can build and verify 10,000 gates per day.
Beyond Behavioral Synthesis
If all this sounds familiar, that is because behavioral synthesis—introduced with significant fanfare a few years ago—promised the same productivity gains. Reality soon caught up with the hype, however, as designers discovered that behavioral synthesis tools were significantly limited in what they actually did. Essentially, the tools incorporated a source language that required some timing as well as design hierarchy and interface information. As a result, designers had to be intimately familiar with the capabilities of the synthesis tool to know how much and what kind of information to put into the source language. Too much information limited the synthesis tool and resulted in poor-quality designs. Too little information led to a design that didn’t work as expected. Either way, designers did not obtain the productivity and flexibility they were hoping for.
Removing timing and parallelism from the source language is what separates first-generation (behavioral) high-level synthesis from second-generation (algorithmic) high-level synthesis. Algorithmic synthesis tools decouple complex IO timing from the functionality of the source. This allows the functionality and design timing to be developed and verified independently.
There is a growing number of algorithmic synthesis tools on the market today, making it difficult to sort through their competing claims. One of the primary differentiators is the language the tool requires the designer to use to describe the algorithms. Some rely on languages that include hardware constructs, such as Handel-C, SystemC, SystemVerilog and others. Unfortunately, these languages are extremely difficult to write, and are closer to the RTL abstraction level than higher-level languages used to describe system behavior, such as C++.
C-Based Algorithmic Synthesis Basics
The most productive algorithmic synthesis tools are based on pure American National Standards Institute (ANSI) C/C++, making it possible to develop functional hardware in the form of C algorithms and to synthesize process-specific, optimized RTL code. ANSI C++ is one of the most widely used algorithmic modeling languages in the world, and it incorporates all the elements needed to model algorithms concisely, clearly and efficiently. A non-proprietary class library can then be used to model bit-accurate behavior. And the many software design and debug tools available for C++ can now be re-used for hardware design.
With algorithmic synthesis based on pure ANSI C++, the source code doesn’t embed constraints such as clock cycles, concurrency, modules and ports, which would result in a rigid description – verbose and bound to a specific technology. Instead, the user can apply synthesis directives to specify the target ASIC or FPGA technology, describe the interface properties, control the amount of parallelism in the design, trade off area for speed, and more.
Synthesis constraints for the architecture can be applied based on the design analysis. These constraints can be broken into hierarchy, interface, memory, loop and low-level timing constraints.
• Hierarchy constraints allow the sequential design to be separated into a set of hierarchical blocks and define how those blocks run in parallel.
• Interface constraints define the transaction-level communication, pin-level timing and flow control in the design.
• Memory constraints allow the selection of different memory architectures both within blocks and in the communication between blocks.
• Loop constraints are used to add parallelism to each block in the design, including pipelining.
• Low-level timing constraints are available if needed.
Once the design is constrained, it can be scheduled. The result is that the designer can quickly infer the appropriate amount of parallelism necessary to meet the performance requirements. This allows the same sequential C++ source to be used to create anything from highly compact (serial) designs to very fast (parallel) designs.
At the core of every high-level synthesis tool is a scheduler. Once all the architectural constraints are selected, the scheduler applies the constraints to create a fully timed design. The scheduler is responsible for meeting all the timing constraints, including the clock period. One of the biggest conceptual changes between RTL synthesis and algorithmic synthesis is that the design is not written to run at a specific clock speed, rather the high-level synthesis tool builds a design based on the clock speed constraint. Many tools claim to be high-level synthesis tools, but without a scheduler, they are merely translators and much less powerful than high-level synthesis tools.
Once the scheduler has added time to the design, the RTL output can be generated. The RTL generation involves extracting the datapath, control and FSM for the design. Now the design needs to be verified.
An algorithmic synthesis tool knows every detail of the timing and structure added to a design. This means the original untimed C++ testbench can be re-used for every architecture created from the C++ design. Once verified, the design can be run through a standard RTL synthesis flow.
Achieving Optimal, Robust Designs
By reducing the effort needed to generate code, algorithmic synthesis gives designers more time for architectural exploration. They can efficiently evaluate alternative implementations, modifying and re-verifying C to perform a series of “what-if” evaluations of alternative algorithms. Users need only change the constraints within an algorithmic synthesis environment to optimize a design for size, performance, or a variety of other variables. After exploring a range of possible scenarios in a relatively short period of time, designers can determine an optimal implementation within a reasonable schedule.
Equally important, the quality of the source code is greatly enhanced. Since the lower level code is automatically generated, there are fewer bugs introduced into the design—up to 60% fewer. Automatic synthesis eliminates errors that invariably crop up during manual RTL generation, which is key to improving the overall design cycle.
High-level synthesis tools are even capable of automatic interface creation. Advanced algorithmic synthesis methodologies are able to directly support interface synthesis because the source code is sequential. Intelligent algorithmic synthesis tools can create all of the parallelism, concurrency and structure in a hardware design, and generate interface protocols that closely match the dataflow needs of the design. The result is a more efficient design with fewer errors. In addition, the resulting interfaces are synthesized to meet, but not dramatically exceed, the performance of the chip, saving valuable silicon real estate.
To enhance the verification effort, the same high-level description is used to automatically create a consistent verification environment from C to RTL, including high-speed transaction-level models, ensuring that the intent specified by the system engineer is preserved. This replaces the slow, manual process of creating models and incrementally adding structure, concurrency and parallelism to develop transaction-level, behavioral and RTL models. In that approach, called manual progressive refinement, misinterpretations and syntax errors are introduced as the various models are hand-coded, creating a verification and model maintenance nightmare.
Algorithmic synthesis solves the fundamental problem of creating transaction-level models for new signal processing hardware. Algorithmic synthesis tools automatically generate transaction-level models from a pure ANSI C++ description, adding structure, parallelism and concurrency to create models at various levels of abstraction. These SystemC and SystemVerilog models, with their hierarchy and parallelism, provide design teams with powerful options for system-level verification. Although transaction-level modeling challenges remain for CPUs, third-party IP and embedded memory, algorithmic synthesis frees the designer from manually creating these models for signal processing hardware, resolving one part of the transaction-level model creation problem.
Conclusion
Algorithmic synthesis is the first approach that truly supports practical hardware synthesis from sequential languages to timed RTL. Starting with C++, the same language can now be used for software, hardware and system modeling. By allowing more optimization options, hardware designers consistently achieve better results than with hand-coding, in days rather than months, for many datapath-intensive designs.
Intelligent algorithmic synthesis tools can create all of the parallelism, concurrency and structure in a hardware design, and can even generate interface protocols that closely match the dataflow needs of the design. The result is a more efficient design with fewer errors, helping design teams produce higher-quality designs on the most aggressive development schedules.
About the Author: Bryan Bowyer is a technical marketing engineer in Mentor Graphics’ High-level Synthesis Division. He joined Mentor as a developer in 1999 and has been responsible for advances in interface synthesis technology and design analysis. Bowyer has a B.S. in Computer Engineering from Oregon State University.