With excellent tools available almost for free from FPGA companies, you might wonder why top-notch design teams still pay for high-end FPGA tools from companies like Synplicity. This week, Synplicity helped us out with that question with new improvements to their top-of-the-line synthesis offering – Synplify Premier. Premier is now updated with new capabilities and – probably more important – support for the industry’s latest, biggest, baddest FPGAs, including Xilinx’s Virtex-5 family, and beta support for Altera’s Stratix III, Stratix IIGX, and Stratix II families.
Synplicity’s Synplify Premier has a technology called “graph-based physical synthesis” that is designed to help you reach timing closure faster – or on more complex designs, to help you get there at all. As we’ve discussed before, delay in FPGAs has changed over the past several years and process generations. Back at the larger geometries, most of the delay was in the logic elements themselves. Because of that, logic synthesis could make a pretty good guess at delay just by adding up the known delays of all the logic elements in a path. Now, however, the majority of the delay comes from the routing paths between the logic elements.
If you’re, say, a logic synthesis tool, this makes it very difficult to predict path delays and correspondingly difficult to generate the right logic modifications to make those delays fit nicely between the registers and their corresponding clock edges. The routing delays, you see, aren’t known until after place-and-route has a go at the synthesized netlist. From the synthesis tool’s point of view, that’s way too late.
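To make the shift concrete, here’s a back-of-the-envelope sketch in Python. Every delay number below is invented for illustration – none of it comes from a real device datasheet or from Synplicity’s tools – but it shows why adding up logic-element delays no longer tells you much:

```python
# Hypothetical numbers only -- not from any real FPGA datasheet or tool.

# Old model: path delay is roughly the sum of known logic-element delays.
logic_delays_ns = [0.4, 0.4, 0.3, 0.4]        # four levels of logic on a path
pre_route_estimate = sum(logic_delays_ns)      # 1.5 ns -- looks comfortable

# At smaller geometries, routing dominates, and those delays are only known
# after place-and-route has had a go at the netlist.
routing_delays_ns = [0.9, 1.3, 0.6, 1.1]       # one net per logic stage
post_route_reality = sum(logic_delays_ns) + sum(routing_delays_ns)

print(f"synthesis-time estimate: {pre_route_estimate:.1f} ns")
print(f"post-route reality:      {post_route_reality:.1f} ns")
```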
Physical synthesis puts the horse before the cart – or at least even with it. Placement and logic synthesis are done together, so the delays can be estimated very accurately. Using that information, the tool has maximum latitude to change the design so that it meets timing. At a simple level, critical paths can be placed so the routing runs will be shorter or so that higher-performance routing resources can be used. Getting fancier – logic can be replicated, re-structured, or replaced with faster implementations, based on full knowledge of the timing picture.
Without layout information, timing estimates are only within 20-30% of the eventual numbers – not close enough to make critical decisions like which paths need logic restructuring. Replication, too, is most effective when based on placement – putting drivers in close proximity to the loads they’ll be driving. Without accurate placement information, replication is a shot in the dark at improving timing, while the size (and sometimes the power consumption) of the design is increased regardless. Finally, without accurate timing information, the synthesis tool will often optimize the wrong logic paths, putting unnecessary changes into paths that would have been OK while ignoring paths that actually have a real problem.
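As a toy illustration of why placement matters for replication, consider a single driver feeding loads in two far-apart clusters. The coordinates and clusters below are made up, and real FPGA routing isn’t this simple (more on that in a moment), but the geometry makes the point:

```python
# Invented coordinates -- purely to illustrate placement-aware replication.

DRIVER = (10, 10)
LOADS = [(2, 3), (3, 2), (4, 4),            # a cluster of loads near the origin
         (18, 17), (19, 19), (17, 18)]      # a cluster in the opposite corner

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# One driver feeding everything: the worst-case load is a long haul away.
worst_single = max(manhattan(DRIVER, load) for load in LOADS)

# Replicate the driver, dropping one copy inside each cluster of loads.
copies = [(3, 3), (18, 18)]
worst_replicated = max(min(manhattan(c, load) for c in copies) for load in LOADS)

print(f"worst route, single driver:     {worst_single} units")      # 18
print(f"worst route, replicated driver: {worst_replicated} units")  # 2

# Without knowing where the loads sit, the copies could land just as far away
# as the original -- a bigger design with no timing improvement to show for it.
```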
The synthesis and place-and-route problem is truly a technology issue that stems from the organizational chart. Back in the early 1980s, the engineers working on placement and routing algorithms sat together, maybe in a different building from the ones working on logic optimization and synthesis. The result? The logic synthesis and place-and-route parts of the process were implemented as different programs, with different internal data structures, connected only by a structural netlist file. With the advent of physical synthesis tools, we’re trying to retroactively re-unite those domains. Synthesis and place-and-route engineers now even attend some of the same barbecues.
Physical synthesis technology originated in the ASIC world and migrated to FPGAs. As a result, some idio-ASICracies followed it over. For example, in ASIC design, routing delay varies pretty much linearly with Manhattan distance. If you know the X and Y delta between the two ends of a route, you can make a pretty good guess at the routing delay. In FPGAs, however, routing delays are anything but distance-linear. We have “Manhattan subway” routes with super-low impedance that zip long distances across the chip in one straight line, and we have “slow” routes that meander around, piling on the impedance in pretty short order. The net result is that distance is a very poor indicator of delay. This is the reason for the “graph-based” designation added by Synplicity.
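A small caricature of the difference, again with invented numbers: in the ASIC picture, delay scales with how far you go; in the FPGA picture, it depends almost entirely on which resources you ride.

```python
# Invented numbers -- a caricature of ASIC-style vs. FPGA-style routing delay.

def asic_style_estimate(dx, dy, ps_per_unit=5):
    # In an ASIC, routing delay tracks Manhattan distance reasonably well.
    return (abs(dx) + abs(dy)) * ps_per_unit

# In an FPGA, the same displacement can ride very different resources.
fpga_route_delay_ps = {
    "dedicated long line":  380,    # one fast segment spanning the distance
    "chain of hex lines":   540,    # a few medium segments stitched together
    "24 single-tile hops": 1900,    # meandering through local switch boxes
}

dx, dy = 24, 0
print("ASIC-style guess:", asic_style_estimate(dx, dy), "ps")
for resource, delay in fpga_route_delay_ps.items():
    print(f"FPGA via {resource:20s}: {delay} ps")
# Same distance, wildly different delays -- distance alone is a poor currency.
```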
Graph-based physical synthesis doesn’t use simple distance as the currency for estimating routing delay. A graph is constructed based on the actual speed of the routing resources available, and the tool does a placement and a global route in order to understand exactly what the physical delay attributes of a logic path will be. The detailed placement information is passed on to the FPGA vendor’s place-and-route software, and the routing is left to a (small) leap of faith that the vendor’s tool will perform detailed routing the way the physical synthesis tool predicted. In practice, it usually does. “About 90% of the routes are accurate within plus or minus 10%,” says Jeff Garrison, director of marketing for implementation products at Synplicity. “The global route assures that routing resources exist for the placement we’re generating, and that gives us much more accurate timing estimates.”
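In spirit – and only in spirit, since this isn’t Synplicity’s actual data structure – the estimate looks like a shortest-path query over a graph whose edges are routing resources annotated with their real delays:

```python
# Minimal sketch of the "graph-based" idea: weight edges with resource delays
# and ask for the fastest route, not the shortest distance. The graph and every
# delay below are invented; a real tool would build them from the device database.
import heapq

# node -> list of (neighbor, delay in picoseconds)
ROUTING_GRAPH = {
    "lut_A":     [("switch_1", 60)],
    "switch_1":  [("long_line", 120), ("single_1", 150)],
    "long_line": [("switch_9", 260)],           # fast segment spanning many tiles
    "single_1":  [("single_2", 150)],
    "single_2":  [("single_3", 150)],
    "single_3":  [("switch_9", 150)],
    "switch_9":  [("lut_B", 60)],
    "lut_B":     [],
}

def fastest_route(graph, src, dst):
    """Dijkstra over routing resources; returns the minimum delay in ps."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            return delay
        if delay > best.get(node, float("inf")):
            continue
        for neighbor, edge_delay in graph[node]:
            candidate = delay + edge_delay
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None

# Picks the long-line path (60 + 120 + 260 + 60 = 500 ps) over the chain of
# slow single-tile hops (720 ps), even though both cover the same "distance."
print("estimated lut_A -> lut_B delay:",
      fastest_route(ROUTING_GRAPH, "lut_A", "lut_B"), "ps")
```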
In practice, physical synthesis reduces the number of iterations required to reach timing closure, shortening the design cycle considerably. Some iterations, however, are the result of actual design changes rather than attempts to clip those last few nanoseconds from a pesky path. Making changes to a design is another historic problem for the interaction between synthesis and place-and-route. In the past, making one tiny tweak to your HDL meant re-synthesizing your entire design from scratch. It was entirely possible (likely, in fact) that the logic synthesis tool would generate a completely different netlist based on your small change, and then the place-and-route tool would make a completely different placement and… guess what? Yep. All those timing paths you worked so hard to close are now gone and replaced with a whole new crop. Welcome to design iteration hell.
As Bryon Moyer discussed in his incrementality article last week, incremental design solutions are now available that address this issue – minimizing the change in the result from small changes in the source. “Synplify Premier addresses the two main reasons that people do incremental design,” says Angela Sutton, Senior Product Marketing Manager, FPGA Implementation at Synplicity. “It offers a quick debug iteration to give you very accurate timing without having to run place-and-route, and it allows you to make small changes to your design without having the results change dramatically. We use path groups – automatic partitions that can be separately optimized — to automatically localize changes to regions impacted by your RTL change.”
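That isn’t how Synplify Premier does it under the hood, but the flavor of localizing work to what actually changed can be sketched in a few lines – the module names and the fingerprinting scheme here are purely hypothetical:

```python
# Hypothetical sketch of change localization -- not Synplicity's implementation.
import hashlib

def digest(rtl_text: str) -> str:
    return hashlib.sha256(rtl_text.encode()).hexdigest()

# Pretend the design is split into separately optimized groups, each
# remembered along with a fingerprint of the RTL it was built from.
previous_run = {
    "uart_ctrl":  digest("module uart_ctrl ... endmodule"),
    "dma_engine": digest("module dma_engine ... endmodule"),
    "pcie_core":  digest("module pcie_core ... endmodule"),
}

current_rtl = {
    "uart_ctrl":  "module uart_ctrl ... endmodule",
    "dma_engine": "module dma_engine /* deeper fifo */ ... endmodule",  # edited
    "pcie_core":  "module pcie_core ... endmodule",
}

dirty = [name for name, text in current_rtl.items()
         if digest(text) != previous_run[name]]

print("re-optimize:", dirty)                                    # ['dma_engine']
print("keep as-is: ", [n for n in current_rtl if n not in dirty])
```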
The new release of Synplify Premier (9.0) also includes additional enhancements such as a new user interface and additional SystemVerilog constructs. It also includes the usual litany of unspecified performance and quality-of-results improvements we expect with any new rev of a synthesis tool.
We asked about the obvious delay in the release of support for physical synthesis of 65nm devices compared with what we’re used to seeing (immediate support on announcement). Garrison explained, “For this release, we were still getting the general algorithms of the tool up to speed and adding first-time support for many devices. In the future, we expect to have support for new device families much closer to announcement.” Also remember that many device families announced by FPGA vendors over a year ago are not yet (or just now) going into full production. The time from announcement to volume use is still quite long in the FPGA world.
With significant investment by third-party EDA companies like Synplicity in technologies like graph-based physical synthesis, it’s likely that we’ll continue to see a large number of design teams opt for the vendor-independent third-party tools approach. The same economy of scale and concentration of expertise that made EDA a viable industry for ASIC design apparently still applies for FPGAs as well.