One of the more popular board games of the 1980’s was Scotland Yard, a game of co-operation in which each player is a detective… except for the shady “Mr. X,” the villain. Over the course of the game, the team of detectives collaboratively chases Mr. X across the city of London. At various points in the game Mr. X appears to the detectives but then just as quickly disappears again. If the “good guys” are able to work together to execute a containment plan, they can catch Mr. X. If not, Mr. X escapes.
Diagnosing Clock Domain Crossing (CDC) errors in an FPGA design can seem a lot like chasing Mr. X: CDC errors surface and then disappear; emerge again and disappear again.
Such errors pose a significant challenge to the design bring-up process. Diagnosing a CDC error is complicated by its sporadic nature and its tendency to emerge and then disappear after seemingly unrelated changes to either the design or the design flow:
- Upgrading to a new synthesis or P&R tool version
- Switching synthesis tool vendors
- Migrating target technologies
- Insertion or removal of debug components, probes, or other unrelated logic
CDC issues cannot be detected reliably using traditional verification techniques such as static timing analysis and simulation. These tools were originally intended for single-clock designs, and they rarely catch the kinds of problems that arise in advanced multi-clock architectures.
Even FPGA prototyping will not reliably detect CDC issues. Some design teams rely on FPGA prototyping as the mainstay of verification, but a prototype cannot be counted on to expose every functional issue. Lastly, the age-old method of manually inspecting the RTL is unreliable at best, especially given the growing number of CDC paths in today’s ever more complex designs.
Why is Mr. X So Elusive?
Metastability is not accurately modeled in simulation, so silicon-accurate metastability behavior cannot be observed in simulation. Static timing analysis ignores paths that cross asynchronous clock domain boundaries. The surfacing of CDC issues depends on technology factors and operating conditions such as temperature and voltage. CDC-related issues may differ from one FPGA architecture to another, from vendor to vendor, or even from one placement to another. Once CDC issues are suspected in silicon, using FPGA probing techniques can also change the characteristics of CDC paths and cause a CDC issue to “disappear.”
To catch Mr. X, we must understand his “M.O.” (detective-speak for “Modus Operandi,” or “method of operating”). A CDC path is a signal that crosses from one asynchronous clock domain to another. Since the asynchronous transmit signal will inevitably violate the setup and hold timing requirements of the receive register, every CDC receive register will go metastable from time to time. We can use a mean-time-between-failure (MTBF) equation [1] to estimate how often we’re likely to witness Mr. X in action.
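A commonly quoted form of the synchronizer MTBF relationship is sketched below. Here τ and T0 are the technology-dependent coefficients; the names t_r (the time allowed for a metastable output to resolve), f_clk (the receive clock frequency) and f_data (the toggle rate of the CDC signal) are notation assumed for this illustration.

```latex
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_0 \cdot f_{\mathrm{clk}} \cdot f_{\mathrm{data}}}
```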
MTBF is determined by technology-dependent coefficients (τ and T0) as well as the frequencies of the CDC signal and the receive clock. When a register goes metastable, its output is neither a clean ‘0’ nor a clean ‘1’ for an unpredictable length of time, so downstream logic may not see consistent values. The logic may function incorrectly unless it has been designed to tolerate metastability.
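To get a feel for the numbers, the short Python sketch below evaluates the widely used form MTBF = exp(t_r / τ) / (T0 · f_clk · f_data), where t_r is the time a metastable output is given to resolve. Every value in it is a hypothetical placeholder rather than data for any real device; the function and variable names are also assumptions made for illustration.

```python
import math


def synchronizer_mtbf(t_resolve_s, tau_s, t0_s, f_clk_hz, f_data_hz):
    """Return MTBF in seconds: exp(t_r / tau) / (T0 * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t0_s * f_clk_hz * f_data_hz)


# Hypothetical coefficients and frequencies, chosen only for illustration.
TAU = 50e-12      # resolution time constant, seconds (assumed)
T0 = 100e-12      # metastability window coefficient, seconds (assumed)
F_CLK = 200e6     # receive clock frequency, Hz (assumed)
F_DATA = 20e6     # toggle rate of the CDC signal, Hz (assumed)
T_SLACK = 2e-9    # resolve time available with one register (assumed)

# One register: only the slack in a single clock period is available.
one_stage = synchronizer_mtbf(T_SLACK, TAU, T0, F_CLK, F_DATA)

# A second register adds roughly one more receive-clock period of resolve time.
two_stage = synchronizer_mtbf(T_SLACK + 1.0 / F_CLK, TAU, T0, F_CLK, F_DATA)

SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(f"one stage : {one_stage / SECONDS_PER_YEAR:.3e} years")
print(f"two stages: {two_stage / SECONDS_PER_YEAR:.3e} years")
```

Because the resolve time sits in the exponent, a modest increase in t_r changes the MTBF by many orders of magnitude, while the frequencies only scale it linearly.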
How Can We Contain Mr. X?
Effectively tackling CDC errors requires designers and their tools to work together. Only the engineer is familiar enough with the design to understand CDC issues and invoke the appropriate RTL fixes. A good approach combines proven synchronization structures and an appropriate coding style with your synthesis and CDC analysis tools.
Synchronization structures such as a pair of D flip-flops connected back to back are among the standard design practices used to avoid CDC issues. There are many types of synchronization structures for mitigating CDC issues, and your detective team needs to verify that they are used, and used correctly. Many designers have deployed FPGAs or ASICs only to discover a functional CDC issue in which a path was synchronized incorrectly, or not at all. Fortunately, there are tools and techniques that can help designers verify the correct usage of synchronization structures.
When synchronization structures are placed in the code, strategically-placed coding details can simplify CDC verification. For example, including either “SYNC” or “FIFO” as part of the instance name for a synchronizing structure will cause the instantiation paths of all protected clock domain crossing points to contain the keyword. Figure 1 illustrates this approach. Thereafter a quick check of the synthesis tool’s clock domain crossing report can verify that all CDC paths are protected.
Figure 1: Adding “CDCFIFO” to the mnemonics for a synchronizing structure’s instance name will embed the term in the instantiation paths of all protected clock domain crossing points.
CDC Verification Tools Track Down the Villain
There is another critical tool in the CDC detective’s kit. A dedicated verification solution such as Mentor Graphics 0-In® CDC is the most efficient tool for validating synchronization structures, CDC protocols and reconvergent logic. Figure 2 depicts a sample screen image. When the tool reads in an RTL design, it automatically detects asynchronous clocks, CDC paths and synchronization structures. The results of this structural analysis not only identify correct synchronization structures, but also flag CDC paths with bad or missing structures, as shown in Figure 3. Designers will want to review the CDC violations, correct the bad or missing synchronization structures and waive any CDC paths that are legitimate exceptions. A full-featured verification tool will offer a structured approach to reviewing results and a straightforward way to waive ad-hoc synchronization schemes and other exceptions; it will also keep track of those waivers for design reviews.
Figure 2: Reconvergence of two or more synchronized signals can result in synchronization problems. When multiple signals reconverge, their relative timing is unpredictable. Logic that receives these signals should account for potential cycle skew.
Figure 3: This screen displays a bad synchronization structure, in which combinatorial logic elements interrupt the path between the transmitting register and the pair of D flip-flops that function as the synchronizer.
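The reconvergence hazard described in the Figure 2 caption can be illustrated with a small Python sketch. It assumes a simplified model, not taken from any tool: on the receive clock edge nearest a transition, each changing bit passes through its own synchronizer and independently resolves to either its old or its new value. The function names are hypothetical.

```python
from itertools import product


def possible_first_samples(old, new, width):
    """Values the receive domain might capture on the edge nearest the
    transition, if every changing bit independently resolves to either
    its old or its new value (simplified model)."""
    changing = [i for i in range(width) if ((old >> i) & 1) != ((new >> i) & 1)]
    seen = set()
    for picks in product((False, True), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, picks):
            if take_new:
                value ^= 1 << bit  # this bit has already switched
        seen.add(value)
    return seen


def show(label, old, new, width=2):
    values = sorted(possible_first_samples(old, new, width))
    print(label, [format(v, f"0{width}b") for v in values])


# Two bits change at once: the receiver may briefly see 00 or 11,
# values the transmitter never sent.
show("two bits change, 01 -> 10:", 0b01, 0b10)  # ['00', '01', '10', '11']

# Only one bit changes: the receiver sees either the old or the new value.
show("one bit changes, 01 -> 11:", 0b01, 0b11)  # ['01', '11']
```

In this model the intermediate values appear only when more than one synchronized bit changes in the same transmit cycle, which is why logic that recombines such signals has to tolerate a cycle of skew between them.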
The Truth, the Whole Truth…
Precision Synthesis provides a detailed report of all paths on which a signal crosses from one clock domain to another in the synthesized design. To leverage this report, be sure to define all clocks and to specify the proper clock domains. Then produce the clock domain crossing report after synthesis (shown earlier in Figure 1, which depicts the CDC report for the same design using two different coding styles). Note that with a proper coding style, CDC protection can be confirmed quickly, either by inspection or by running a simple text-searching script.
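A text-searching script of that kind might look like the minimal Python sketch below. The report excerpt is invented for illustration: it assumes one crossing per line with the instantiation path of the receiving register, which is not the actual Precision Synthesis report layout. The instance names and the function name are likewise hypothetical.

```python
import re

# Hypothetical report excerpt: one crossing per line, ending in the
# instantiation path of the receiving register. A real report will differ.
SAMPLE_REPORT = """\
clk_a -> clk_b : /top/u_rx/SYNC_start/reg_q0
clk_a -> clk_b : /top/u_dma/CDCFIFO_data/mem_reg[3]
clk_b -> clk_a : /top/u_ctrl/status_reg
"""

# Keywords that the coding style embeds in every synchronizer instance name.
PROTECTED = re.compile(r"SYNC|FIFO")


def unprotected_crossings(report_text):
    """Return the non-blank report lines that contain no synchronizer keyword."""
    return [
        line
        for line in report_text.splitlines()
        if line.strip() and not PROTECTED.search(line)
    ]


for line in unprotected_crossings(SAMPLE_REPORT):
    print("UNPROTECTED:", line)
# prints: UNPROTECTED: clk_b -> clk_a : /top/u_ctrl/status_reg
```

The match is a plain, case-sensitive substring search, so an unrelated name that happens to contain a keyword (for example an instance called ASYNC_RESET) would be counted as protected. A stricter pattern, or a reserved prefix used only for synchronizers, reduces that risk.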
Case Closed
Imagine yourself as part of a detective team chartered to identify and avoid CDC errors. Is your team confident about catching Mr. X and all his partners in crime? Or are Mr. X and his gang eluding you again, only to be caught late in your design cycle or found by customers in the field?
As this article has explained, Mr. X is very slippery. He can’t be reliably detected using traditional verification techniques. Designers must confirm in advance that metastability will not cause functional problems in their design logic. Attempting to use CDC design techniques and tools separately will allow Mr. X to elude capture, resulting in CDC bugs in the silicon. Capturing Mr. X requires the attention of savvy designers using the same collaborative, targeted approach a team of detectives might use: proven CDC design techniques, specialized CDC verification tools, and trusted synthesis tools.
[1] Chaney, Thomas, “Measured Flip-Flop Responses to Marginal Triggering,” IEEE Transactions on Computers, Volume C-32, No. 12, December 1983, pp. 1207–1209.