The Role of High-Speed Parallel I/O
As I/O standards continue to evolve toward serialization, high-speed parallel I/O still plays an important role in specific chip-to-chip applications in which current serial technologies are either cost-prohibitive or ruled out by legacy requirements.
FPGAs are increasingly being used as programmable SoCs, designed in as an integral part of the system data path to support NPU, framer and module-based source-synchronous I/O standards such as SPI4.2, SFI 4.1, XGMII, HyperTransport and RapidIO. These applications, however, require devices capable of performing high-speed I/O translation and processing. How can this level of performance be achieved in an FPGA array?
Although electrical compliance and high-speed signal integrity are required features, these alone do not address the bandwidth issue. The FPGA I/O must also include circuitry to manage and maintain the clock and data relationships of these high-speed signals, as well as provide the gearbox functionality needed to transfer the high-speed I/O data to the FPGA fabric.
This article examines how emerging bit-based dynamic alignment logic has become a critical part of the overall system-level I/O architecture. For example, this logic is integrated into every I/O block of LatticeSC FPGAs. As a result, the devices are capable of speeds up to 2Gbps per pin.
Dynamic Alignment and Data Transfer I/O Logic
In addition to requiring I/O buffers that achieve ever higher levels of electrical performance, today's high-speed source-synchronous interfaces present three further challenges for the designer:
1) Managing the data-to-data skew (word alignment)
2) Managing and maintaining the clock-to-data relationship
3) Clock domain transfer of these high-speed signals to the FPGA fabric
The data-to-data relationship (word alignment and deskew) is fairly straightforward and can be handled by FPGA logic. However, the delay-sensitive clock-to-data relationship and the clock domain transfers are more challenging.
For bit and bus deskew, designers traditionally have relied on methods such as matching bus trace lengths, or on PLLs and DLLs that manipulate the clock signal, eliminating clock injection delay and/or phase shifting the clock by some pre-determined fraction of the clock cycle to optimize the clock-to-data relationship. While helpful, these approaches are not sufficient at higher speeds, because their compensation is clock-based and applied globally to all bus signals. They are also static, and therefore do not account for the delay variations that occur over process, voltage and temperature.

Today's high-speed interfaces require bit-based compensation because it is increasingly difficult to meet and maintain adequate setup and hold margins as clock cycle times shrink. The issue is exacerbated for high-speed parallel protocols such as SPI4.2, in which dynamic bit-based alignment and word alignment are key elements of the total system solution.
Figure 1 – Parallel Bus Skew and the Effects of Dynamic Alignment
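To see why static, clock-only compensation runs out of headroom at these rates, consider a back-of-the-envelope margin calculation. The Python sketch below uses assumed example figures for skew, PVT drift and the required sampling window (they are illustrative, not device or SPI4.2 specifications) to show how much of a unit interval survives once a fixed timing budget is subtracted.

# Back-of-the-envelope timing budget: how much of a unit interval (UI) remains
# after static skew, PVT drift and the required sampling window are subtracted.
# All numbers are assumed examples, not device or protocol specifications.

def remaining_margin_ps(bit_rate_mbps, static_skew_ps, pvt_drift_ps, window_ps):
    ui_ps = 1e6 / bit_rate_mbps              # one bit time in picoseconds
    return ui_ps - static_skew_ps - pvt_drift_ps - window_ps

# At a modest 311 Mbps, a one-time global compensation still leaves a wide eye...
print(remaining_margin_ps(311, static_skew_ps=400, pvt_drift_ps=250, window_ps=500))   # ~2065 ps

# ...but at an SPI4.2-class 700 Mbps per data line, the same static budget
# consumes most of the ~1429 ps UI, leaving very little room for error.
print(remaining_margin_ps(700, static_skew_ps=400, pvt_drift_ps=250, window_ps=500))   # ~279 ps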
While PLLs and DLLs can be used to align data and clock, the simplest way to address applications in which the clock-to-data relationship is known is to use an input delay block. For this purpose, the I/O logic block provides the user with a 144-tap delay block (40ps typical step size) that can be used independently in either of two dynamic alignment modes.
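As a quick sanity check on those figures, the short sketch below works out the total adjustable range of the delay block and how many tap positions span one bit time at representative data rates. The taps-per-UI interpretation is an illustration of the numbers quoted above, not an additional device specification.

# Sanity check on the 144-tap, 40ps-per-tap input delay block described above.
# The taps-per-UI figures are an illustration, not additional device data.

TAPS = 144
STEP_PS = 40                          # typical step size

total_range_ps = TAPS * STEP_PS       # total adjustable delay
print(total_range_ps)                 # 5760 ps, several bit times at these rates

for rate_mbps in (600, 2000):
    ui_ps = 1e6 / rate_mbps                    # one bit time at this data rate
    print(rate_mbps, ui_ps, ui_ps / STEP_PS)
# 600 Mbps  -> ~1667 ps UI, ~41.7 tap settings across one bit time
# 2000 Mbps ->   500 ps UI,  12.5 tap settings across one bit time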
Bus-based Dynamic Alignment
In this configuration, the input delay block is placed under DLL control to provide bus-based alignment for data rates up to ~600Mbps. This mode (Figure 2) preserves a fixed clock/data phase relationship by aligning the incoming clock and data bus as a group. Another advantage of this mode is that it automatically tracks and compensates for delay variations due to process, voltage and temperature.
Figure 2 – Bus-based Dynamic Alignment Mode
Although the bus-based DLL control mode is useful for some applications in which the clock-to-data relationship is known, it has inherent limitations when it comes to dynamic clock-to-data compensation for high-speed, source-synchronous interfaces: the delay compensation is applied globally to all bits of the data bus, which does not allow for the bit-based accuracy needed above 600Mbps. A different approach must be taken for source-synchronous interfaces running faster than 600Mbps, such as SPI4.2.
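The limitation of a single, bus-wide delay can be shown with a few assumed per-bit skew values: a shared shift can remove only the component of skew common to every bit, so the bit-to-bit spread is left untouched. The numbers below are hypothetical and purely for illustration.

# Why bus-based (global) compensation is not enough above ~600Mbps:
# a single shared delay removes only the skew common to all bits.
# The per-bit skew values below are hypothetical, for illustration only.

arrival_skew_ps = [0, 120, -90, 250, 40, -160, 310, 75]   # assumed per-bit arrival skew

# Bus-based compensation: shift every bit by the same amount (here, the mean skew).
shared_shift = sum(arrival_skew_ps) / len(arrival_skew_ps)
residual = [skew - shared_shift for skew in arrival_skew_ps]

# The bit-to-bit spread is unchanged: 470 ps, a large fraction of a UI at
# SPI4.2-class rates, versus roughly one 40 ps tap if each bit is delayed individually.
print(max(residual) - min(residual))    # 470.0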
Bit-based Dynamic Alignment with Closed-Loop Control Circuitry
For higher-speed interfaces, a closed-loop monitor and control circuit is needed that dynamically maintains proper setup and hold time margins on a bit-by-bit basis. This calls for an alignment mode in which the input delay block is driven by such a circuit embedded in each I/O block, as is the case with the LatticeSC FPGA. This is the only way to ensure proper data sampling in high-speed applications where the clock-to-data relationship is unknown. Another key advantage of this mode is that it compensates not only for process, voltage and temperature variations on the receiving FPGA, but also for those on the driving device.
This is the most robust configuration available (Figure 3), allowing the user to establish and dynamically maintain the clock-to-data relationship on a bit-by-bit basis and providing the resolution necessary to support speeds of up to 2Gbps on a single pin.
Figure 3 – Bit-based Dynamic Alignment Mode
Again, the key to this mode is the embedded closed-loop monitor and control circuitry, which can be enabled, disabled or updated under FPGA control. The closed-loop design also tracks and compensates for delay variations due to process, voltage and temperature conditions. Figure 4 shows an example of how the bit-based dynamic alignment circuitry works, using the SPI4.2 protocol as a reference because it is a popular, high-speed source-synchronous interface that requires dynamic alignment.
Figure 4 – Specified Data Valid Window for Bit-based Dynamic Alignment (AIL)
As seen in Figure 4, the user specifies a data valid window around the clock edge, which establishes a setup and hold margin in which no transitions should occur. Once these settings are made and the window is established, the closed-loop circuit continuously monitors and controls the clock-to-data relationship of each bit to ensure that no data transitions occur within the window. All that is then needed is a GUI-based tool with which the designer can enter the user-defined data valid window. Figure 5 shows the benefit of the bit-based dynamic alignment circuit and the GUI used to configure it.
Figure 5 – GUI Tool to Establish User-defined Data Valid Window
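For readers who prefer a procedural view, the Python sketch below models one plausible way such a closed-loop, per-bit scheme could behave. The function names, the simple tap-stepping search and the transitions_in_window() monitor are illustrative assumptions, not a description of the LatticeSC circuit itself.

# Behavioral sketch (not the LatticeSC circuit): each bit has its own delay-tap
# setting, nudged whenever the monitor sees data transitions inside the
# user-defined data valid window around the sampling clock edge.

MAX_TAP = 143      # 144 taps, numbered 0..143, 40 ps per step (per the delay block above)

def align_bit(tap, transitions_in_window):
    """Adjust one bit's delay tap until no transitions fall inside the window.

    transitions_in_window(tap) -> bool is a hypothetical monitor primitive that
    reports whether any data edge violated the setup/hold window at this setting.
    """
    while transitions_in_window(tap) and tap < MAX_TAP:
        tap += 1                     # move the data later, one tap step at a time
    return tap

def align_bus(taps, monitors):
    """Run the per-bit loop for every bit of the bus; calling this continuously
    is what lets the scheme track PVT drift rather than compensating only once."""
    return [align_bit(tap, monitor) for tap, monitor in zip(taps, monitors)]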
Gearbox Logic for High-speed Data and Clock Domain Transfer
Due to the high speed of these interfaces, gearbox logic must be used to bring these signals down to speeds the FPGA fabric can manage. As shown in Figure 6, the FPGA I/O block provides this gearbox logic for both SDR and DDR interfaces.
Figure 6 – FPGA I/O Block with Embedded Gearbox Logic
On-die clock dividers, whose divided and non-divided outputs are phase-matched, are also provided to support the clocking requirements of the gearbox logic, avoiding the need to use generic PLL/DLL resources.
Table 1 shows an example of the gearbox functionality. The gearing logic also handles the transfer from the high-speed edge clock domain to the lower-speed FPGA system clock domain, guaranteed across process, voltage and temperature conditions. Although an input example is shown, the gearing logic is available for outputs as well.
Table 1 – Example of Gearing for an 8-bit Bus
Note: x1 gearing is used to guarantee the transfer from the high-speed edge clock to the FPGA system clock
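As an illustration of the gearing concept, the sketch below groups successive 8-bit edge-clock samples into wider words presented on the divided fabric clock. The 1:4 ratio is an assumed example for this sketch; the ratios actually supported are those shown in Table 1.

# Behavioral sketch of input gearing: with an assumed 1:4 ratio, four successive
# 8-bit edge-clock samples are collected into one 32-bit word that is presented
# on the divided (FPGA system) clock, which runs at one quarter of the edge rate.

GEAR_RATIO = 4

def gearbox_in(samples_8bit):
    """Group successive 8-bit edge-clock samples into wider fabric-clock words."""
    words = []
    for i in range(0, len(samples_8bit) - GEAR_RATIO + 1, GEAR_RATIO):
        word = 0
        for j, sample in enumerate(samples_8bit[i:i + GEAR_RATIO]):
            word |= (sample & 0xFF) << (8 * j)   # earliest sample ends up in the low byte
        words.append(word)
    return words

# Eight edge-clock samples become two fabric-clock words: the fabric logic sees
# one quarter of the edge-clock rate, at four times the width.
print([hex(w) for w in gearbox_in([0x11, 0x22, 0x33, 0x44, 0xAA, 0xBB, 0xCC, 0xDD])])
# ['0x44332211', '0xddccbbaa']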
Clock Domain Transfers
Finally, the high-speed data must be handed off to the FPGA fabric for further processing, so I/O block circuitry is needed to ensure the proper transfer of the I/O data from the high-speed edge clocks to the lower-speed FPGA fabric clocks. To accommodate this clock domain transfer, the SDR and DDR elements have two clock inputs: one for the edge clock and one to clock the data onto the FPGA fabric clock. This approach guarantees error-free transfers over process, voltage and temperature variations.
Summary
Not all FPGA I/O are created equal. Parallel I/O interfaces remain an important part of system-level design and data transfer, and they demand an FPGA architecture that delivers multi-gigabit I/O performance. Conventional alignment techniques are not robust enough to achieve and maintain this level of performance. What is needed is I/O logic that handles dynamic clock/data alignment seamlessly on a per-bit basis, along with clocking and gearing resources that manage the processing and transfer of the high-speed signals to the FPGA fabric.
About the author: Ron Warner is the European Strategic Marketing Manager for Lattice Semiconductor. Previously he was an Applications Engineering Manager at Lucent Technologies/Agere Systems and spent 10 years as an FPGA and software design engineer at Westinghouse Electric and Harris Corporation. He received his BSEE from Youngstown State University in Ohio in 1982.