feature article
Subscribe Now

Future Proofing Your FinFET Design

Flex Logix Adds LUTs to your 16nm Chip

Building a custom chip on the latest 16nm FinFET processes is a mind-bogglingly challenging, risky, and expensive proposition. Non-recurring engineering costs have skyrocketed in recent years, and the complexity of doing a design on the latest process nodes is sobering for even the most experienced design teams. Schedules have stretched, and fear of the dreaded respin has ramped up to abject terror as the stakes have continued to rise.

But what if you could hedge your bets? What if you could bring the power and flexibility of programmable logic fabric to the critical parts of your custom design – doing things like accelerating computation tasks, adding ways to customize your design for multiple application scenarios, future-proofing your interfaces, and bringing options for adding or altering custom logic in the field – long after the chip design is done.

Then, you wouldn’t have to wait for that interface standard to be locked down before you started your multi-year design project. You wouldn’t have to do multiple tape-outs for different variations of your design. You wouldn’t have to be quite so worried about a minor logic or specification error causing a respin. And, you might be able to add some ultra-high-performance accelerators that would boost key computing tasks while reducing power consumption. How awesome would that be for everyone involved?

Flex Logix (we’ve written about them before) is an IP company that offers FPGA fabric that you can incorporate into any custom design. Their EFLX cores come in 100 LUT and 2,500 LUT chunks, and you can snap them together to form an almost-arbitrarily large MxN array (from 120 LUTs to 122.5K LUTs), customizing the amount of FPGA fabric you need to fit your application and your available silicon real estate. If you are planning to use the fabric for compute acceleration, you can choose a “DSP” flavored block that includes MACs, or you can select the “all logic” block to maximize your LUTs.

Flex Logix also boasts patented technology that reduces the number of routing resources required to achieve a particular utilization in their LUT fabric, so you can expect a high utilization even on large LUT arrays, an issue that would normally be problematic, because scaling LUT arrays typically requires increasingly large percentages of the area to be allocated for interconnect-related resources. Flex Logix claims that their technology results in a 45% reduction in interconnect area required.

Flex Logix has proven their IP on popular TSMC processes, including TSMC 40 ULP/EF/LP, TSMC 28 HPM/HPC/HPC+, and TSMC 16 FF+/FFC. The 40nm families were rolled out in August 2016, and the current announcement is the debut of the TSMC 16nm technologies. The implementation requires only 5 metal layers for 40nm and 6 metal layers for 28/16nm (remember, this is physical IP, whereas most commercial IP offerings are synthesizable RTL), so adding LUTs to your design shouldn’t add any special process steps or extra masks. 

Of course, large fields of LUTs aren’t going to do much for anybody without a proven tool flow. Flex Logix has partnered with Synopsys to develop support for the popular Synplify FPGA synthesis tool, and they have developed their own place-and-route to finalize your implementation. The tool flow should be straightforward, particularly considering that many design teams are already using Synopsys tools for synthesis on other parts of their design.

From our perspective, acceleration is the killer app for this technology. Most teams today designing custom SoC devices are embedding various high-performance applications processors such as 64-bit ARM cores. But in many application domains, heavy lifting compute tasks can be offloaded into FPGA LUT fabric, sometimes gaining orders of magnitude in performance while drastically cutting power consumption. In other cases, entire algorithms can be executed in the FPGA fabric without powering up the applications processors at all. 

Designing an SoC where arbitrary FPGA-based accelerators can be loaded in at runtime brings a whole new level of performance and power efficiency to SoC designs in areas like embedded vision, cryptography, sensor fusion, software-defined radio, and many others. And, in many of these areas, algorithms are still in a high state of flux, so hard-wiring them into hardened accelerators is not a practical option.

And, putting the fabric on the same die with the applications processor gives some significant power and performance advantages when compared with using discrete FPGAs as accelerators. On-chip connections between processor, memory, and FPGA fabric are far faster, much lower latency, lower power, and less expensive than off-chip interfaces. And all those IOs that you don’t have to use connecting your custom chip to an FPGA can be used for other tasks or can result in a smaller package and simpler board design. 

While there have been numerous attempts in the past to put FPGA fabric onto custom chips, Flex Logix seems to have an approach that is getting traction. Perhaps the combination of cost and performance of current semiconductor processes, and the extreme demands of popular applications in terms of performance and power consumption requirements is causing a perfect storm where embedded FPGA fabric makes both technological and financial sense. 

In numerous cases, custom SoCs and ASICs are seen on boards with FPGAs parked next to them, with the FPGA adding flexibility, performance, connectivity, or sometimes just fixing a problem with the original chip design. If building that FPGA fabric into the chip in the first place makes the design useful over a wider range of applications, makes it viable in the market longer, or brings a level of performance and power efficiency that would otherwise be unattainable, that would be a very compelling use of some comparatively inexpensive silicon real estate.

Leave a Reply

featured blogs
Nov 22, 2024
We're providing every session and keynote from Works With 2024 on-demand. It's the only place wireless IoT developers can access hands-on training for free....
Nov 22, 2024
I just saw a video on YouTube'”it's a few very funny minutes from a show by an engineer who transitioned into being a comedian...

featured video

Introducing FPGAi – Innovations Unlocked by AI-enabled FPGAs

Sponsored by Intel

Altera Innovators Day presentation by Ilya Ganusov showing the advantages of FPGAs for implementing AI-based Systems. See additional videos on AI and other Altera Innovators Day in Altera’s YouTube channel playlists.

Learn more about FPGAs for Artificial Intelligence here

featured paper

Quantized Neural Networks for FPGA Inference

Sponsored by Intel

Implementing a low precision network in FPGA hardware for efficient inferencing provides numerous advantages when it comes to meeting demanding specifications. The increased flexibility allows optimization of throughput, overall power consumption, resource usage, device size, TOPs/watt, and deterministic latency. These are important benefits where scaling and efficiency are inherent requirements of the application.

Click to read more

featured chalk talk

Accelerating Tapeouts with Synopsys Cloud and AI
Sponsored by Synopsys
In this episode of Chalk Talk, Amelia Dalton and Vikram Bhatia from Synopsys explore how you can accelerate your next tapeout with Synopsys Cloud and AI. They also discuss new enhancements and customer use cases that leverage AI with hybrid cloud deployment scenarios, and how this platform can help CAD managers and engineers reduce licensing overheads and seamlessly run complex EDA design flows through Synopsys Cloud.
Jul 8, 2024
37,075 views