“Let me tell you why you’re here. You’re here because you know something. What you know you can’t explain; you feel it. You’ve felt it your entire career – that there is something wrong with FPGA design. You don’t know what it is, but it’s there – like a splinter in your mind, driving you mad. It is this feeling that has brought you to this article. Do you know what I’m talking about? Tabula’s Spacetime architecture… do you want to know what it is?…You take the blue pill, design some cool 100Gig gear with Tabula’s ABAX2 devices, and you believe whatever you want to believe. You take the red pill, you stay in Spacetime, and we talk about how deep the rabbit hole goes.”
With apologies to “The Matrix,” sometimes a new product comes along that bends the brain a bit, challenges your built-in assumptions, and inspires new ways of thinking about old problems. Or, in the case of Tabula’s new ABAX2 devices – new ways of thinking about new problems, like “How the heck am I gonna get this 100Gbps packet processing application to work within my power and cost budget?”
Many of you are asking yourselves that very question right now. Your best answer so far is probably to use one of the latest high-end FPGAs. You may have even done an evaluation of some of those already. What you probably found is that it won’t be easy. You’ll be pushing the parts to their limits. You’ll have a challenging design project where the usual issues dog you – timing closure, tool runtimes, placement and routing, signal integrity, expensive parts – the list goes on and on. Your project will, too.
If you took the blue pill, just know that Tabula is now one of those “high-end FPGA” suppliers. You can add them to the list along with Xilinx, Altera, and (most recently) Achronix. All of these companies make programmable parts that could tackle your 100G (and other similar) design challenges. All of them have robust tool flows, capable silicon, and cutting-edge designs. If you just stop here and view Tabula in that lineup, they’ll probably fare quite well – depending on your design goals.
But the story goes much deeper than that.
We’ve talked about Tabula many times before. When their innovative “Spacetime” architecture was first disclosed, when they shipped their first parts, when they rolled out their tool suite, and, most recently, when they cut a deal with Intel to fabricate the latest version of their parts with Intel’s 22nm Tri-Gate (FinFET) technology.
Now, they’re selling those Intel-made parts, along with some robust reference designs and tools, and the combination of process advantage and the innovative architecture produces a solution that is quite compelling. How compelling? Well, if you’re working on the aforementioned 100G infrastructure (or, as we said, “similar design challenges”), Tabula claims they can, for example, put four 100G streams on a single chip, create a search engine capable of supporting 100G packet traffic, or make a 12x10G-to-100G bridge. Have you tried any of those tricks with your average FPGA? Tabula includes all three of these designs in their reference design suite.
That’s pretty bad-ass, and you could stop reading right there – confident that these devices are worth checking out for your next project.
But we’re just getting started with our look under the hood and at the implications of Tabula’s technology. You see, if you think of Tabula’s devices as “FPGAs,” you’re safe. You can make designs that have a bunch of LUTs, and you can do all the usual stuff with those LUTs, and you can live happily in the FPGA world without going any deeper. In “The Matrix” you would be taking the blue pill. However, if you choose the red pill, the one that bursts you out of the Matrix metaphor and allows you to see the actual, physical truth, get ready for the mind-bending part we mentioned earlier.
Tabula’s devices don’t actually have the LUTs you’re using. Not in a physical sense, anyway. They exist in a virtual world, and you can use them like real, physical LUTs, but if you got a microscope and went scouting around the device looking for all those LUTs you programmed – you’d come up WAY short. This is nothing new in the FPGA world, actually. Xilinx and Altera have been having us use LUTs that weren’t there for years – using an advanced marketing technique known as “outright lying.” OK, that’s not quite fair. Actually, they simply adjust the number of LUTs they report to be the number of 4-input LUTs that would approximately equal the number of wider LUTs they have actually fabricated on their chips – and then they round that up to a number slightly larger than what their competitor’s datasheet states.
Tabula takes the LUT thing to a whole new level, though. How much of a new level? Try twelve times the number of actual, physical, LUT-like devices on their chip. This, oddly, is not cheating. For metaphor stage one – think of Tabula’s devices as 3D FPGAs. Pretend they have the usual grid-like structure of LUTs or logic cells, but that they then have twelve vertical layers of these – like floors in a building. A typical LUT has neighbors on its left, right, front, and back – as usual, but also above and below. This is accomplished by time-multiplexing each LUT – at a very high frequency – around 2GHz with these devices. This time-domain multiplexing is completely hidden from you as the user, and it is hidden from most of the design tools. Synthesis and place-and-route think of Tabula’s devices as having three physical dimensions, and the metaphor completely works.
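To make the “twelve floors” metaphor a bit more concrete, here is a toy Python sketch of how a virtual 3D LUT coordinate could map onto a physical 2D LUT plus a fold index. The grid size and every function name here are hypothetical; only the count of twelve folds comes from the article, and Tabula’s real fabric is certainly not addressed this simply. The point is that the “vertical” neighbors of a virtual LUT are the same physical LUT in adjacent folds.

```python
from typing import List, NamedTuple

FOLDS = 12  # configurations ("folds") per physical LUT, per the article


class PhysicalSlot(NamedTuple):
    x: int     # column of the physical LUT
    y: int     # row of the physical LUT
    fold: int  # configuration memory (time slot) holding the virtual LUT


def virtual_to_physical(x: int, y: int, z: int, width: int, height: int) -> PhysicalSlot:
    """Hypothetical mapping: virtual LUT (x, y, z) is physical LUT (x, y) during fold z."""
    if not (0 <= x < width and 0 <= y < height and 0 <= z < FOLDS):
        raise ValueError("coordinate outside the virtual fabric")
    return PhysicalSlot(x, y, z)


def vertical_neighbors(x: int, y: int, z: int, width: int, height: int) -> List[PhysicalSlot]:
    """'Above' and 'below' neighbors are the same physical LUT in adjacent folds.
    Simplification: folds are treated as a stack that does not wrap around."""
    return [virtual_to_physical(x, y, nz, width, height)
            for nz in (z - 1, z + 1) if 0 <= nz < FOLDS]


def virtual_lut_count(width: int, height: int) -> int:
    """Every physical LUT appears once per fold in the virtual 3D fabric."""
    return width * height * FOLDS


# A hypothetical 100 x 100 physical grid presents 120,000 virtual LUTs.
print(virtual_lut_count(100, 100))
print(vertical_neighbors(3, 7, 5, 100, 100))
```

In this model the physical resource count never changes; only the fold index, which is really a time slot, turns a flat grid into a building with twelve floors.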
In reality, each of Tabula’s LUTs has twelve different versions of its configuration memory. With every cycle of the Spacetime clock, the LUT is reconfigured to the next state. So, during one full cycle, each LUT can act as twelve different LUTs – one for each “vertical” layer of the virtual fabric space. The Spacetime clock is invisible to you as the designer. You design based on a much slower clock – the one you’re working with in your RTL. Your clock may be running at 100MHz or 500MHz (it doesn’t really matter), but the Spacetime clock is behind the scenes, cranking away at 2GHz. For those of you who read our previous articles, you may notice the numbers have changed. We now have twelve configurations (Tabula calls them “folds”) for each LUT instead of eight. The Spacetime clock is now operating at 2GHz instead of 1.6GHz. Moore’s Law (and specifically Intel’s 22nm Tri-Gate process) has allowed Tabula to crank up the gain on their Spacetime architecture.
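The reconfigure-every-tick behavior can be sketched as a tiny simulation. This is a toy model under stated assumptions, not Tabula’s circuit: the class name `TimeMultiplexedLUT` is hypothetical, each call to `tick()` stands for one Spacetime clock cycle, and the AND/XOR configuration is made up purely for illustration.

```python
from typing import List, Sequence

FOLDS = 12  # configurations ("folds") per LUT, per the article


class TimeMultiplexedLUT:
    """Toy model of one physical k-input LUT holding FOLDS configurations.

    Each tick() stands for one Spacetime clock cycle: the LUT evaluates the
    configuration for the current fold, then advances to the next one. Over
    FOLDS ticks it therefore acts as FOLDS different virtual LUTs.
    """

    def __init__(self, k: int, truth_tables: Sequence[Sequence[int]]) -> None:
        if len(truth_tables) != FOLDS:
            raise ValueError(f"need exactly {FOLDS} configurations")
        if any(len(t) != 2 ** k for t in truth_tables):
            raise ValueError("each truth table needs 2**k entries")
        self.k = k
        self.tables: List[List[int]] = [list(t) for t in truth_tables]
        self.fold = 0  # configuration currently loaded

    def tick(self, inputs: Sequence[int]) -> int:
        """Evaluate the current fold's function, then load the next fold."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = 0
        for bit in inputs:  # first input is the most significant address bit
            index = (index << 1) | (bit & 1)
        out = self.tables[self.fold][index]
        self.fold = (self.fold + 1) % FOLDS
        return out


# Hypothetical configuration: even folds hold AND, odd folds hold XOR.
AND2 = [0, 0, 0, 1]  # index = 2*a + b
XOR2 = [0, 1, 1, 0]
lut = TimeMultiplexedLUT(2, [AND2 if f % 2 == 0 else XOR2 for f in range(FOLDS)])
outputs = [lut.tick([1, 1]) for _ in range(FOLDS)]
print(outputs)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

With the same inputs held constant, one physical LUT produces the outputs of twelve different logic functions, one per fold, and then starts the rotation over.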
The extreme cleverness in Tabula’s design pivots on this concept of converting the traditional time-and-space problem of FPGA routing, scheduling, and timing closure – into a space-only problem involving placement and routing in three dimensions. Many nice properties fall out of this time-to-space transform. At a conceptual level, as our devices have climbed the Moore’s Law tree, routing delay has become an increasingly large component of the total delay. Reducing wire length has become the most important tactic in reducing delay, and that means that placing connected objects close together is critical. When you can place objects in three dimensions, your options for “nearness” expand dramatically: every LUT gains neighbors above and below as well as in the plane, so far more logic sits within any given routing distance.
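A quick count shows how much “nearness” the third dimension buys. The sketch below is a simplified lattice model, assumed rather than taken from Tabula: one hop in any direction costs one unit of distance, and a LUT in the middle of the twelve-fold stack can reach roughly five folds up or down (the hypothetical `max_dz=5`). It counts how many positions lie within a given Manhattan distance on a flat grid versus the stacked fabric.

```python
def positions_within(d: int, max_dz: int) -> int:
    """Count fabric positions within Manhattan distance d of a LUT, excluding
    the LUT itself. Vertical (fold) offsets are limited to +/- max_dz;
    max_dz = 0 models a conventional flat 2D fabric. Assumes the plane is
    large enough that its edges never matter."""
    total = 0
    reach = min(d, max_dz)
    for dz in range(-reach, reach + 1):
        r = d - abs(dz)
        # Points with |dx| + |dy| <= r on a square grid: 2r^2 + 2r + 1
        total += 2 * r * r + 2 * r + 1
    return total - 1  # exclude the origin


for d in range(1, 5):
    flat = positions_within(d, max_dz=0)
    stacked = positions_within(d, max_dz=5)
    print(f"distance {d}: 2D {flat:3d}   3D {stacked:3d}")
```

On the flat grid the reachable set grows roughly with the square of the distance; in the stack it grows roughly with the cube until the fold limit kicks in. At a distance of four hops, the model gives 40 positions in 2D versus 128 in 3D.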
If you look behind the scenes, however, it’s easy to jump to some incorrect conclusions. First, things that might look to you to be asynchronous are actually synchronous in Spacetime. A purely combinational sequence of operations involving LUTs in different folds is actually synchronous. Values are latched by the Spacetime fabric between folds. A connection between two vertically adjacent LUTs is actually the shortest/fastest connection. Second, you might think that all that stuff being clocked at 2GHz would push power consumption into the stratosphere. This is also not true. If you think about it – the number of transistor toggles involved in any given logic operation will be about the same. The fact is, Tabula has a smaller number of transistors than a big FPGA, but they’re working much harder. Yes, there is some overhead to the Spacetime machinery, and that consumes power. But, the fact of having an order of magnitude fewer physical transistors on the chip for any given function means much lower leakage current. The power equation comes out basically a wash.
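The power argument can be written down as a back-of-the-envelope model. Everything numeric here is hypothetical: the article gives no per-toggle energy, leakage, or overhead figures, so the sketch uses arbitrary units and assumes, as a simplification, that a toggle costs the same energy on both fabrics. It separates the two effects the paragraph describes: the same logical work means the same toggle count, while leakage scales with the number of physical LUTs.

```python
import math
from typing import Dict


def power_model(logical_luts: int, folds: int, toggles_per_lut: float,
                energy_per_toggle: float, leakage_per_physical_lut: float,
                spacetime_overhead: float) -> Dict[str, float]:
    """Toy per-user-cycle energy comparison in arbitrary units.

    Conventional fabric: one physical LUT per logical LUT.
    Folded fabric: ceil(logical_luts / folds) physical LUTs, each doing the
    work of `folds` logical LUTs, so the total toggle count is unchanged.
    `spacetime_overhead` is a hypothetical fractional extra dynamic energy
    for the reconfiguration machinery.
    """
    physical = math.ceil(logical_luts / folds)
    toggles = logical_luts * toggles_per_lut  # same logical work either way
    return {
        "conventional_dynamic": toggles * energy_per_toggle,
        "conventional_leakage": logical_luts * leakage_per_physical_lut,
        "folded_dynamic": toggles * energy_per_toggle * (1 + spacetime_overhead),
        "folded_leakage": physical * leakage_per_physical_lut,
    }


# Made-up inputs; overhead is set to zero to isolate the two effects.
result = power_model(logical_luts=120_000, folds=12, toggles_per_lut=0.25,
                     energy_per_toggle=1.0, leakage_per_physical_lut=0.5,
                     spacetime_overhead=0.0)
for name, value in result.items():
    print(f"{name:22s} {value:10.1f}")
```

With zero overhead, dynamic energy is identical and leakage drops by the fold count. Whether the real chip nets out to “basically a wash” depends on the actual overhead and leakage figures, which this model does not know.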
One pleasant surprise is the effect on timing. Timing closure becomes much easier. Instead of having to squeeze every path into the precise boundaries of your RTL clock, you get the faster Spacetime clock, which provides something like a “fudge factor” for timing closure. If a signal is arriving too late for one of your clocks, the tools will just slide that event to the next Spacetime clock cycle. Your RTL can be retimed without moving your actual user register boundaries. That means no more manual retiming of RTL because you have too many combinational elements chained in one cycle. The Spacetime clock lets you shrink fast cycles and expand slow ones. You control the Matrix. You can freeze those timing-closure bullets in mid-air. It’s pretty cool.
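Here is a toy illustration of the “slide it to the next Spacetime cycle” idea. It is a hypothetical scheduler, not Tabula’s tool: the only figure taken from the article is the 2GHz Spacetime clock, which gives a 0.5 ns tick. Each combinational stage starts from the previously latched value, and its result is latched at the first tick at or after it settles.

```python
import math
from typing import List, Sequence

SPACETIME_HZ = 2e9               # Spacetime clock rate quoted in the article
TICK_NS = 1e9 / SPACETIME_HZ     # 0.5 ns per Spacetime tick


def latch_ticks(stage_delays_ns: Sequence[float]) -> List[int]:
    """Toy scheduler: each stage starts from the previously latched value,
    and its result is latched at the first Spacetime tick at or after it
    settles (a late result simply slides to the next tick).
    Returns the tick index at which each stage's result is latched."""
    ticks: List[int] = []
    start_ns = 0.0
    for delay in stage_delays_ns:
        ready_ns = start_ns + delay
        tick = math.ceil(ready_ns / TICK_NS - 1e-9)  # tolerance for float rounding
        ticks.append(tick)
        start_ns = tick * TICK_NS
    return ticks


# Hypothetical three-stage path inside one 500 MHz (2 ns) user cycle.
ticks = latch_ticks([0.3, 0.4, 0.2])
print(ticks, "-> done by", ticks[-1] * TICK_NS, "ns")
```

In this model, a stage that misses one tick costs half a nanosecond rather than a whole user-clock period, which is the intuition behind the timing “fudge factor.”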
Over the long haul, one can design IP blocks that specifically take advantage of Spacetime – and in certain cases you can get spectacular benefits from that. Regular FPGA IP will work just fine, but if you tap into the Spacetime architecture, you can sometimes do extra-clever tricks that dramatically improve the performance of your function. For example, Tabula’s memories, because of the Spacetime folds, can act as up to 12-port memory – completely for free. Think of the design things you can do with free 12-port memory.
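Why would folds make a multi-port memory “free”? One way to picture it is a single physical single-port RAM that services a different virtual port in each fold. The sketch below is a toy model under stated assumptions: the class name `FoldedMemory` is hypothetical, and the rule that accesses within one user cycle take effect in port (fold) order, so a write on an earlier port is visible to a read on a later one, is a design choice of this model rather than documented Tabula behavior.

```python
from typing import List, Optional, Sequence, Tuple

FOLDS = 12  # one physical access per fold, per the article's fold count

# A request is ("read", addr, None) or ("write", addr, value); None = idle port.
Request = Optional[Tuple[str, int, Optional[int]]]


class FoldedMemory:
    """Toy model: one physical single-port RAM serves up to FOLDS virtual
    ports per user cycle by handling port i's request during fold i."""

    def __init__(self, depth: int) -> None:
        self.data = [0] * depth

    def user_cycle(self, requests: Sequence[Request]) -> List[Optional[int]]:
        if len(requests) > FOLDS:
            raise ValueError(f"at most {FOLDS} ports")
        results: List[Optional[int]] = []
        for req in requests:  # requests are serviced in fold order
            if req is None:
                results.append(None)
                continue
            op, addr, value = req
            if op == "write":
                self.data[addr] = value
                results.append(None)
            elif op == "read":
                results.append(self.data[addr])
            else:
                raise ValueError(f"unknown op {op!r}")
        return results


mem = FoldedMemory(depth=16)
# Port 0 writes, port 1 reads it back, ports 2..11 read addresses 0..9.
out = mem.user_cycle([("write", 3, 42), ("read", 3, None)]
                     + [("read", i, None) for i in range(10)])
print(out)
```

Only one physical access ever happens at a time, yet from the user-clock point of view all twelve ports complete in the same cycle.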
Ironically, one of the issues Tabula faces is communicating with the outside world. When you spend so much time in the Matrix, it’s hard to connect back to reality. Tabula people, having spent so much time thinking about the implications of their architecture, can end up way off on the horizon from the perspective of folks who are just coming into their world from conventional FPGA design. Spacetime breaks so many of the “normal” assumptions that we have become accustomed to in logic design that it really does benefit one to take the red pill and dive down deep. Of course, if you don’t have time for that, just stick with the blue pill, design your 100Gbps gear with ABAX2, and be happy that it’s fast, easy, and cheap.
Tabula’s tackling 100G applications with their new Intel-fabbed ABAX2 devices. How do you think they’ll stack up against the competition?
I have read through this article a few times now. This is the first time I have had the impression that a white paper was converted without the writer understanding all of the background – unfortunate. Please review and update the article. At a 2GHz switching frequency I probably have to bring my own power source – no value given.
Hi Juergen,
I’m not clear on what you’re saying. Do you think the article is inaccurate? I think I’ve got a pretty good handle on how Tabula’s architecture works, but I’m sure mistakes are possible… What do you think should be changed?
Kevin
Kevin is correct that this is a programmable logic device that runs at 2 GHz.
Look into the long list of comments on “Dimensional Breakthrough”, and you will see how crazy people were 3 years ago. I was crazy too then.
When I first read the article, I thought it was another piece of SF written by Kevin. At that time I thought a 1GHz asynchronous FPGA like Achronix’s was understandable, but not a time-space one. Now I can understand it.
Let’s wait and see what the real “Movers and Shakers” can do in this War.
Kevin,
You don’t really need to puncture the time/space continuum to perform 100G Ethernet processing with FPGAs. Today’s Xilinx Virtex-7 FPGAs and 3D ICs will do the job. Xilinx demonstrated 4x100G and 400G applications in Anaheim last week at OFC (http://j.mp/YWIb0O) and announced 100G packet-processing SmartCORE IP earlier this month (http://j.mp/XlkBj5).
–Steve Leibson