When we first start learning math (or, for those across the pond, “maths” – however many of them there are), we learn about amounts. Simple numbers that describe how much of something there is at a given time. But when we grow up, we start to think about how fast those numbers change, and we enter the bewildering world of calculus and the first derivative. (Through the unfortunate mechanism of epsilons and deltas, which immediately confounds all but the most analytical folks and gives the whole thing a bad name… but I digress.)
A few years ago we took a look at Teklatech’s Power Shaping technology. Originally an offshoot of their floorplanning work, it has since become their primary focus: reducing the amplitude of noise on the power rail. They say that such “rail-aware” analysis is now a standard thing. So we’re done, right?
Nope. Efforts to date have focused on the amplitude of power noise, and the effect of the tools is to lower that amplitude. A good thing, yes. But apparently it’s no longer enough: now the slope of the noise events is an issue. We’ve grown up and graduated to the first derivative.
The power shaping concept is about rescheduling clocks so that they’re not all happening at the same time. The more items in your circuit that clock at the same time, the more noise you create. In fact, if your clock-tree synthesis (CTS) tool creates a perfectly balanced clock tree, then everything will clock at exactly the same time. Once upon a time, that might have been a good thing; no longer.
Obviously, that extreme is unlikely – if for no other reason than all of the tweaks for hold time that end up sliding clocks around on a local basis. But you can still have too many clocks hitting at the same time, and this is where the “shaping” comes in. By going in and further “randomizing” the clocks within the permissible window, you spread the energy around and lower the peaks of the noise spikes.
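A toy model makes the peak-lowering effect concrete. This is a sketch under stated assumptions, not Teklatech’s algorithm: every clock event is assumed to draw an identical triangular current pulse, the rail noise is assumed to track the summed current, and the event count, pulse width, and window are arbitrary values picked only for illustration.

```python
import random


def triangle_pulse(t, start, width):
    """Current drawn by one clock event: a unit-height triangle beginning at `start`."""
    half = width / 2.0
    if t <= start or t >= start + width:
        return 0.0
    if t <= start + half:
        return (t - start) / half          # rising edge
    return (start + width - t) / half      # falling edge


def summed_current(starts, width, t_end, dt):
    """Sample the total current of all events on a uniform time grid."""
    steps = int(round(t_end / dt)) + 1
    return [sum(triangle_pulse(i * dt, s, width) for s in starts) for i in range(steps)]


N_EVENTS = 100      # hypothetical number of flops clocking in this cycle
PULSE_WIDTH = 1.0   # hypothetical width of one switching-current pulse
WINDOW = 4.0        # hypothetical permissible window for clock arrival

aligned = [0.0] * N_EVENTS                                      # perfectly balanced tree
rng = random.Random(1)
spread = [rng.uniform(0.0, WINDOW) for _ in range(N_EVENTS)]    # "shaped" arrivals

t_end = WINDOW + PULSE_WIDTH
peak_aligned = max(summed_current(aligned, PULSE_WIDTH, t_end, 0.01))
peak_spread = max(summed_current(spread, PULSE_WIDTH, t_end, 0.01))
print(f"peak, aligned: {peak_aligned:.1f}   peak, spread: {peak_spread:.1f}")
```

With everything aligned, the pulses stack into one spike roughly as tall as the number of events; scattered across the window, they overlap only partially and the peak drops to a fraction of that.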
This next step, which Teklatech calls “pulse softening,” is about how those clocks are distributed within a given window of time. It may not have been strictly random before, but how things distribute matters. My own initial intuition was that you’d want to space the events evenly throughout the window. And… it’s not that simple. Ideally, according to Teklatech, you want a sinusoidal distribution – sparse near the edges of the window, denser in the middle. Ideally. And still… it’s not that simple in reality.
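The same toy model can compare evenly spaced events against a sinusoidal placement, reporting both the peak and the steepest slope of the summed current. The triangular pulse, the sin(πt/W) density, and all of the numbers are assumptions chosen for illustration; Teklatech hasn’t described its distribution in this form.

```python
import math


def triangle_pulse(t, start, width):
    """Current drawn by one clock event: a unit-height triangle beginning at `start`."""
    half = width / 2.0
    if t <= start or t >= start + width:
        return 0.0
    if t <= start + half:
        return (t - start) / half
    return (start + width - t) / half


def even_starts(n, window):
    """Space n events uniformly across the window."""
    return [window * (k + 0.5) / n for k in range(n)]


def sinusoidal_starts(n, window):
    """Place n events so their density follows sin(pi*t/window): sparse at the
    edges, dense in the middle. Inverts the CDF (1 - cos(pi*t/window)) / 2."""
    return [window / math.pi * math.acos(1.0 - 2.0 * (k + 0.5) / n) for k in range(n)]


def peak_and_max_slope(starts, width, window, dt=0.01):
    """Return the peak summed current and its steepest finite-difference slope."""
    steps = int(round((window + width) / dt)) + 1
    wave = [sum(triangle_pulse(i * dt, s, width) for s in starts) for i in range(steps)]
    slope = max(abs(wave[i + 1] - wave[i]) / dt for i in range(steps - 1))
    return max(wave), slope


N, WIDTH, WINDOW = 100, 1.0, 10.0   # hypothetical values, for illustration only
for name, starts in [("even", even_starts(N, WINDOW)),
                     ("sinusoidal", sinusoidal_starts(N, WINDOW))]:
    peak, slope = peak_and_max_slope(starts, WIDTH, WINDOW)
    print(f"{name:>10}: peak {peak:.2f}, max slope {slope:.2f}")
```

In this toy model the sinusoidal placement gives a much gentler maximum slope, because the window’s edges ramp up gradually instead of all at once. Its denser middle, however, produces a somewhat taller peak: a small illustration of why it’s not that simple.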
In fact, the full practical solution isn’t derived purely analytically. There are heuristics involved, and, even then, there are still many options that might work. So they have to explore that complete design space to figure out which configuration is best. The analysis runs quickly enough that this can be managed in a reasonable time. “Reasonable” being a relative thing, of course. Designers are using this at the block level, not for a full chip. A small block can run in a few hours; a block in the million-cell range will be an overnight thing. The runs can accommodate multi-mode/multi-corner and different use cases.
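Teklatech’s heuristics aren’t spelled out, so the sketch below substitutes plain random search to show the general shape of such a design-space exploration. It generates candidate clock-arrival assignments, each inside a hypothetical per-sink (earliest, latest) window, and scores each candidate by the steepest slope of the toy summed-current waveform, keeping the best. It ignores multi-mode/multi-corner analysis, timing checks, and everything else a real tool has to honor.

```python
import random


def triangle_pulse(t, start, width):
    """Current drawn by one clock event: a unit-height triangle beginning at `start`."""
    half = width / 2.0
    if t <= start or t >= start + width:
        return 0.0
    if t <= start + half:
        return (t - start) / half
    return (start + width - t) / half


def max_slope(starts, width, t_end, dt):
    """Steepest finite-difference slope of the summed current waveform."""
    steps = int(round(t_end / dt)) + 1
    wave = [sum(triangle_pulse(i * dt, s, width) for s in starts) for i in range(steps)]
    return max(abs(wave[i + 1] - wave[i]) / dt for i in range(steps - 1))


def explore(windows, width, trials, seed=0):
    """Random search over clock-arrival assignments, each kept inside its own
    (earliest, latest) window; returns the assignment with the gentlest slope."""
    rng = random.Random(seed)
    t_end = max(hi for _, hi in windows) + width
    dt = width / 50
    best = [lo for lo, _ in windows]          # baseline: every clock at its earliest time
    best_cost = max_slope(best, width, t_end, dt)
    for _ in range(trials):
        candidate = [rng.uniform(lo, hi) for lo, hi in windows]
        cost = max_slope(candidate, width, t_end, dt)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical example: 40 clock sinks, each allowed to arrive anywhere in [0, 5].
windows = [(0.0, 5.0)] * 40
schedule, cost = explore(windows, width=1.0, trials=20)
print(f"best max slope found: {cost:.1f}")
```

Because the all-earliest baseline is scored first, the search can only improve on it. Run time grows with the number of candidates times the cost of one waveform evaluation, which is why a fast analysis matters when the space has to be explored.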
Of course, you’re thinking: what’s the cost? Moving clocks around normally means adding buffers, which take up space. Teklatech sees area overhead typically in the range of 0.1–0.5% (lower than their initial estimates a couple of years ago).
But in some cases it can actually cause a die size reduction. How the heck can that be? Turns out there are a number of benefits that accrue to someone doing this, and some of them can result in a smaller die. (Slightly… this isn’t going to change your business model, people… and you can’t count on it.)
For example, at really aggressive nodes, power density is a bigger issue. So more metal is made available for power, and that metal has to get to all the places that the power is needed. So those fatter lines are taking up space that signal lines could otherwise use. That means the die area has to be fluffed out a bit to provide extra room for the fatter lines and the signal lines. Which makes the die larger.
By running the analysis, you can find out which parts of the layout have more IR drop margin, and you can tighten down those metal lines and cinch things in – which could save some area.
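The area argument rests on simple resistance arithmetic. Here is a back-of-the-envelope sketch. It assumes a single uniform strap whose resistance is R = ρL/(w·t) and treats the allowable drop in a region as a fixed budget. Under those assumptions, the narrowest acceptable width scales inversely with that budget, so a region with twice the margin could use a strap half as wide. The numbers are hypothetical, and real sign-off also weighs electromigration and dynamic effects that this ignores.

```python
def strap_resistance(length_m, width_m, thickness_m, resistivity_ohm_m):
    """Resistance of a uniform rectangular metal strap: R = rho * L / (w * t)."""
    return resistivity_ohm_m * length_m / (width_m * thickness_m)


def min_strap_width(current_a, length_m, thickness_m, resistivity_ohm_m, drop_budget_v):
    """Narrowest strap whose I*R drop stays within the budget: w = rho * L * I / (t * V)."""
    return resistivity_ohm_m * length_m * current_a / (thickness_m * drop_budget_v)


# Hypothetical numbers, for illustration only.
RHO = 2.5e-8        # ohm*m; assumed effective resistivity of a thin copper line
LENGTH = 100e-6     # 100 um strap
THICKNESS = 0.2e-6  # 0.2 um metal thickness
CURRENT = 5e-3      # 5 mA through the strap

for budget_mv in (10.0, 20.0):   # a region with more margin tolerates a larger drop
    w = min_strap_width(CURRENT, LENGTH, THICKNESS, RHO, budget_mv / 1000.0)
    drop_mv = CURRENT * strap_resistance(LENGTH, w, THICKNESS, RHO) * 1000.0
    print(f"budget {budget_mv:.0f} mV -> min width {w * 1e6:.2f} um (drop {drop_mv:.1f} mV)")
```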
In other cases, you might have a couple of unbalanced legs of a clock tree, and the normal CTS approach would be to balance them by adding buffers. The analysis may tell you, in fact, that you shouldn’t balance them – that the imbalance serves the scheduling purposes of the pulse softener. So you don’t need to add a buffer. Each buffer you don’t need either saves area or helps to compensate for those other places where you are adding buffers.
Another area where this analysis can help is in reducing timing-related yield loss caused by variation. The two primary sources of yield trouble are gate-length variation (which Teklatech doesn’t do anything about) and dynamic voltage drops. By smoothing out the rail, you’ve reduced a source of variation. You’re welcome.
Once the analysis is done, it’s technically possible that you might decide that things have been over-optimized, and you can go back and dial down the aggressiveness of the tool. The loop goes back only to the CTS step, so it’s not a huge redo. But they say they’ve never seen anyone actually do that.
Now, those of you thinking ahead will note that the chip package plays a huge part in the whole power scheme. The tool at present doesn’t include the package – largely because, when you’re designing at the block level, you may not have access to the package data. Or the package might not even have been selected yet.
They do have plans, however, to incorporate packaging into the analysis in a future release. They can also work with 2.5D and 3D packaging – in theory. There have been no requests for that from their customers or prospects, so it’s not there today.
For the time being, the tool reflects that transition to adulthood; it’s graduated to the first derivative.
Are you one of the designers that needs second-order power noise management? What applications or other drivers have made this an important issue?