Chip testing is always a delicate balance between testing enough and not testing too much. In reality, you want to find the “necessary and sufficient” set of tests for best quality at the lowest test cost. That’s a tough thing to get right.
Throw on top of that goal the fact that SoCs and other modern digital chips require automation to generate test vectors. Even if you find that perfect test balance, if you can’t figure out how to craft an algorithm to implement that balance automatically, it becomes an academic exercise.
Mentor wrote an article a couple of months ago outlining their “cell-aware” fault modeling, contrasting it conceptually with stuck-at fault models and then running experiments that compared coverage and vector counts against a gate-exhaustive approach. Their conclusion was that the cell-aware approach struck a better balance.
What they didn’t really discuss is how this all works. So here we’re going to delve into the details a bit more, even deigning to run through a simple specific example (courtesy of Mentor).
Let’s start by baselining with some review. A stuck-at fault model is intended to prove that the individual nets in the circuit aren’t stuck high or stuck low. If you have a three-input AND gate and you want to prove that the output isn’t stuck, you can use one vector with one or more inputs low to verify that the output isn’t stuck high, and you can use another vector with all inputs high to prove that the output isn’t stuck low.
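If you want to see that in executable form, here’s a minimal Python sketch of the idea; the gate model and the fault-injection trick are mine for illustration, not anything taken from a real ATPG tool:

# Minimal sketch: detecting output stuck-at faults on a 3-input AND gate.
def and3(a, b, c, stuck=None):
    """Fault-free AND unless 'stuck' forces the output to 0 or 1."""
    out = a & b & c
    return out if stuck is None else stuck

# Two vectors suffice for the output stuck-at faults:
vectors = [
    (1, 1, 1),  # expects 1, so it catches output stuck-at-0
    (0, 1, 1),  # expects 0, so it catches output stuck-at-1
]

for fault in (0, 1):
    detected = any(and3(*v) != and3(*v, stuck=fault) for v in vectors)
    print(f"output stuck-at-{fault} detected: {detected}")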
While this is really simple and has been automated for years, it leaves out lots of other faults. After all, nets can be shorted – or “bridged” – to other nets, especially with today’s weird lithography artifacts. Or there might be an open in a line. And these shorts and opens can be “clean” or resistive. If they’re clean, then presumably you can detect the resulting static logic problem. If resistive, then the circuit will more likely settle into the correct state eventually, just slowly, so you have to do a dynamic (at-speed) test.
But, given a generic gate, how do you get at these other faults? One approach has been the so-called gate-exhaustive model. In this case, you apply all combinations of inputs to the gates to assure that the output is correct for all cases. For the three-input AND gate, that would mean 2³, or 8, vectors instead of the two we identified above.
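For contrast, the gate-exhaustive version of the same AND gate just enumerates every input combination. Again, this is an illustrative sketch rather than any tool’s actual flow:

from itertools import product

def and3(a, b, c):
    return a & b & c

exhaustive_vectors = list(product((0, 1), repeat=3))  # 2**3 = 8 vectors
for v in exhaustive_vectors:
    assert and3(*v) == (1 if v == (1, 1, 1) else 0)
print(f"{len(exhaustive_vectors)} vectors applied")  # 8, versus 2 for the stuck-at case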
But for a more illustrative example, let’s look at something a little more complex than a single AND gate: a 3-input mux. To an automatic test pattern generation (ATPG) tool, it may look logically like Figure 1:
Figure 1. Logical view of mux
From this logical representation, the ATPG tool can derive a set of vectors to cover the stuck-at faults, shown in Figure 2:
Figure 2. Vectors for stuck-at coverage
These vectors are derived from the gate-level primitives – and, because of the staged two-input muxes, they’re not as symmetric as you might expect. But this is just a logical view and involves no actual knowledge of what might realistically be bridged or open in an actual circuit.
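To make that staging concrete, here’s a hypothetical gate-level model of the mux built from two 2-input muxes; the select encoding is an assumption on my part, since Mentor’s actual cell may be wired differently:

def mux2(sel, d0, d1):
    return d1 if sel else d0

def mux3(s0, s1, d0, d1, d2):
    # First stage picks between d0 and d1; second stage picks that result or d2.
    return mux2(s1, mux2(s0, d0, d1), d2)

Because d2 passes through only one stage while d0 and d1 pass through two, the vector set comes out lopsided, which is the asymmetry mentioned above.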
That’s where the cell-aware concept comes in. The idea is to use the actual layout to figure out where the real issues might be and ensure that there are vectors covering those issues. Figure 3 shows the layout for the mux. And you can see that there’s one place where two yellow nets come close together, running the risk of bridging (in red). This is but one of many such fault candidates, but it’s the example we’ll follow.
Figure 3. Mux layout showing potential bridge (red)
So how do you automate a way to find such potential bridges – or opens, for that matter? It’s easy for us to do by inspection, but that won’t work in the real world. It turns out, however, that there’s a straightforward way to identify these points. In fact, you already do this: parasitic extraction.
The whole point of parasitic extraction is to identify a) items that are close enough to affect each other capacitively and b) runs of metal or poly that are resistive. Each of those capacitors is a potential bridge, and each of those resistors is a potential open.
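In other words, the extracted parasitic netlist is already a list of fault candidates. Here’s a small sketch of that mapping, with made-up net names and values; a real flow would parse something like a DSPF or SPEF file rather than hand-coded lists:

# Each coupling capacitor between two distinct nets is a bridge candidate;
# each series resistor within a net is an open candidate. Values are invented.
coupling_caps = [("netB", "netD", 0.4e-15), ("netA", "netC", 0.2e-15)]  # (net1, net2, farads)
series_resistors = [("netB", "netB_seg1", 12.0)]                        # (node1, node2, ohms)

bridge_candidates = [(n1, n2) for n1, n2, _ in coupling_caps if n1 != n2]
open_candidates = [(a, b) for a, b, _ in series_resistors]

print("potential bridges:", bridge_candidates)
print("potential opens:  ", open_candidates)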
So let’s go, first, to the full cell schematic, shown in Figure 4. Here you can see where the bridge in Figure 3 is, although the fact that those two lines are close to each other here is purely coincidental – in other cases, they could be nets that are adjacent in the layout, but more distantly related in the schematic.
Figure 4. Full cell schematic
Figure 5 shows a portion of the full schematic with parasitics included. I’ve isolated only that portion that’s relevant to this particular fault (and I’ve simplified it a bit). You can correlate the various capacitors and resistors with the layout in Figure 3. Each of those capacitors could bridge, and each of the resistors could be open. In our particular example, the capacitor that models the influence between Net B and Net D becomes a 1-Ω resistor (although you could also use a higher resistance to model a resistive short).
Figure 5. Portion of the schematic including parasitics. The bridge is a 1-Ω resistor.
Now the question becomes, what happens with this bridge? What goes wrong, and how can you detect it? And to figure that out, you simulate – at a low level. You have to simulate the real circuit because many times you will end up with two drivers fighting, and you have to figure out which will win. According to Mentor, it took 1500 individual simulations to cover this particular multiplexor cell.
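To give a flavor of what such a simulation resolves, here’s a toy model of two drivers fighting across a hard bridge. Real cell-aware characterization uses analog, SPICE-level simulation; the drive resistances below are invented purely for illustration:

VDD = 1.0

def resolve_bridge(drive_hi_r, drive_lo_r, bridge_r=1.0):
    """One net pulled high, the other pulled low, shorted through bridge_r (ohms).
    Returns the voltage on the net being driven high."""
    # Series divider: VDD -> drive_hi_r -> bridged node -> bridge_r -> drive_lo_r -> GND
    total = drive_hi_r + bridge_r + drive_lo_r
    return VDD * (bridge_r + drive_lo_r) / total

v = resolve_bridge(drive_hi_r=2.0e3, drive_lo_r=1.0e3)  # the weaker pull-up loses the fight
print(f"net driven high sits at {v:.2f} V and may be read as a logic 0")

Whichever driver is stronger wins, and whether downstream logic misreads the result depends on where that voltage lands relative to the gate thresholds, which is exactly why this has to be simulated rather than reasoned about at the gate level.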
For the specific fault shown above, the tool identified four different vectors that can detect this fault, shown in Figure 6.
Figure 6. Four vectors that can detect the fault.
Now, you might wonder, “Why do I need four different ways to detect the fault?” In truth, in a final set of vectors, you don’t: you need only one.
But here’s the deal: all of this characterization and simulation is done once when building the cell library. Those four extra vectors are included as part of the kit.
Fast forward to an actual circuit design using this cell. An ATPG tool will go through and do its standard cell-unaware vector generation. When that’s done, it can go back into the cells to see what special cases need to be covered based on the actual cell layout. If you’re lucky – and your chances are good here – some vector will already be in place that provides the extra coverage, so you won’t need to add any more. That’s why having multiple possible vectors (four in this example) is useful: it increases the chances that you will be fortuitously covered. If it turns out you weren’t lucky, then the ATPG tool picks one of the target vectors to add to the test suite.
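As a rough sketch of that top-off step (the pattern representation and test-set contents here are invented; a real ATPG tool works on the full chip netlist, not strings):

def top_off(existing_tests, detecting_patterns):
    """If any of the cell's detecting patterns is already exercised by the
    existing test set, do nothing; otherwise add one targeted vector."""
    for p in detecting_patterns:
        if p in existing_tests:
            return existing_tests, None      # fortuitously covered already
    added = detecting_patterns[0]            # pick one of the candidate vectors
    return existing_tests | {added}, added

existing_tests = {"s0=1 s1=0 d=011", "s0=0 s1=1 d=101"}       # from cell-unaware ATPG
detecting_patterns = ["s0=1 s1=1 d=110", "s0=1 s1=0 d=011"]   # from library characterization

tests, added = top_off(existing_tests, detecting_patterns)
print("added vector:", added)  # None here, because one pattern was already present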
It’s because of this targeted addition of vectors – which may not even have to be added – that the cell-aware approach provides a better balance than the gate-exhaustive approach, which simply blasts vectors, many of which don’t do anything useful. Whether it meets the specific goal of both necessary and sufficient is harder to say, but anything closer to that goal is goodness.
What methods do you currently use to cover as many faults as possible with as few vectors as possible?