A well-worn literary device in any self-respecting bedtime story assures us that, no matter how evil the villain, there’s a hidden speck of virtue whereby he or she can achieve redemption.
OK, so I’m not going to go so far as to ascribe literary qualities to these pages, lest credibility be forever lost. But we’re going to take one of the up-and-coming anti-heroes of our time and expose the surprising goodness lurking within.
And who might this beneficent ne’er-do-well be? That infamous processing bugaboo, variation.
Silicon fabrication, like all manufacturing, is about applying order to things. Remaking the world according to our specifications. Entropy is the enemy.
Except when it isn’t.
There is one realm wherein randomness is feted with great fanfare: security. The more there is an order to secret things, the more likely it is someone will discover that order and thereby uncover the secrets. The more disorder, the more it just looks like so much fuzz, so much noise, nothing to see here, move right along.
While designers and fab technicians struggle to impose order on an increasingly poorly-behaved set of physical laws to ensure that each chip is identical to the next, there are people out there taking advantage of those few (and increasing) ways in which each chip differs from its sibling.
The solutions to a number of problems could benefit from variation. Ensuring that devices don’t disappear mysteriously from the supply chain. Ensuring that extra devices don’t mysteriously appear in the supply chain. Preventing reverse engineering and cloning. Validating that the software about to run is authorized to run on the hardware it’s about to run on. Checking whether the content about to be delivered is authorized for the machine towards which it’s headed.
And doing all of this without storing any critical secrets in the system itself, since they could be copied or spoofed. And because non-volatile technology for such storage may not even exist below 40 nm.
These techniques rely on physical attributes of devices that are unique to each device – no one can take them from a legitimate device and copy them to another and have it still function. As such, they are referred to generically as “physically unclonable functions,” or PUFs.
There’s one further restriction: this can’t be a completely random phenomenon. While there should be no correlation device to device, there must be perfect correlation within the same device: this function must be repeatable. In other words, results for two devices should be completely different; results for the same device should be completely the same.
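To put rough numbers on "completely different" versus "completely the same," here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the 4096-bit fingerprint, the 8% per-bit read noise, and the helper names are hypothetical and don't describe any vendor's device. It measures the fraction of differing bits between two reads of one simulated chip and between reads of two different chips.

```python
import random

def hamming_fraction(a, b):
    """Fraction of bit positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def read_device(fingerprint, noise, rng):
    """One noisy readout: each bit flips independently with probability `noise`."""
    return [bit ^ int(rng.random() < noise) for bit in fingerprint]

rng = random.Random(1)   # fixed seed so the sketch is repeatable
N_BITS = 4096            # assumed fingerprint size
NOISE = 0.08             # assumed per-bit read noise

# Each simulated chip gets its own random fingerprint "at manufacture".
chip_a = [rng.randint(0, 1) for _ in range(N_BITS)]
chip_b = [rng.randint(0, 1) for _ in range(N_BITS)]

same_chip = hamming_fraction(read_device(chip_a, NOISE, rng),
                             read_device(chip_a, NOISE, rng))
two_chips = hamming_fraction(read_device(chip_a, NOISE, rng),
                             read_device(chip_b, NOISE, rng))

print(f"same chip, two reads: {same_chip:.3f}")
print(f"two different chips:  {two_chips:.3f}")
```

The ideal would be 0 for the first number and 0.5 for the second. In this toy model the first lands well above zero, and closing that gap is the job of error correction.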
This is actually harder than it might seem. The kinds of phenomena that can be exploited are subtle. That means that measuring them can be noisy. You can think of them as a fingerprint for a chip, which is accurate in that it’s unique. But it also shares the trait that, for any given fingerprint lifted, there may be artifacts or dust or bits of bun from the burger eaten an hour before, and you’ve got to get rid of that to get a solid read of the print. So error correction plays a big part, not only in getting this to work, but even in enabling some use modes.
Three companies, Verayo, Intrinsic-ID, and Veratag, utilize three completely different physical mechanisms for their PUFs. Even though they may be used to protect a chip, card, or even a system, it is a piece of silicon that provides the characteristics needed. And while we referred to fingerprints above, “biometrics” is probably the better analogy: one company relies on fingerprints, another on iris scans, and yet a third on, oh, dental records.
It’s incredibly easy to get tied up in knots with this stuff; various claims and counter-claims of differentiation sometimes end up seeming less different than represented. And there are numerous use models, so it’s easy to munge them together and get lost. So… work with me here as we try to tease this stuff out in as cogent a form as possible.
Wake-up call
Intrinsic-ID uses the start-up state of an SRAM as the basis for their PUF, which has been available since 2008. Because of the “noise floor” when reading the SRAM (about 8%), they apply proprietary error correction techniques (which can correct to about a 20% noise floor) to stabilize the result. When a device is first used (which could happen during manufacturing), it is “enrolled.” This process yields an error-correction code that’s used to read the SRAM signature during normal usage. Given this code, called an “activation code,” you just power up the SRAM, read it, perform the ECC magic on it (with the activation code), and you get a stable result. While you have to enroll a device the first time, you can always re-enroll it later.
The activation code is typically stored in on-chip or nearby off-chip non-volatile memory (NVM). The code is different for each chip, so any attempt to take the code from one system and use it in another system will fail; this thwarts cloning. Each device must be individually enrolled, whether during production or in the field. The ECC function happens in their Quiddikey block, a piece of IP that can be configured for many different arrangements.
It’s possible to use existing system SRAM to generate the signature if it’s just going to be checked at power-up. However, that provides less security since system SRAM is typically accessible by debuggers and other methods, so the start-up state could be read out. It’s more secure to use a separate small SRAM – typically 1K bytes – and keep it off the map.
The result of the readout during normal operation provides a secret key that isn’t stored and can’t be interrogated. The key may or may not be known outside the device, but it is not intended to be read from the device. A device can be enrolled with a random key that no one will know, or it can be enrolled with a specific key. In the latter case, the activation code essentially steers the ECC algorithm to take whatever the SRAM readout is and ensure that it calculates the desired key.
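Intrinsic-ID's error correction is proprietary, so the following is only a sketch of the general idea, using the textbook "code-offset" construction with a simple repetition code as a stand-in. The repetition factor, the 8% per-read noise model, and every name here are assumptions for illustration. The activation code is the chosen key's codeword XORed with the SRAM readout at enrollment; a later noisy readout plus the activation code gets you back to the key by majority vote.

```python
import random

REP = 31  # assumed repetition factor: each key bit is spread across 31 SRAM cells

def enroll(sram_readout, key_bits):
    """Activation code = repetition codeword of the key XOR the enrollment readout."""
    codeword = [bit for bit in key_bits for _ in range(REP)]
    return [c ^ s for c, s in zip(codeword, sram_readout)]

def reconstruct(sram_readout, activation_code):
    """XOR off a fresh readout, then majority-vote each group of REP bits."""
    noisy = [a ^ s for a, s in zip(activation_code, sram_readout)]
    return [int(sum(noisy[i:i + REP]) > REP // 2)
            for i in range(0, len(noisy), REP)]

def power_up(cells, rng, noise=0.08):
    """Simulated SRAM start-up: each cell's preferred state, with random bit flips."""
    return [c ^ int(rng.random() < noise) for c in cells]

rng = random.Random(7)
key = [rng.randint(0, 1) for _ in range(128)]           # the specific key to enroll
chip_a = [rng.randint(0, 1) for _ in range(128 * REP)]  # chip A's start-up preferences
chip_b = [rng.randint(0, 1) for _ in range(128 * REP)]  # chip B's start-up preferences

code_a = enroll(power_up(chip_a, rng), key)
code_b = enroll(power_up(chip_b, rng), key)

key_on_a = reconstruct(power_up(chip_a, rng), code_a)   # legitimate use on chip A
key_on_b = reconstruct(power_up(chip_b, rng), code_b)   # legitimate use on chip B
cloned = reconstruct(power_up(chip_b, rng), code_a)     # chip A's code copied to chip B

print("chip A recovers key:", key_on_a == key)
print("chip B recovers key:", key_on_b == key)
print("copied code works on chip B:", cloned == key)
```

Note what the sketch does and doesn't store: the key itself lives nowhere, the two activation codes differ even though they yield the same key, and a code copied onto the wrong chip produces junk. A real design would use a much stronger code than plain repetition, and would worry about what the activation code leaks if the SRAM bits are biased.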
With a random key, you can, for example, have a processor encrypt all data that goes into memory. Each device will have its own key that never leaves the system. But no one else needs the key: the processor encrypts the data when writing and decrypts it when reading. Only the processor needs to know how that’s done, and each processor can use a different key.
Where communication has to be more coordinated – say between processors – having secret individual keys can’t work, since all participants need to use the same key. This is where you would program in a specific key on all devices that have to communicate. Again, the activation code will be different for each chip, but they will all share the same key and can therefore use encrypted communication.
Intrinsic-ID points out a couple more interesting examples of how such a key can be used. The first is referred to as hardware/software binding. If an embedded system has critical software that you don’t want proliferated, you can encrypt the image that’s stored in NVM. The activation code is stored in the same memory – that’s critical, since if someone tries to copy the memory, you want the activation code to go with it. The software image can be decrypted and run properly only if the correct key is applied.
You can actually approach this either of two ways. You can do a “bulk encryption” of the software image with a specific key and then enroll each device with that key. Alternatively, you could have a different key for each device, but then you’d have to encrypt each image individually for each system. If you wanted to do this with a secret random key, you could, but the system would have to do the encryption and load the NVM code store itself during manufacturing, since the key would be available only on the system itself.
The other example they give is one they refer to as secure booting. This is similar to the prior case, except that it determines whether the system can boot. In the example they lay out, the unencrypted system software image has a hash-based message authentication code (HMAC) appended to it. On bootup, the system calculates a new HMAC with the hardware key and compares it to the stored HMAC; if they match, the image can be copied to RAM and executed. If not, then the system doesn’t run.
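As a rough sketch of that boot-time check, here is how it might look with Python's standard `hmac` module. The layout (a 32-byte HMAC-SHA256 tag appended to the image) and the function names are assumptions for illustration; in the scheme described, `device_key` would come out of the PUF rather than being a constant in the code.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag

def sign_image(image: bytes, device_key: bytes) -> bytes:
    """Done once at provisioning: append the HMAC to the plaintext image."""
    tag = hmac.new(device_key, image, hashlib.sha256).digest()
    return image + tag

def verify_and_load(stored: bytes, device_key: bytes):
    """Done at every boot: recompute the HMAC and compare it to the stored one."""
    image, stored_tag = stored[:-TAG_LEN], stored[-TAG_LEN:]
    expected = hmac.new(device_key, image, hashlib.sha256).digest()
    if hmac.compare_digest(expected, stored_tag):
        return image   # match: safe to copy to RAM and execute
    return None        # mismatch: refuse to boot

# Placeholder key for the sketch; a real system would reconstruct it from the PUF.
device_key = bytes(range(32))
stored = sign_image(b"system software image", device_key)

print(verify_and_load(stored, device_key) is not None)   # genuine image, right hardware
print(verify_and_load(stored, bytes(32)) is not None)    # same image, different key
```

`hmac.compare_digest` is used instead of `==` so that the comparison takes the same time whether the tags differ in the first byte or the last.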
Finally, you can use more than one activation code. Tim Smith, Intrinsic-ID’s marketing VP, points to a set-top box customer that wanted five activation codes so that they could have five different keys. One was for the silicon vendor; one was for the box builder; one was for the operator; and two were for any access vendors that might want to enroll the device.
Muxes and oscillators
Verayo uses asynchronous circuits as the basis for their PUFs, which have been around for about 5 years. There are actually two versions of this: mux-based and ring-oscillator-based. The mux version takes a signal and routes it through two paths, each going through a series of multiplexers, one going to the data input of a latch, the other to the gate. Depending on which gets there first, you might latch a one or a zero. The PUF is made of a bunch of these, with the subtle natural delay variations being such that no two devices will have the same output (with a small noise level). The inputs to the muxes are fed by an input word – the “challenge” word. More about that in a bit, but you can control the result of the PUF by changing the challenge word. Even so, the response to a given challenge word will be different for each device.
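A toy model can make the race easier to picture. The sketch below is not Verayo's circuit; it is a simple additive delay model commonly used to reason about mux-based PUFs, with made-up Gaussian delay offsets standing in for manufacturing variation. Each challenge bit either passes the two signals straight through a stage or swaps them, and the final sign of the accumulated lead decides what the latch captures. The class name, stage count, and chain count are all hypothetical.

```python
import random

class MuxPufDevice:
    """Toy mux-based PUF: several delay chains that share one challenge word."""

    def __init__(self, seed, n_chains=16, n_stages=64):
        rng = random.Random(seed)  # the seed stands in for manufacturing variation
        # Per stage: delay mismatch when passing straight and when crossing over.
        self.chains = [
            [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_stages)]
            for _ in range(n_chains)
        ]

    def respond(self, challenge):
        """Return one response bit per chain for the given challenge bits."""
        word = []
        for chain in self.chains:
            lead = 0.0  # how far the top signal is ahead of the bottom one
            for bit, (straight, crossed) in zip(challenge, chain):
                # A crossed stage swaps the two paths, which negates the lead.
                lead = (crossed - lead) if bit else (lead + straight)
            word.append(int(lead > 0))  # the latch records which edge won
        return word

rng = random.Random(0)
challenge = [rng.randint(0, 1) for _ in range(64)]
chip_a = MuxPufDevice(seed=1)
chip_b = MuxPufDevice(seed=2)

print("chip A:", chip_a.respond(challenge))
print("chip B:", chip_b.respond(challenge))
```

The model leaves out read noise entirely, so the same chip always gives the same answer here. In silicon, a race that is nearly a tie can go either way from one read to the next, which is where the small noise level comes from.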
The ring oscillator approach fundamentally takes two ring oscillators, subtracts them, and then multiplies by a challenge bit. In actual practice, there’s an array of such oscillator pairs and a rather more sophisticated recombination function performed on the oscillator differences. This phenomenon also depends on race conditions, since the loop delay for each ring oscillator will vary ever so slightly from that of its neighbor in a manner that will be, in the aggregate, unique to each device.
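Here is a deliberately simplified sketch of that idea. The frequencies are simulated, and the recombination used (the sign of a challenge-weighted sum of the pair differences) is an assumed stand-in, not Verayo's actual function. The nominal frequency, the spread, and the names are hypothetical.

```python
import random

class RingOscPuf:
    """Toy ring-oscillator PUF built from frequency differences of oscillator pairs."""

    def __init__(self, seed, n_pairs=64, nominal_mhz=200.0, spread_mhz=1.0):
        rng = random.Random(seed)  # the seed stands in for manufacturing variation
        # Difference between the two oscillators in each pair ("subtracts them").
        self.diffs = [
            rng.gauss(nominal_mhz, spread_mhz) - rng.gauss(nominal_mhz, spread_mhz)
            for _ in range(n_pairs)
        ]

    def response_bit(self, challenge):
        """Weight each difference by +1 or -1 per challenge bit, then take the sign."""
        total = sum(d if bit else -d for bit, d in zip(challenge, self.diffs))
        return int(total > 0)

rng = random.Random(3)
challenge = [rng.randint(0, 1) for _ in range(64)]
inverted = [1 - bit for bit in challenge]

chip = RingOscPuf(seed=11)
print(chip.response_bit(challenge), chip.response_bit(inverted))
```

Inverting every challenge bit flips the sign of the sum, so the two printed bits are always opposite. That kind of visible structure is exactly what a simple recombination gives away.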
Think of these as a giant pachinko machine, where a bunch of pachinko balls are lined up and fall towards the bottom. If the machine is big enough, then no two machines will have the balls land in exactly the same places.
These asynchronous approaches can be implemented both in silicon chips and on FPGAs (their so-called SoftPUF). They can be used to generate secret keys just as the SRAM-based ones can, but there’s no activation code, and so, presumably, you can’t set up a specific key; you can only get a random key.
Additionally, authentication appears to be a common use model for these PUFs based on the challenge word capability. By presenting a challenge, you get a response from the PUF that’s unique to the device. You record the response so that, in the future, you can use that challenge/response pair to authenticate that you’ve got a legitimate device. Done during manufacturing, for example, this protects against over-building, since only authorized devices will be properly recorded. Any unknown responses would be flagged as illegitimate.
The challenge and response could simply be used as a permanent handshake, but that could be easily snooped. So what’s more typically done is that many pairs of challenges and responses are recorded. These are stored in a database (which could be located in a reader or in the cloud, depending on the end application). Each challenge/response pair is used only once and then discarded. The number stored ahead of time can vary by application, and, if the pairs are exhausted, it’s possible to generate new ones in the field.
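The bookkeeping for that use-once scheme might look like the sketch below. The PUF is faked with a keyed hash of a hidden random value, purely so the example runs; real silicon would be noisy, so a real verifier would accept responses within some bit-error threshold instead of demanding an exact match. All class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

class SimulatedPuf:
    """Stand-in for a silicon PUF: a hidden random value plays the role of variation."""

    def __init__(self):
        self._variation = secrets.token_bytes(32)

    def respond(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._variation + challenge).digest()[:8]

class Verifier:
    """Database of recorded challenge/response pairs, each used only once."""

    def __init__(self):
        self._pairs = {}  # device_id -> list of (challenge, response)

    def enroll(self, device_id, puf, n_pairs):
        """Record fresh pairs, at manufacturing or later in the field."""
        for _ in range(n_pairs):
            challenge = secrets.token_bytes(8)
            self._pairs.setdefault(device_id, []).append(
                (challenge, puf.respond(challenge)))

    def authenticate(self, device_id, puf):
        pairs = self._pairs.get(device_id)
        if not pairs:
            return False                   # unknown device, or pairs exhausted
        challenge, expected = pairs.pop()  # discard the pair after a single use
        return hmac.compare_digest(puf.respond(challenge), expected)

genuine, counterfeit = SimulatedPuf(), SimulatedPuf()
db = Verifier()
db.enroll("unit-001", genuine, n_pairs=2)

print(db.authenticate("unit-001", genuine))      # recorded device answers correctly
print(db.authenticate("unit-001", counterfeit))  # different silicon, wrong response
print(db.authenticate("unit-001", genuine))      # both pairs are now used up
```

Because each pair is popped as it is used, anyone who snoops an exchange learns a challenge that will never be issued again.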
Note that Intrinsic-ID also has a ring-oscillator PUF that has been tested on both Xilinx and Altera FPGAs, but their experience is that FPGA customers have gotten too used to things being free, so it’s mostly military-oriented applications where there is a viable business.
It’s also possible to create a challenge/response system with the Intrinsic-ID solution, but you would build it on top of whatever security you build in; the SRAM PUF wouldn’t specifically participate in the authentication process.
Bridges and cantilevers
Veratag, founded in the winter of 2006/7, goes in a completely different direction: they use a MEMS approach. They’re not building circuits, they’re building Lilliputian bridges and cantilevers (think diving boards) and getting them to vibrate. The idea is that, if you get enough of these going, each unit will generate a unique spectrum by which it can be identified.
This, of course, puts it in a very different category from the prior two approaches. It’s not impossible to put CMOS and MEMS together on the same chip, but not all fabs can (or want to). So this often ends up being its own device with an antenna that’s then read by a reader that processes the signal to generate the fingerprint.
In theory, because it’s a full complex analog wave that’s transmitted, it is the richest of the mechanisms, having infinite content. This can be appealing to those looking for high security, since infinite content provides more ways to discriminate. If used for RFIDs, however, it requires a different reader, and revamping the mainstream reader infrastructure would be a challenge.
It is possible to create a MEMS-based system that can accept a challenge input by using a massive array of disk resonators connected by thin bridges. But this doesn’t appear to have been put into frequent practice; the primary MEMS model is simply to read and process the spectrum.
Areas of contention
This technology seems not to have spent a lot of time on the front pages, and yet it appears to be intensely competitive, especially with the two electrical versions.
The challenge/response system may appear to be cumbersome; indeed, during production you may be filing challenge/response pairs in a database in real time. And during actual authentication, you have to validate the response. So that database has to be accessible; it may be in the reader or it may actually be in the cloud. So there could be an authentication delay. In defense of the system, however, if you’re going to authenticate something, you have to know what the right answer is, so it has to be available somewhere. The only other way to self-authenticate would be to store the right answer on the chip or in the system itself, which would nullify the whole point of PUF-based codes.
Another possible weakness of the challenge/response system is that, if too many pairs are published and snooped, then it might be possible, using some beefy computing, to model the pairs and deduce the underlying structure. In a paper, Mandel Yu, a senior designer at Verayo, points out that more sophisticated recombination functions in the ring-oscillator version, for example, will make the structure realistically impenetrable. The mathematics here get pretty obscure for those of us not steeped in encryption, but presumably this has all been figured into the IP.
The manufacturability of asynchronous circuit and SRAM PUFs also appears to be debated. Intrinsic-ID says that the asynchronous circuits have to be carefully tuned and balanced for each process in each fab, while any standard SRAM process can be used. Verayo avers, rather, that they use all standard muxes and inverters (and any other components) and that no such test runs or validation have to be performed.
Verayo, on the other hand, says that the 6T SRAM cells have to be carefully balanced. It will still work fine as an SRAM cell if it’s slightly unbalanced, but, if you want a completely bias-free signature, then the cells do have to be extremely precisely tuned. In other words, you want the cells to be right on that top-dead-center tipping point so that the slightest variations will push them to one side or the other. If they’re already tilted to one side, then that side will be a far more likely outcome than the other.
In response, Intrinsic-ID says that all the memory they use has no bias (with the exception of one, where it was done intentionally for power savings) – and they can use software to test any memory to validate that it will work. And, as the process nodes get tighter, the natural variation – normally a problem – swamps out the bias so that it becomes less of a concern. Even so, there are far more memory bits than bits in the key. This means that, through the error correction, any subtle biases can be washed out; essentially, there’s so much redundancy in the memory that the occasional biased bit won’t be evident.
So while others fret about process variation, these guys take it to the bank. PUFs are the lemonade made out of everyone else’s lemons. The evil underbelly of technology is harnessed and put to work for the forces of good.
Ah, just like a good bedtime story. My eyelids are getting heavier already.
(Hopefully yours aren’t.)