In the 1966 movie Fantastic Voyage, a team of doctors and scientists gets miniaturized and injected into the bloodstream of a human patient. They and their yellow submarine navigate past heart valves, battle corpuscles, and swim in tear ducts. It provides an inside look into biological workings most of us never see.
An enterprising Hungarian engineer, Can Bölük at Verilave, has done something similar, probing the obscure nervous system of Intel’s fabulously complex x86 microprocessors. He’s probed for unused and undocumented opcodes in the chips’ instruction sets. And he’s uncovered quite a bit. His code is available on Github.
If you’ve ever programmed an x86 chip in assembly language, your first thought might be, “I didn’t know there were any unused opcodes.” After all, the architecture has a spectacularly complex and richly well-appointed instruction set that’s been growing steadily since the 1970s. Surely every instruction has been done by now? The opcode map must be full to overflowing?
Nope. There are still plenty of holes in the x86 opcode map, and not all of them are actually empty. Some are just… undocumented. Some instructions appear to be deeply buried maintenance functions while others look like Easter eggs, forgotten bugs, or partially implemented instructions that never quite saw the light of day. A few seem to provide remarkably godlike powers that appear to circumvent on-chip security or even to rewrite the chip’s internal microcode.
Bölük didn’t find all of these omissions himself, but he did automate the process of uncovering them, and his approach is as remarkable as it is nonintuitive. It’s a cleverly crafted program that relies on an unlikely ally: speculative execution.
For starters, finding unused opcodes is a lot harder than it might sound. You can’t just stuff memory with every value from 0x00 through 0xFF, execute it, and watch what happens. That’s especially true of x86 processors, which have variable-length instructions and a convoluted system of code prefix bytes, suffix bytes, modes, and internal register designations. Instructions that work with some registers don’t work with others. Some memory modes are supported by some instructions but not others, and so on. It’s unabashedly nonorthogonal.
The same instruction (that is, the same binary encoding) can be interpreted and executed different ways depending on whether the chip is operating in 16-bit mode, 32-bit mode, or 64-bit mode. It can depend on privilege level (CPL0 through CPL3), the number of CPU cores, or the MMU configuration. A given two-byte sequence might not work, but a three-byte variation will. An instruction might work only when it’s preceded by a certain instruction or followed by another instruction. CISC? Yeah, you could say it’s a complex instruction set.
On top of all that, an instruction might be legal but not observable. How do you know a random instruction actually did anything? NOP is a legitimate instruction but has no effect (that’s the point). The CLI instruction disables interrupts, but how do you know?
Conversely, some instructions will have too much of an effect and ruin your exploration. The processor might crash, or counters get reset, or memory wiped out. What if it halts and catches fire? Randomly plugging in opcodes to see what happens is like defusing a bomb by cutting random wires.
Instead, Bölük took a methodical approach. He doesn’t ask the chip to actually execute any suspected hidden instructions. He just wants it to fetch them. Then, he can tell what’s a real instruction and what’s not by observing some subtle side effects.
His trick relies on speculative execution after a conditional branch. As we know, all modern processors try to eke out extra performance by executing a handful of instructions immediately after a branch instruction. Once the branch is resolved, if the chip guessed correctly, then that’s all free work. If the chip guessed incorrectly, all the speculative work is discarded, and any results are canceled as if they’d never happened.
That means that if you can convince the processor to execute a mystery instruction speculatively, there should be no side effects. If it’s an invalid instruction, the chip won’t generate the illegal instruction fault. But if it is a valid instruction, it won’t affect the chip’s behavior or do random and unexpected things. Either way, you’ve tested an unknown instruction without causing problems. It’s the perfect dry run.
Enter Schrödinger’s cat. If there are no observable effects one way or the other, how do you tell a valid but unknown instruction from an invalid one? That’s where some little-known performance counters inside the chip come into play.
Intel’s x86 chips don’t actually execute the x86 instruction set we all know. Instead, they convert the familiar assembly-level instructions (MOV, PUSH, XCHG, STOSB, FXTRACT, et al.) into even more rudimentary micro-instructions. Yes, Virginia, inside every CISC processor is a RISC processor trying to get out. It’s been this way for more than 20 years. AMD’s chips work the same way, although their internal micro-instructions are different.
The conversion from complex assembly instructions to simpler micro-instructions is controlled by an internal microcode engine. In fact, there are two of them, one for relatively simple instructions and one for the really complex operations. This is something that even die-hard assembly coders never see and don’t care about. But, if you dig around enough, you’ll find that there are a couple of internal CPU status registers that report on the activity of these sequencers. And that can tell you a bit about what the CPU is trying to execute – even if it later flushes and disregards the results. Voila! A peephole into the mysterious realm of undocumented x86 opcodes.
Bölük details how his program skips over known x86 instructions and known interactions among those instructions. He also expends a lot of effort removing measurement artifacts. After all that, he discovered more than 50 previously undefined opcodes. He also discovered some unanticipated side effects with some instructions, such as the fact that loading CPU control register CR2 doesn’t serialize as expected (that is, it doesn’t pause speculative execution).
Bölük’s method is a riff on the Spectre and Meltdown bugs, as well as AMD’s recently disclosed PSF vulnerability, but for a good cause. They all leverage speculative execution to force subtle but observable side effects within the processor. More evidence that every action has an equal and opposite reaction.
whatever the technical critique, spectacularly successful businesswise