Once upon a time
Let’s take ourselves back a few millennia, to a time when technology is rather more rough and ready. We still have silicon to work with, but no semiconductor fabs. And yet, being the aggressive can-do species that we are, we decide we want to build a large-scale non-volatile memory.
“Why,” you ask. Well, why do anything? Why build a pyramid? Why build a sphinx? Do we need a reason? We do? Ah… to make the ancient Egyptian VCs happy… Got it. OK: Perhaps to record the scores of the highly popular possumball tournaments, or something. Huge market; we’re gonna kill it*.
OK, now that we’ve got the business plan thing out of the way, we need to figure out how to do this. Being in the desert or near the Mediterranean, we have a huge resource to rely on: sand. And lots of sand is heavy – by contrast with no sand, which is not heavy. So we could build a grid, and on each x/y intersection, we could either have a pile of sand or not.
How do we know whether there is sand or not at a grid point? By using a scale to weigh it. Bed Bath and Beyond isn’t quite open yet, so we need to fashion something out of the materials at hand. How about a big see-saw thing with a large rock on one end and the platform for the pile of sand on the other? Let’s say that our pile of sand weighs a ton: if the rock is around a half-ton, then the position of the see-saw will tell us whether there is sand there or not. (I frankly can’t remember whether the ancients used the metric or English system, so I’m going to assume they did what we do here in Murca.)
And when it’s time to read the memory? We can just station slaves – er – memory technicians at each scale, and when it’s time to read, they shout out, in serial order, the state of their scale. I’d tell you what they shout, but I’m a bit rusty on my ancient Egyptian or Numidian or whatever they’re speaking.**
So we have a storage mechanism and we have a read mechanism; now we need a way to write to the memory. So… let’s say we have these big carts, and they get pushed to a big sand dune or the beach, and silica technicians shovel sand into the cart. How many shovelfuls? Who knows; simply go till it’s full. And then transport technicians can push them over to their designated grid position. Yes, labor intensive. But this is back in the good ol’ days when the free market truly was free, when “living wage” meant, “if you work, you get to live.”
So this is all great, but over eons, real estate becomes more expensive, and the owners of these big scoreboard memories start to think of how much money they could make by evicting the memories and erecting tourist attractions and T-shirt shops instead. And so they try at least to shrink the memories. The piles go from a ton to a half ton (and the rocks shrink to a quarter ton), and then to a quarter ton, and this continues until shovels are replaced with spoons and increasingly delicate scooping implements.
One super-aggressive guy figures out how to use multiple rocks with one pile of sand so that, instead of just measuring pile-or-no-pile, he can measure one-third and two-thirds piles too. This is like combining two piles into one, further shrinking the memory footprint.
But the limits of scaling are becoming evident. The number of grains of sand starts to matter. A full “cart” (now no more than a slice of reed leaf) is 100 grains of sand, roughly. No one counts them directly, but from experience, the loaders know how much sand to pinch to deposit 33, 66, or 100 grains – typically plus or minus 10 or so grains.
And it’s getting harder and harder to be sure that the counterbalancing rocks are of the right size. The rocks are selected carefully so that the scales are mostly accurate. But the multi-level ones are tougher; the readers sometimes have to measure those twice, or have two people look at them, to make sure they haven’t read them wrong and that a puff of wind hasn’t shifted the balance.
But as they keep shrinking this thing, they can now see the end coming. The furthest they can go would be three grains of sand for a full load of a multi-level cell. If you were literally counting the grains of sand, it would be easy: look for 0, 1, 2, or 3 grains. But they’re not counting; they’re weighing them. So this is tougher – variations in grain size and rock size toss in lots of uncertainties.
Even with a full load of, say, 10 grains, it’s still problematic. At the 100-grain level, plus or minus 10 grains is manageable. Now plus or minus 10 grains is deadly. And that’s just for reading the memory. How do you ensure that the silica technicians load exactly the right number of grains? They’re not counting either; they’re scooping with micro-shovels. And if any of the grains fall off the cart when being transported to the memory location, then that further screws things up.
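To put some (entirely made-up) numbers on why this hurts, here’s a quick back-of-the-envelope check in Python using the story’s figures – a 100-grain or 10-grain full load, four fill levels, and plus-or-minus 10 grains of slop. It simply asks whether adjacent target levels stay farther apart than the worst-case uncertainty:

```python
# Back-of-the-envelope check: do the target fill levels survive +/- "noise" grains of slop?
def levels_distinguishable(full_scale, noise, num_levels=4):
    """True if adjacent target levels remain separated under worst-case error."""
    step = full_scale / (num_levels - 1)   # spacing between fill levels (empty, 1/3, 2/3, full)
    return step > 2 * noise                # worst case: one cell reads high, its neighbor reads low

print(levels_distinguishable(100, 10))  # True:  ~33-grain steps vs. +/-10 grains of slop
print(levels_distinguishable(10, 10))   # False: ~3-grain steps are hopeless
```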
Absent a means of determining an exact count of the number of grains of sand, this way of implementing a memory is nearing the end of its time.
Meanwhile, in the real world
Replace grains of sand with electrons, and we find ourselves at a crossroads in the real world. We have devised increasingly clever ways to shove electrons into some place and keep them there. We have concocted mechanisms for firing them through barriers that were supposed to be impenetrable. Who knew that, if an electron walks up to the wall with enough confidence, it can walk right through. Apparently J.K. Rowling knew something about tunneling.
Of course, we’re not counting electrons either. We’re trying to, but we need a proxy mechanism. Instead of weighing them, we measure their electrostatic effects to guess how many there are. And we use empirically derived voltages and currents to write a reasonably well-controlled number of electrons into our electron holding cell. And this has worked as long as the number of electrons was high.
And that is no longer the case. We are now dealing with exceedingly small numbers of electrons. The realm of ten electrons is not fiction. And the uncertainties inherent in trying to tell whether you’ve stored 10 or 9 or 6 electrons by our usual proxies are also far from fiction.
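For a rough sense of scale (using a made-up, order-of-magnitude capacitance, not any vendor’s actual numbers): the proxy we typically read in a flash cell is the threshold-voltage shift, which goes as the stored charge divided by the gate-coupling capacitance. At attofarad-scale capacitances, every single electron moves the needle by a visible, discrete step – and every electron that leaks away moves it right back.

```python
# Illustrative only: threshold-voltage shift from n electrons stored on a floating gate.
# The 1-aF coupling capacitance is an assumed, order-of-magnitude value, not a device spec.
Q_E = 1.602e-19           # electron charge, in coulombs
C_COUPLING = 1e-18        # assumed gate-coupling capacitance: 1 aF (hypothetical)

def vt_shift(n_electrons, c_coupling=C_COUPLING):
    """Delta-Vt ~ stored charge / coupling capacitance."""
    return n_electrons * Q_E / c_coupling

print(round(vt_shift(10), 2))  # ~1.6 V window for ten electrons...
print(round(vt_shift(1), 2))   # ...so gaining or losing one electron moves Vt by ~0.16 V
```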
All of which has brought us to an inflection point in technology that feels rather significant, as pointed out at the recent San Francisco edition of imec’s technology forum: “We’re running out of electrons.” Which means we need to find a different way of storing state.
The heir apparent to electro-enumeration is resistance. Instead of stashing electrons, you change the properties of a conducting material so that its resistance tells you the stored value. Simple principle, but how to do this in real life?
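Before getting to the mechanisms, here’s the principle in its most naive form – a toy sketch, with resistance values and a threshold I’ve invented purely for illustration: apply a small read voltage, measure the current, infer the resistance, and map it to a bit.

```python
# Toy illustration of reading a resistive cell (values invented, not any vendor's spec).
LOW_RES = 10e3          # ohms: low-resistance state  -> call it logical 1
HIGH_RES = 10e6         # ohms: high-resistance state -> call it logical 0
READ_THRESHOLD = 300e3  # ohms: anything below this reads as a 1

def read_bit(v_read, i_measured):
    """Apply a small read voltage, measure the current, infer resistance, map to a bit."""
    resistance = v_read / i_measured
    return 1 if resistance < READ_THRESHOLD else 0

print(read_bit(0.2, 0.2 / LOW_RES))   # -> 1
print(read_bit(0.2, 0.2 / HIGH_RES))  # -> 0
```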
We’ve already seen one way: using spin-transfer torque in MRAMs. Here the arrangement of magnetic materials and their polarization establishes the resistance. But another mechanism has started looking attractive: resistive RAMs, or RRAMs. The mechanism is different from that of an MRAM, but it’s similar in that a material stack can be altered to change its resistance.
Crossbar has recently announced the first commercial RRAM offering. Their basic approach is well known; their secret “value added” is surprisingly “simple.” The idea is to stack three materials together: electrode layers at the top and bottom and then a middle material called the “switching medium.” Writing to this device involves getting ions to diffuse from an electrode into the switching medium, changing the resistance. Of course, this has to be reversible – we’re not talking about catastrophic avalanche breakdown.
With Crossbar, the bottom electrode is epitaxial silicon; the top electrode is silver; and the switching medium is amorphous silicon. Silver ions migrate into the amorphous layer forming filaments; these filaments can be pulled back (at least during the working life of the memory) when erasing.
These three layers can be placed over anything, including CMOS logic. They can even be placed over each other, allowing stacking in a way that’s much easier to create than 3D NAND memory; you have to etch only one memory layer at a time (rather than doing a deep etch through everything after all the layers are in place).
And they say they do this with no complex physics or chemistry. They didn’t even buy any new equipment. So why hasn’t everyone done this? They say that the tricky bit is keeping the layer interfaces clean – and they’ve patented the ways they came up with to do that. That’s their primary technological contribution, and, at this point, they’re not talking details.
So what?
OK, so we know that they can make these. (Or they say they can; we’re taking their word for it for the moment.) What can these do that other memories can’t do?
- Well, for one thing, you can write to them more quickly than you can to NAND. Like, 20 times faster. Write and erase times are 2 µs (I think that’s at 25 nm).
- They’re byte-addressable.
- They don’t need ECC (although they could add it to tease multiple levels out of a single cell – MLC operation could be implemented by trimming the write currents for intermediate states).
- While large-scale NAND devices have endurance of less than 1000 cycles, Crossbar claims to have proven 10,000 cycles with 10-year retention for high-density data storage (1M cycles for embedded).
- They don’t need wear-leveling, although they could get higher density if they used it. (The aging mechanism is that cells get stuck on… which sounds to me like the ions in the filaments get tired of going back and forth and refuse to budge, leaving a filament in place).
- Read latency is less than 30 ns for embedded and code-storage applications (1 µs for data storage).
- Performance isn’t compromised by stacking or by implementing an MLC.
- No disturb issues.
- Two two-bit MLC cells stacked on top of each other give the equivalent of a 1 F² footprint per bit (the arithmetic is sketched just after this list).
- 56% smaller die size than NAND (at 25 nm).
- CMOS-friendly: low-temp back-end steps for stacking over CMOS.
- They neither need external high voltages nor generate internal high voltages.
- The technology scales to below 10 nm.
- They can do a 1-TB device in 20-nm technology.
- Minimal temperature dependence.
- They claim lower power, although they haven’t been specific on that.
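On that 1 F² footprint claim above, here’s the arithmetic, assuming the usual minimum cross-point cell area of 4 F² (my assumption, not a figure Crossbar has given):

```python
# Back-of-the-envelope arithmetic behind the "1 F^2 per bit" bullet above.
# The 4 F^2 cross-point cell area is the textbook minimum for a crossbar
# array at half-pitch F; it's my assumption here, not a Crossbar figure.
CELL_AREA_F2 = 4      # planar area of one cross-point cell, in units of F^2
BITS_PER_CELL = 2     # two-bit MLC
STACKED_LAYERS = 2    # two cells stacked vertically share one footprint

area_per_bit = CELL_AREA_F2 / (BITS_PER_CELL * STACKED_LAYERS)
print(area_per_bit)   # -> 1.0 F^2 per bit
```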
They did a proof of concept by purchasing a microcontroller from TSMC, adding tungsten plugs to establish connectivity to a top layer and then building the memory over it. This was, of course, to prove that this is doable, not to start up production. Their initial business model is to sell IP into embedded applications, not to manufacture memories. (If they were going to build them, they say they could do NOR-sized devices, but they’d want to work with a partner for high-volume production of 1-TB devices – no such arrangement is currently in place.)
But what’s really interesting is what can happen to the increasingly complicated system memory hierarchy. At present, on-chip memory co-residing with a CPU consists of registers and caches (two or three levels) built out of SRAM. At that point, you have to exit the chip and talk to DRAM for byte-addressable random access. But DRAM is volatile, so we need to go to SSDs for smallish permanent storage, potentially going through a Flash cache to speed up access; if super high density storage is needed, then we have to go to a traditional hard drive. That’s a 7-8-layer hierarchy.
RRAM’s performance is better than that of other non-volatile options and, in some cases, could even compete with DRAM. The proposal is that the system memory stack can be simplified: Once you leave the CPU chip with its caches, you then have DRAM – but less of it – and then RRAM. And that’s all. Two fewer layers in the hierarchy. Which sounds great (unless you’re making the stuff that goes into the eliminated layers…).
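Laid side by side (the layer grouping here is mine, condensed from the two paragraphs above), the proposed stack looks like this:

```python
# The two memory hierarchies described above, condensed into coarse layer groups.
today = ["registers", "L1/L2/L3 SRAM caches", "DRAM", "flash cache", "SSD", "HDD"]
proposed = ["registers", "L1/L2/L3 SRAM caches", "DRAM (less of it)", "RRAM"]
print(len(today) - len(proposed))  # -> 2 fewer layers
```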
If you don’t use ECC or wear-leveling, then the RRAM controller is also much simpler than a NAND controller.
In other words, lots of things get simpler. Which should make the adherents of Occam happy.
Now… I know there are skeptics out there. Plenty of great ideas have come and gone before. Crossbar has lots of proving to do. But if successful, they could give MRAM a run for its money in this new resistive-memory era.
*Subsequent research questioned the possum population in ancient Egypt, but the local patricians were unaware of that, since, unbeknownst to them, their personal pet possums were all imported. So they bought the story. And they never went back to review the original market numbers, so it was all cool.
** OK, so our copy editor asked, “Why can’t someone just look at the piles to see if they’re there? Why all the weighing and such?” (Not her wording, to be clear…) So… OK… how about the sand has to go in a closed box, and each box can be empty or full, but you can’t tell by looking at it? So you have to weigh it. Why is the box closed? If I had more time I’d think of a really good reason, but since I don’t, I’m going to invoke parental prerogative: “Because I said so.” Work with me here…
More info:
What do you think about Crossbar’s new RRAM?