
Embedded File Systems: A Tricky Task

Popular Flash Media Clashes with Embedded Concerns and General Apathy

“Fast, cheap, or reliable: pick any two” – Racer’s adage

Storing data used to be so easy. You’d MOV or POKE a few bytes into RAM and leave ’em there. If you had to store a lot of data, maybe you’d use an EEPROM, hard drive, or flash memory. Retrieving the data was almost as easy… assuming you remembered where you’d stored it. 

Now, we need entire file systems to keep track of all the stored data, even on a little MCU. Plenty of embedded systems run real databases, where storing and retrieving data is a big deal. Everything from the lowly utility meter or thermostat up through jetliners and communications satellites now uses a full-on file system to keep track of all the data it’s juggling.

Still not hard, though, right? Your flash supplier probably gives away free file-system software, so you’re halfway there already. Yeah, about that… 

Embedded file systems are much tougher, more complicated, and more expensive than most programmers think. It’s not about organizing sectors, blocks, and write cycles. It’s really about reliability, fail-safes, and damage recovery. That’s not something you hack out in a weekend coding session. In fact, commercial software companies have been working at this for more than a dozen years and still haven’t solved all the problems. Like rowing a boat, it looks really easy until you try it. 

Most of us don’t try to invent our own file-system format but instead stick to an existing one, like FAT or NTFS. That has some advantages: you can read and write USB sticks, mount external drives, accept SD cards, and so on. Commercial developers also lean toward the industry standards rather than inventing something incompatible. Microsoft’s FAT file system has been around longer than many of us, but it’s still popular as the lingua franca among storage media – sort of the lowest common denominator. The company’s exFAT file system is a lot newer (though still a dozen years old) and was designed for flash media like SD cards.

The FAT format is essentially in the public domain now, but exFAT is still covered by Microsoft’s patents, and it requires a license. Last month, Microsoft published its exFAT specification, but using the file system still requires legal paperwork, and money has to change hands. The exception is Linux, where Microsoft carved out an exemption for political reasons; everyone else still has to pay up. 

One big problem with embedded storage is the storage medium itself. Most SD cards and USB thumb drives are cheap and unreliable. One of these is good news for your product and one isn’t. Flash cards are easily available and have standard interfaces, so end users tend to treat them all as interchangeable and generic commodities. In reality, they’re anything but. 

Dave Hughes, the CEO at file-system vendor HCC Embedded, says his company tests flash storage devices all the time and finds huge differences in reliability. Their custom test rig finds some failures within a few minutes; better-quality SD cards might fail within a few hours. “If it runs for a few weeks, that’s a good card,” he says. 

Naturally, the cheapest cards tend to perform the worst. A top-quality SD card might cost 10x as much as a cheap one but offer measurably better reliability. The problem, says Hughes, is identifying the good cards before you buy them. “Two seemingly identical cards from the same manufacturer with the exact same branding can be very different because of differences in supply chain and semiconductor fabrication.” Unless you’re obsessive about checking lot numbers, you’ll never know what you’re getting. And even then, it’s a bit of a crap shoot, he cautions. 

There’s also no way to gauge the quality of the storage medium at runtime. Customers can promise on a stack of K&R manuscripts to buy only the best SD cards, but if some intern slips in a cheap replacement that he found in his desk drawer, all bets are off. Software has to assume the worst. 

Even good SD cards fail – anything can break – but the difference is that good ones guarantee their behavior under adverse conditions. When the power fails or the voltage drops, cheap SD cards get crazy unpredictable, but good cards fail a bit more gracefully and predictably – and software drivers can take advantage of that. 

Reliability is generally users’ biggest concern, which makes sense. Cost comes in a close second, while relatively few customers worry about performance. Yet even the most robust file-system software can’t protect against a badly timed watchdog reset, power outage, or random hardware failure. That’s why HCC counsels its customers to buy only top-quality storage media, to beef up their power supplies, and to provide an interrupt signal of impending power failure so the file system can do one last “panic write” before the storage medium becomes unusable. 
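
To make that last suggestion concrete, here’s a minimal sketch of a power-fail hook, assuming an MCU whose supply monitor raises an early-warning interrupt a few milliseconds before the rails collapse. Every name below is hypothetical; no real driver API is implied.

    /* A minimal sketch, assuming a supply monitor that raises an
     * early-warning interrupt shortly before power is actually lost.
     * All names are hypothetical. */

    #include <stdbool.h>

    volatile bool power_failing = false;

    extern void fs_flush_pending(void);   /* assumed file-system flush hook */

    /* ISR wired to the supply monitor's early-warning output. Keep it
     * short: the hold-up capacitors buy only a few milliseconds. */
    void power_fail_isr(void)
    {
        power_failing = true;
        fs_flush_pending();               /* the one last "panic write" */
    }

    /* Application code checks this before starting any new write. */
    bool storage_write_allowed(void)
    {
        return !power_failing;
    }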

Hughes also cautions that file-system errors can be hard to diagnose. “Developers can be far down the road before they see their first data error or storage problem. And then it’s a long road back to discover that the root cause was a flaky file-system manager from the silicon vendor.” Just because the silicon vendor is reputable and reliable doesn’t mean their free driver software is of equal quality. 

True file-system reliability involves the whole system, including power, storage, and software stack. Even Hughes admits that HCC’s software isn’t a magic bullet. Slapping a reliable file system onto an unreliable hardware platform won’t prevent data loss. Like so many other things, it has to be a design concern from top to bottom, not an add-on or an afterthought. 

Another problem is abstraction. Operating systems like to treat storage – hard disks, flash drives, in-memory databases, whatever – as abstract objects and let the drivers handle the gory hardware details. That’s particularly tricky for flash media because of their asymmetrical read/write times and their tendency to “wear out” over time. Maximizing an SD card’s lifetime means minimizing the number of write cycles, but that’s tough to do at the file-system level. If the OS requests a write to storage, you pretty much have to comply and write to storage. Buffering reduces the write count but creates a new problem: if the power fails, you’ve lost everything in the buffer as well as (potentially) the block you were writing. Buffers increase longevity but decrease reliability. Finally, some operating systems stipulate that writes must be sequential, or have other rules that lower-level software must follow but that are inconvenient for flash storage.
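
Here’s a toy illustration of that trade-off, with invented names throughout (flash_write_block stands in for whatever block driver the platform provides). The buffered path coalesces small writes to save program/erase cycles; the immediate path commits everything right away.

    /* Sketch only: coalescing writes in RAM saves flash wear but widens
     * the window in which a power loss destroys unwritten data. */

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 512

    static uint8_t  block_buf[BLOCK_SIZE];
    static uint32_t buf_used = 0;

    extern void flash_write_block(const uint8_t *data, uint32_t len);

    /* Buffered path: one erase/program per full block, but anything
     * still sitting in block_buf is lost if the power fails. */
    void fs_write_buffered(const uint8_t *data, uint32_t len)
    {
        while (len > 0) {
            uint32_t chunk = BLOCK_SIZE - buf_used;
            if (chunk > len)
                chunk = len;
            memcpy(&block_buf[buf_used], data, chunk);
            buf_used += chunk;
            data     += chunk;
            len      -= chunk;
            if (buf_used == BLOCK_SIZE) {
                flash_write_block(block_buf, BLOCK_SIZE);
                buf_used = 0;
            }
        }
    }

    /* Fail-safe path: every write hits the medium immediately,
     * at the cost of many more program/erase cycles. */
    void fs_write_immediate(const uint8_t *data, uint32_t len)
    {
        flash_write_block(data, len);
    }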

Because of that, some file systems come in different flavors tuned for different use cases. If you’re worried about storage lifetime, the file system will do everything it can to minimize write cycles. If you want speed, it’ll use caching and larger buffers. Fail-safe reliability requires the opposite approach, where write transactions are short and immediate. The bottom line here is that the operating system, and not just the underlying file system, has to be on board with your strategy.
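
As a purely hypothetical sketch of what those flavors might look like at configuration time – none of these names come from a real product – a tuning option could steer the file system’s caching and commit behavior:

    /* Hypothetical per-use-case tuning; invented names only. */

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum {
        FS_TUNE_LONGEVITY,    /* minimize program/erase cycles */
        FS_TUNE_THROUGHPUT,   /* large caches, deferred writes */
        FS_TUNE_FAILSAFE      /* short, immediate transactions */
    } fs_tuning_t;

    typedef struct {
        fs_tuning_t tuning;
        uint32_t    cache_bytes;     /* 0 in fail-safe mode */
        bool        sync_on_write;   /* true: each write commits before returning */
    } fs_config_t;

    /* A utility meter that must survive power loss might pick: */
    static const fs_config_t meter_cfg = {
        .tuning        = FS_TUNE_FAILSAFE,
        .cache_bytes   = 0,
        .sync_on_write = true,
    };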

Even our utility meters have file systems now, and in some municipalities they retain data for a decade or more. File management is ubiquitous. But that doesn’t mean it’s easy. 
