Some embedded applications are much tougher, however. There are cases when we need to deliver copious amounts of computing power while remaining off the grid. Last week, at Supercomputing 2005 in Seattle, there was ample evidence of just such compute power gone mad. Gigantic racks of powerful processors pumped piles of data through blazing fast networks and onto enormous storage farms. The feel of the place was about as far from “embedded” as you can get, unless your idea of embedding somehow involves giant air-conditioners and 3-phase power.
Behind the huge storage clouds, teraflop racks, and nation-sized networks, however, there was considerable embedded computing activity going on. Although not the show’s main event, high-performance embedded computing (HPEC) was hanging out at Supercomputing 2005 and getting a good deal of quiet attention. It seems that not all of life’s difficult problems will hold still long enough for you to ship them off to a supercomputer facility. Sometimes, massive processing power is required to interpret images in real time, process radio signals on the fly, or run complex algorithms from inside a moving vehicle. It’s those applications that put the “E” in HPEC, and several forward-thinking companies were at the show, working to help the rest of us see the light.
To briefly trace the history of embedded systems architectures: over the past decade, we have moved rapidly from systems-in-chassis to systems-on-board, then into system-on-chip (SoC) integration. Each time we’ve integrated, our power density has increased as our form factors have shrunk. Interestingly, today’s embedded systems have more in common with supercomputers than with commodity desktop and laptop machines. As we highlighted last week in “Changing Waves,” both supercomputers and embedded computers have hit the wall of diminishing returns on single-threaded von Neumann processors and have moved into the domain of multi-core and alternative-architecture processing.
The HPEC folks have just hit the wall a little earlier and a little harder than the rest of us. Supercomputing in embedded applications is a challenging engineering problem with little wiggle room for tradeoffs and compromises. Three primary solution tracks are in evidence today: one built on multi-core embedded processing, one on specialized processors such as DSPs, and one on reconfigurable accelerators. Supercomputing 2005 showed us a rich crop of companies targeting multi-core development, and many of the compiler and OS technologies that serve the massively parallel grids and clusters are similarly applicable to HPEC.
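To make that first track a little more concrete, here is a minimal sketch (illustrative only, not drawn from any vendor at the show) of the kind of explicit data-parallel decomposition those multi-core compilers and operating systems ultimately have to manage: a compute-intensive loop split evenly across POSIX worker threads.

/*
 * Minimal sketch (illustrative only): splitting a compute-intensive
 * loop across cores with POSIX threads -- the kind of explicit
 * parallelism the multi-core HPEC track relies on.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define N (1 << 20)

static float in[N], out[N];

struct slice { size_t begin, end; };

static void *worker(void *arg)
{
    struct slice *s = (struct slice *)arg;
    for (size_t i = s->begin; i < s->end; i++)
        out[i] = in[i] * in[i];   /* stand-in for a per-sample kernel */
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    struct slice slices[NUM_THREADS];
    size_t chunk = N / NUM_THREADS;

    for (size_t i = 0; i < N; i++)
        in[i] = (float)i;

    /* Hand each thread an equal slice of the data set. */
    for (int t = 0; t < NUM_THREADS; t++) {
        slices[t].begin = t * chunk;
        slices[t].end   = (t == NUM_THREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, worker, &slices[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    printf("out[N-1] = %f\n", out[N - 1]);
    return 0;
}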
A variety of embedded boards and systems from companies like Nallatech, Starbridge Systems, and Annapolis Microsystems were on display. Most of these combine conventional processors feeding DSP or FPGA accelerators, generous helpings of memory for caching and FIFOs, and various high-performance I/O connections to hook up to the outside world. Performance claims and demonstrations on many of these devices were impressive, often rivaling or beating non-embedded supercomputers at the same task.
Unlike the HPC strategy of fitting the algorithm to the hardware, however, the HPEC community tends to fit the hardware to the algorithm. The reasons are economic. A typical supercomputer installation justifies its cost by lending its processing power to as many high-value problems as possible. These problems may be highly diverse, with their only commonality being the need for trillions of CPU cycles. In the embedded supercomputing domain, however, the machine is almost always optimized to solve one specific problem. It doesn’t have to be working on DNA sequence comparisons one day, hurricane forecasting the next day, and seismic data analysis on the third. This luxury allows for some serious specialization, and HPEC designers seldom fail to capitalize on that angle.
Like a race car, an HPEC system can be fine-tuned for precisely the problem it was conceived to solve. In the extreme, a custom ASIC can deliver massive hardware acceleration of specific compute-intensive tasks with minimal power and space. If more flexibility is needed, programmable logic devices can provide reconfigurable algorithm acceleration with a slight power penalty compared to an ASIC. In any event, making supercomputing embedded almost always involves some additional acceleration beyond simple multi-core processing.
With any of these acceleration strategies, however, there is a formidable programming problem. Supercomputing 2005 was ready with a number of solutions to those issues as well. Mitrionics was debuting its “Mitrion-C” compiler, which takes a C-like parallel programming language and generates a hardware-accelerated executable that can run on a variety of machines, from Cray XD-1 supercomputers to custom embedded HPEC equipment with FPGAs. Celoxica showed continued success with its Handel-C environment for hardware acceleration of compute-intensive algorithms, aimed squarely at the embedded high-performance computing area. Starbridge Systems demonstrated its “Viva” graphical language compiler generating reusable applications to run on a variety of hardware platforms, from accelerated HPCs to FPGA development boards.
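For a flavor of what these C-to-hardware flows target, consider a plain-C FIR filter kernel of the sort such tools are designed to unroll and pipeline into FPGA fabric. This is ordinary ANSI C used as a neutral illustration; Mitrion-C, Handel-C, and Viva each have their own syntax, and this is not vendor code.

/*
 * Illustrative only: a plain-C FIR filter kernel of the sort that
 * C-to-hardware flows are meant to unroll and pipeline into FPGA
 * fabric.  In hardware, the loop below becomes TAPS parallel
 * multiply-accumulate units rather than a sequential iteration.
 */
#include <stddef.h>

#define TAPS 16

/* Produce one output sample per call; 'history' holds the last TAPS inputs. */
float fir_step(const float coeff[TAPS], float history[TAPS], float sample)
{
    float acc = 0.0f;

    /* Shift the delay line and accumulate the dot product. */
    for (size_t i = TAPS - 1; i > 0; i--) {
        history[i] = history[i - 1];
        acc += coeff[i] * history[i];
    }
    history[0] = sample;
    acc += coeff[0] * history[0];

    return acc;
}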
While many of us may never need the gigaflops of compute power available with HPEC systems, it is still good to see the state of the art push ahead, giving everyone some extra breathing room. Even though we may not need the power today, it takes only a small market shift to turn a compute-intensive incremental feature into a must-have. If nothing else, Supercomputing 2005 showed us that the embedded MIPS will be there when we need them.