Tell a crowd of nerds that software is coming to an end, and you’ll get laughed out of the bar. The very notion that the amount of software and software development in the world will do anything besides continue on an exponential growth curve is unthinkable in most circles. Examine any industry data and you’ll see the same thing – software content is up and to the right. For decades, the trend in system design has been toward increasing the proportion of functionality implemented in software versus hardware, and the makeup of engineering teams has followed suit. It is not uncommon these days to see a 10:1 ratio of software versus hardware engineers on an electronic system development project. And that doesn’t count the scores of applications developed these days that are software alone.
Yep, it’s all coming to an end.
But, software is one of those five immutable elements, isn’t it – fire, water, earth, air, and code? Practically nothing exists in our technological world today without software. Rip out software and you have taken away the very essence of technology – its intelligence – its functionality – its soul.
Is software really all that important?
Let’s back up a bit. Every application, every system is designed to solve a problem. The solution to that problem can generally be broken down into two parts: the algorithm and the data. It is important to understand that the actual answer always lies within the data. The job of the algorithm is simply to find that answer amongst the data as efficiently as possible. Most systems today further break the algorithm portion down into two parts: hardware and software. Those elements, of course, form the basis of just about every computing system we design – which comprises most of modern technology.
We all know that, if the algorithm is simple enough, you can sometimes skip the software part. Many algorithms can be implemented in hardware that processes the data and finds the solution directly, with no software required. The original hand calculators, the early arcade video games, and many other “intelligent” devices have used this approach. If you’re not doing anything more complex than multiplication or maybe the occasional quadratic formula, bring on the TTL gates!
The problem with implementing algorithms directly in hardware, though, is that the complexity of the hardware scales with the complexity of the algorithm. Every computation or branch takes more physical logic gates. Back in the earliest days of Moore’s Law, that meant we had very, very tight restrictions on what could be done in optimized hardware. It didn’t take much complexity to strain the limits of what we were willing and able to solder onto a circuit board with discrete logic gates. “Pong” was pushing it. Anything more complicated than that and we leapt blissfully from the reckless optimism of Moore into the warm and comfortable arms of Turing and von Neumann.
The conventional von Neumann processor architecture uses “stored programs” (yep, there it is, software) to allow us to handle arbitrarily complex algorithms without increasing the complexity of our hardware. We can design one standard piece of hardware – the processor – and use it to execute any number of arbitrarily complex algorithms. Hooray!
But every good thing comes at a price, doesn’t it? The price of programmability is probably somewhere between three and four orders of magnitude. Yep. Your algorithm might run somewhere between 100 and 10,000 times as fast if it were implemented in optimized hardware compared with software on a conventional processor. All of that business with program counters and fetching instructions is really a lot of overhead on top of the real work. And, because of the sequential nature of software, pure von Neumann machines trudge along doing things one at a time that could easily be done simultaneously.
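To make that concrete, here’s a toy sketch in C (the eight-element sum is purely illustrative). The loop is how a processor sees the job: one add per step, each wrapped in fetch, decode, and branch overhead. Dedicated hardware could instead wire up a balanced tree of adders and finish the same work in three levels of logic, with every addition in a level happening at once.

    /* How a von Neumann processor does it: one add at a time,
       plus instruction fetch/decode and loop-branch overhead. */
    int sum8(const int v[8])
    {
        int sum = 0;
        for (int i = 0; i < 8; i++)
            sum += v[i];
        return sum;
    }

    /* How dedicated hardware could do it: a three-level adder tree,
       with every addition in a level performed simultaneously. */
    int sum8_tree(const int v[8])
    {
        int a = v[0] + v[1], b = v[2] + v[3], c = v[4] + v[5], d = v[6] + v[7];
        int e = a + b, f = c + d;
        return e + f;
    }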
The performance cost of software programmability is huge, but the benefits are, as well. Both hardware cost and engineering productivity are orders of magnitude better. An algorithm you could implement in a few hours in software might require months of development time to create in optimized hardware – if you could do it at all. This tradeoff is so attractive, in fact, that the world has spent over half a century optimizing it. And, during that half century, Moore’s Law has caused the underlying hardware implementation technology to explode exponentially.
The surface of software development has remained remarkably calm considering the turbulence created by Moore’s Law below. The number of components we can cram onto integrated circuits has increased by something like eight orders of magnitude since 1965, and yet we still write and debug code one line at a time. Given tens of millions of times the transistors, processors have remained surprisingly steady as well. Evolution has taken microprocessors from four to sixty-four bits, from one core to many, and has added numerous performance-enhancing features such as caches, pipelines, hardware arithmetic, predictive execution, and so forth, but the basic von Neumann architecture remains largely unchanged.
It’s important to note that all these enhancements to processor architecture have done – all they ever can do – is mitigate some of the penalty of programmability. We can bring a processor marginally closer to the performance of optimized hardware, but we can never get there – or even close.
Of course, those eight orders of magnitude Moore’s Law has given us in transistor count also mean that the complexity of the algorithms we could implement directly in optimized hardware has similarly increased. If you’re willing and able to make a custom optimized chip to run your specific algorithm, it will always deliver 100x or more of the performance of a software version running on a processor, while using correspondingly less energy. The problem there, of course, is the enormous cost and development effort required. Your algorithm has to be pretty important to justify the ~24-month development cycle and most likely eight-figure investment required to develop an ASIC.
During that same time, however, we have seen the advent of programmable hardware, in the form of devices such as FPGAs. Now, we can implement our algorithm in something like optimized hardware using a general purpose device (an FPGA) with a penalty of about one order of magnitude compared with actual optimized hardware. This has raised the bar somewhat on the complexity of algorithms (or portions thereof) that can be done without software. In practical terms, however, the complexity of the computing problems we are trying to solve has far outpaced our ability to do them in any kind of custom hardware – including FPGAs. Instead, FPGAs and ASICs are relegated to the role of “accelerators” for key computationally intensive (but less complex) portions of algorithms that are still primarily implemented in software.
Nevertheless, we have a steadily rising level of algorithmic complexity where solutions can be implemented without software. Above that level – start coding. But there is also a level of complexity where software starts to fail as well. Humans can create algorithms only for problems they know how to solve. Yes, we can break complex problems down hierarchically into smaller, solvable units and divide those portions up among teams or individuals with various forms of expertise, but when we reach a problem we do not know how to solve, we cannot write an algorithm to do it.
As a brilliant engineer once told me, “There is no number of 140 IQ people that will replace a 160 IQ person.” OK, this person DID have a 160 IQ (so maybe there was bias in play) but the concept is solid. We cannot write a program to solve a problem we are not smart enough to solve ourselves.
Until AI.
With deep learning, for example, we are basically bypassing the software stage and letting the data itself create the algorithm. We give our system a pile of data and tell it the kind of answer we’re looking for, and the system itself divines the method. And, in most cases, we don’t know how it’s doing it. AI may be able to look at images of a thousand faces and tell us accurately which ones are lying, but we don’t know if it’s going by the angle of the eyebrows, the amount of perspiration, the curl of the lip, or the flaring of the nostrils. Most likely, it is a subtle and complex combination of those.
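As a toy sketch of what “letting the data create the algorithm” means (the AND-gate dataset, learning rate, and iteration count below are illustrative stand-ins; a real deep-learning system is vastly larger), nobody writes the decision rule here. Gradient descent pulls the weights out of the examples.

    #include <stdio.h>
    #include <math.h>

    /* A single sigmoid neuron trained by gradient descent.  The "algorithm"
       (the weights) is never written by a programmer; it is fitted to the data. */
    int main(void)
    {
        /* Training data: the AND function, standing in for "a pile of data". */
        const double x[4][2] = {{0,0},{0,1},{1,0},{1,1}};
        const double t[4]    = { 0,    0,    0,    1  };

        double w0 = 0, w1 = 0, b = 0;   /* parameters shaped by the data */
        const double lr = 0.5;          /* illustrative learning rate */

        for (int epoch = 0; epoch < 5000; epoch++) {
            for (int i = 0; i < 4; i++) {
                double z = w0 * x[i][0] + w1 * x[i][1] + b;
                double y = 1.0 / (1.0 + exp(-z));   /* sigmoid */
                double err = y - t[i];
                w0 -= lr * err * x[i][0];           /* gradient step */
                w1 -= lr * err * x[i][1];
                b  -= lr * err;
            }
        }

        for (int i = 0; i < 4; i++) {
            double y = 1.0 / (1.0 + exp(-(w0 * x[i][0] + w1 * x[i][1] + b)));
            printf("%g AND %g -> %.3f\n", x[i][0], x[i][1], y);
        }
        return 0;
    }

Scale that idea up by many orders of magnitude and you get systems whose internal method no human ever wrote down – or can easily read back.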
We now have the role of software bounded – on both the low- and high-complexity sides. If the problem is too simple, it may be subsumed by much more efficient dedicated hardware. If it’s too complex, it may be taken over by AI at a much lower development cost. And the trendlines for each of those are moving toward the center – slowly reducing the role of classical software.
As we build more and more dedicated hardware to accelerate compute-intensive algorithms, and as we evolve AI architectures to expand the scope of problems they can solve, we could well slowly reduce the role of traditional software to nothing. We’d have a world without code.
I wouldn’t go back to bartending school just yet, though. While the von Neumann architecture is under assault from all sides, it has a heck of a moat built for itself. We are likely to see the rapid acceleration of software IP and the continued evolution of programming methods toward higher levels of abstraction and increased productivity for software engineers. Data science will merge with computer science in yet-unforeseen ways, and there will always be demanding problems for bright minds to solve. It will be interesting to watch.
How would AI create a spreadsheet program?
Or design the network drivers for TCP/IP? Do we just show it the RFC documents to read?
It’s worth remembering that software development is a detailed specification… whether a compiler bakes some of it into an FPGA bitstream or into a bunch of machine code for a CPU/GPU is just part of the production process.
Isn’t hardware just frozen software anyway? [andy grove]
Then there’s the minor argument about power consumption. FPGAs don’t get anywhere near as parsimonious with power as microcontrollers are. Yes, algorithms in hardware, via FPGA or another method, will eventually get better, but microcontrollers won’t stand still either. The NREs on hardware design will remain higher for the foreseeable future, i.e., much of the literature revolves around assumptions of high volume.
Yes, programming (as we know it) is likely to become obsolete. Programming in general-purpose languages like C++ on serial processing hardware is just horribly inefficient. The same goes for designing hardware at the RTL level.
The problem is that the academics have completely failed to come up with an easy environment for parallel programming, and the hardware guys are stuck in a 20+ year old design paradigm which software guys won’t use.
Fortunately hardware design looks a lot like the problem of building neural networks, so the CS guys are suddenly taking an interest in how to deal with fine-grained task parallelism.
http://parallel.cc
An interesting outcome of the FpgaC project was realizing that GCC did such a wonderful job at logic flow optimization that the real input to FpgaC should probably be the binaries GCC produced, or at least some pseudo code/binary from a custom GCC backend targeting the FPGA execution environment – a pure logic/data/sequencing/timing description without all the clutter of the C language. At the same time, we realized that the OpenMP variant of C was an even better HDL … as are the binaries produced from it, which contain optimized parallelism.
I don’t think there is anything sudden about taking an interest in fine-grained task parallelism … we started down that path a half century ago with lightweight threads/processes in UNIX, back in the early days of dual/multiple-processor systems, and left that as a legacy in POSIX. Ultimately, pthreads was a critical part of parallel application design for large-scale NUMA machines and their supporting UNIX/Linux kernels.
The problem with fine-grained parallelism is that everything in the global name space (and anything else shared between threads) requires serialization for sanity and race-condition removal. For years, that was something special that few people would wrap their brains around … today it’s much more mainstream and expected, with much better language and automated tool support. Thankfully, much of that is hidden inside OpenMP, and the programmer only needs to worry about higher-level serialization of processes and messages.
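As a minimal sketch of that point (the dot-product kernel is just an illustrative example; the pragma is standard OpenMP, built with something like gcc -fopenmp), the reduction clause is what lets the programmer skip the explicit locking that a naively shared sum would otherwise require:

    #include <stdio.h>

    /* Sum of element-wise products, computed in parallel.  The
       reduction(+:sum) clause gives each thread a private partial sum
       and serializes only the final combine, so there is no race on 'sum'. */
    double dot(const double *a, const double *b, int n)
    {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    int main(void)
    {
        enum { N = 1000000 };
        static double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }
        printf("dot = %f\n", dot(a, b, N));   /* expect 2000000.0 */
        return 0;
    }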
ROTFLMAO … There are actually three critical parts to programming; the last two (code/test and deployment), as Kevin notes, continue to be highly optimized. Applications that used to require high levels of custom development in the old days shift into the mainstream as commodity products, and become transparently embedded into existing and new larger projects/products that the customer/end user doesn’t even know about as a specific technology or application. Trillions of lines of code are completely transparent to the user and are just part of their iPhone, car, TV, Alexa, etc.
Code and test have evolved since the 1940s: from designing bit-level sequencer controls (even as crude as the IBM series 400 accounting machine control panel programming I did early in my career), to assembly language in the ’50s, ’60s, and ’70s, to RPG/Fortran/Basic/Cobol in the ’60s through ’90s, to C/Pascal/Algol/SQL in the ’70s through ’90s, to Ada/C++/Perl/Oracle in the ’80s through the 2000s, to the Java/PHP/Python/Ruby/PostgreSQL/HTML stack that drove the Internet explosion in the ’90s and 2000s. And the march toward richer and higher-level development and deployment environments hasn’t stopped.
But the critical first part of programming often starts well before the first line is written, and continues concurrently with code/test … and that is the problem definition phase. Determining a need. Determining the right human factors. Determining how the physical pieces, the data, the algorithms, the hardware – the whole application – fit together as transparently as possible so that humans can adopt, assimilate, and use the technology behind the programs. When AI reaches this threshold, we will no longer be using AI; AI, if it so chooses, may just be using humans to maintain its existence for a short while.
I’m a believer that we do not EVER want to reach that point … because humans will lose.
A more interesting question is when electrical engineering as we know it will become non-existent for mainstream technology evolution. We are already at the point where some simple software (by today’s standards) can allow a non-technical designer to specify a design, including fit and form, and a fully automated system could then carry the low-level chip, board, assembly, and production design process through to a finished product. Software automates well-known processes … and these are well-known processes.
Consider this automated electrical engineering application as something that starts with a developer using well-known Lego blocks (Arduino, Raspberry Pi, and all the shields/HATs on the market, along with a wide variety of sensors, controls, actuators, etc.). The developer specifies the Lego blocks and the form/fit of an integrated single-board solution, and the connections, PC board layout, assembly, and test are all completed automatically, along with enclosure design, so that turnkey, finished, packaged systems arrive in the mail a few days later. The designer then transfers the software from the Lego-block prototype to the integrated and packaged product and is up and running.
Iterate this Lego-block design process down to the chip level, along with automated knowledge of power/RF/etc. processes.
Iterate this Lego-block design process/system down to the semi-custom VLSI chip level, with all the Lego-block IP available to a short-run, multi-project, maskless fab that can return packaged die on a cost-reduced board/enclosure packaged system in a few days or weeks … and then transition to medium-volume, reduced-cost production with a 30-day lead time … up to a few tens or hundreds of thousands of packaged systems per month. Software automates well-known processes … and these are well-known processes.
Kevin states “Your algorithm might run somewhere between 100 and 10,000 times as fast if it were implemented in optimized hardware compared with software on a conventional processor. ” I believe this expectation is completely wrong for 99.99% of applications with critical algorithms.
Amdahl’s Law puts a serious damper on these expectations … especially with the high-core-count, large multilevel-cache, integrated on-chip SIMD processor solutions available as “conventional processors” today. Today’s conventional processors come in a wide variety of configurations, which let you fit the available parallelism in your problem to the parallel on-chip resources.
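For reference, Amdahl’s Law says that if a fraction p of the runtime can be accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A quick back-of-the-envelope sketch in C (the 90% accelerable fraction below is purely hypothetical):

    #include <stdio.h>

    /* Amdahl's Law: overall speedup when a fraction p of the runtime is
       accelerated by a factor s and the rest remains serial. */
    static double amdahl(double p, double s)
    {
        return 1.0 / ((1.0 - p) + p / s);
    }

    int main(void)
    {
        double p = 0.90;   /* hypothetical: 90% of runtime is accelerable */
        printf("s = 10x    -> overall %.2fx\n", amdahl(p, 10.0));     /* ~5.3x */
        printf("s = 100x   -> overall %.2fx\n", amdahl(p, 100.0));    /* ~9.2x */
        printf("s = 10000x -> overall %.2fx\n", amdahl(p, 10000.0));  /* ~10x  */
        return 0;
    }

Even an infinitely fast accelerator tops out at 1 / (1 - p), a 10x ceiling in this hypothetical, which is exactly the point: the serial fraction, not the accelerator, sets the limit.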
The high degree of footprint and power optimization for these resources in conventional processors is significantly better today than what you can get trying to run most algorithms on an FPGA. There are a few – very few – algorithms that can be made to perform significantly better on an FPGA, and most of those depend on dedicated IP blocks available to the FPGA that are not available on today’s conventional processors.
Where FPGAs win is in removing serial memory latency and replacing it with high-speed parallel registers and small local memories. And, for a few algorithms, in replacing long combinatorial logic chains (instruction sequences) with dedicated logic that is a little bit faster. Because ALU and cache latencies are highly optimized in conventional processors, large local memory (cache) operations are probably faster on a conventional processor unless some form of extreme parallelism is available … especially for handling wide (128/256-bit) data paths, particularly floating-point paths. A decade ago, when most processors had 32-bit data paths, FPGA solutions had a significant advantage … with on-processor SIMD backed by wide data paths, not so much these days. And certainly nothing on the order of the bandwidth available to high-end GPUs like the current GeForce 1080 Ti, both on board and on chip.
Take a parallel programming class at your local university (or online) and learn about classification of parallel algorithms and problems, and how to fit available architectures to the problem.
Learn how to identify the resources and performance of your execution platform(s), and how to specify better platforms for your application.
And learn how to restructure the algorithms in your application to exploit available parallelism. Make those changes while moving to the OpenMP extensions to C, and explore how to use MPI distributed algorithms on compute farms. Learn how to use the SIMD instructions in your processors … and use OpenCL and CUDA on GPU-optimized hardware.
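As a small sketch of that last point (assuming a compiler with OpenMP SIMD support, e.g. gcc -O2 -fopenmp-simd; the saxpy kernel is just a stand-in), a single pragma asks the compiler to spread the loop across the processor’s wide SIMD data paths instead of iterating one element at a time:

    #include <stddef.h>

    /* saxpy: y[i] += a * x[i].  The 'omp simd' pragma requests that the
       compiler vectorize this loop across the CPU's 128/256-bit SIMD lanes
       rather than executing it one element per iteration. */
    void saxpy(float a, const float *x, float *y, size_t n)
    {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }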
@TotallyLost – I agree with you on almost all points, particularly the “when will electrical engineering as we know it become non-existent…” as I believe that has already happened on a large scale. Many consumer applications that would have been created by EEs in the past are now implemented with no direct EE involvement. As you point out, these take the form of smartphone apps, specific software on pre-made boards such as Arduino, Raspberry Pi, etc.
Regarding my claim about hardware vs software implementation of algorithms, though, I think you misunderstand it. First, my 100x-10,000x claim is about fully custom hardware (ASIC, etc.), not FPGA implementation. FPGA implementation steals at least an order of magnitude, maybe more. And I don’t mean adding a custom accelerator to a larger application running on a conventional processor – with all of the attendant memory architecture, latency, and other consequences of making the processor/accelerator interaction happen. I’m talking about implementing your entire application in hardware versus software. But I agree that few interesting applications these days are small enough to implement that way. The evolution of bitcoin mining, for example, shows a related curve as it went from software to GPU to FPGA to ASIC implementations.
Bitcoin mining is one of the rare “applications” that doesn’t require memory … just a handful of registers to maintain state as it iterates along the solution path. Real applications have significantly more memory/data, and the interaction of the logic and the data carries built-in Amdahl’s Law serialization that is unavoidable. The exceptions are processing data streams with digital filters and decoders … for communications and radar, and of course computer vision. As a percentage of production lines of code, these data-stream-processing applications are very, very rare in the real world.
So rare that the premise of the article above isn’t likely to play out. Everyone is having fun with deep neural networks as classifiers, but they behave too unpredictably to be used as core control logic, except for simple systems that also have a better logic solution that is smaller and cheaper. Maybe that will be different in 10 years, but today it doesn’t seem likely.
In the end, I’m pretty sure the programmers will still have a job, putting everyone else’s well defined routine jobs at risk.
Or all the programmers will be lynched in the great socialist intellectual purge that follows from mass unemployment.
It took less for the advanced civilizations of thousands of years ago to be wiped from existence … where only grand buildings and public projects remain, without any other clues about their lives. We have trouble deciphering their stone tablets … I suspect those who follow us will have an even harder time deciphering the bits in our storage devices.
An extinction event? A plague? A war? or even the evolution of a more aggressive predator?
Musing over the future is rather dark … more fun to actually create and build things. Hopefully things that have a positive outcome.