
IPUs – A New Breed of Processor

Machine Learning Platforms for AI

Last year, Jim Turley wondered why we have ranges of different processors. Now I want to bring another TLA to the processor table – the IPU (Intelligent Processing Unit). This is the brainchild of Graphcore, a company that came out of stealth mode last November with the announcement of $30m Series A funding from investors including Bosch, Samsung, Amadeus Capital, C4 Ventures, Draper Esprit, Foundation Capital, and Pitango Capital.

Graphcore is based in Bristol, in the West of England, and, if its management team does not include “all the usual suspects” in the Bristol silicon and parallel-processing hot spot, it certainly includes many of them. CEO Nigel Toon was previously CEO at XMOS (where he is still Chairman) and Picochip, and, before that, he was a founder of Icera, a 3G cellular modem company. These last two were sold to larger companies. The CTO is Simon Knowles, another Icera founder and before that at Element 14 – a fabless semiconductor company created by people from Acorn Computer and the ex-Inmos team at STMicroelectronics. Others in the engineering team share similar backgrounds.

Graphcore is targeting the area of machine learning, which, the company argues, is not well served by the existing forms of processors but is going to be an essential tool for future developments in data analysis and artificial intelligence (AI) applications such as self-driving cars. (As an aside, another of the Bristol usual suspects, Stan Boland of Element 14, Icera and Neul, has recently announced FiveAI, a company working on artificial intelligence and computer vision for autonomous vehicles.) Toon says, “There is not an application that will not be improved by machine learning.”

Before we get into detail, let’s look at machine learning. A very simple example is something you see every day – predictive texting in a smart-phone. Your phone arrives with some predictive ability, based on how most people construct words and combine words into sentences. As you use it, it begins to recognise words and combinations of words that you use frequently, speeding up your texts, tweets and emails. A similar approach, only much more complex, is behind the voice recognition software driving Siri and Alexa.
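At its simplest, that “learning from what you type” can be sketched as a toy bigram model – count which word most often follows each word, and suggest the leader. This is a deliberately minimal illustration of the idea, not how any real keyboard engine works:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict(follows, word):
    """Suggest the continuation seen most often so far, if any."""
    if word in follows:
        return follows[word].most_common(1)[0][0]
    return None

# The model "learns" from whatever the user has typed so far.
model = train_bigrams("see you soon see you later see you soon")
print(predict(model, "you"))  # -> soon ("soon" followed "you" twice, "later" once)
```

The more you type, the better the counts reflect your habits – which is the essence of the adaptation the article describes, scaled down to a dozen lines.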

More advanced machine learning is needed, for example, in remote monitoring of older people. As the population ages, older people increasingly want to remain as independent as possible. This is also, to put it bluntly, cost efficient for society as a whole. Where we once had their children, or even paid servants, living with them to keep watch, we are now, at least in advanced societies, moving to sensors and wearable devices that provide remote monitoring. Suppose the monitoring system shows an increase in pulse rate and body temperature. This could be a sign of distress, but, if we know that the person has just cycled back from the local shop, then, as long as temperature and pulse return to normal in a reasonable time, there is no need to worry.
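A crude, rule-based sketch of that judgement might look like the following. The thresholds and the 30-minute recovery window are illustrative assumptions, not clinical values:

```python
def alert_needed(pulse_bpm, temp_c, minutes_since_exercise=None):
    """Decide whether elevated vital signs warrant an alert.

    Thresholds and the recovery window are illustrative, not clinical values.
    """
    elevated = pulse_bpm > 100 or temp_c > 37.5
    if not elevated:
        return False
    # Elevated readings shortly after exercise are expected; allow recovery time.
    if minutes_since_exercise is not None and minutes_since_exercise < 30:
        return False
    return True

print(alert_needed(115, 37.8))                            # True: no explanation for the readings
print(alert_needed(115, 37.8, minutes_since_exercise=5))  # False: just cycled back from the shop
```

A real system would learn the thresholds and the correlations between context and vital signs from data rather than hard-coding them – which is exactly where the probabilistic models discussed next come in.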

Graphcore argues that intelligence is a capacity for judgement, informed by knowledge and adapted with experience. The judgement is an approximate computation, delivering probabilistic answers where exact answers are not possible.

Knowledge can be expressed as a data model – a summary of all the data previously experienced – and can be expressed as a probability distribution.

In human learning terms – you construct a model of what happened in the past and use that knowledge to predict what is likely to happen next. Of course, as humans, we don’t abstract it like that, but, essentially, that is how it works. For machines, we need to create an abstraction.

The data model of knowledge can be constructed as graphs, with each vertex a measure of the probability of a particular feature and the edges representing correlation or causation between features. Typically, each vertex links to only a few others, so the graph is described as sparse.
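In code, such a sparse graph reduces to little more than an adjacency list: per-vertex probabilities plus a handful of weighted edges per vertex. The features and numbers below are invented purely for illustration:

```python
# Vertices: probability of each feature being present (illustrative values).
vertex_prob = {
    "recent_exercise": 0.20,
    "elevated_pulse": 0.10,
    "fever": 0.02,
}

# Edges: sparse adjacency list of correlation/causation weights.
# Each vertex links to only a few others, so storage is O(edges), not O(n^2).
edges = {
    "recent_exercise": {"elevated_pulse": 0.8},
    "fever": {"elevated_pulse": 0.6},
}

def neighbours(vertex):
    """Vertices correlated with the given one, with edge weights."""
    return edges.get(vertex, {})

print(neighbours("recent_exercise"))  # {'elevated_pulse': 0.8}
```

Sparsity is what makes the massive parallelism described below practical: each vertex update touches only its few neighbours, so many vertices can be processed independently at once.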

Massively parallel processing is commonly used in graph applications because it allows work on many edges and vertices at the same time, so it is a clear choice here. What is unusual is that the individual probabilities are held at very low resolution – only when they are aggregated does a higher-resolution output emerge. Calculations are carried out on small words – often half-precision floating-point – so we are looking at low-precision data in a high-performance computing environment – very unlike traditional high-performance computing.
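Python's struct module can round-trip values through IEEE-754 half precision, which gives a feel for how coarse the individual numbers are and how aggregation in a wider format restores resolution. This is a sketch of the numerical principle only, not a description of how the IPU itself computes:

```python
import struct

def to_half(x):
    """Round-trip a value through IEEE-754 half precision (16 bits)."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Each stored value is coarse: half precision carries ~3 decimal digits.
stored = to_half(0.001)
print(stored)  # roughly 0.001, but not exactly

# Aggregating a thousand coarse values in wider precision yields a
# much higher-resolution result than any single term carries.
total = sum(to_half(0.001) for _ in range(1000))
print(round(total, 3))  # close to 1.0
```

Halving the word size doubles the number of values that fit in a given amount of memory and bandwidth, which is why low precision is so attractive for this workload.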

As machine learning is still an early-stage technology, the detailed models and the algorithms for processing them are still evolving. Today, people are turning from CPUs to GPUs when developing new approaches to machine learning, but these are still expensive, and, compared to the speeds needed for machine learning to take place in real time, are at least two orders of magnitude too slow. Microsoft is using FPGAs in its exploration of machine intelligence, with the argument that they need the flexibility to change as they gain greater understanding of the issues. (They are reported to be a major user of Altera FPGAs, and this was one of the drivers behind Intel’s acquisition of Altera last year.)

Google has taken the route of developing a custom ASIC, the Tensor Processing Unit (TPU – yet another TLA) for its machine-learning applications using the TensorFlow software library. The TPU is an accelerator that is used in machine-learning applications alongside CPUs and GPUs. (And it was used in the system that beat the Go master Lee Se-dol.)

Graphcore calls itself a chip company, based around its IPU. But it is offering more than that – it is offering the Poplar development framework that exploits the IPUs. Within Poplar are tools, drivers and application libraries. It has C++ and Python interfaces, and there will be seamless interfaces to MXNet – an open-source deep-learning framework, which has been adopted by Amazon – and TensorFlow, the Google software that is also available as open source.

The IPU itself has been optimised for massively parallel, low-precision floating-point compute, and so it provides much higher compute density than other solutions.

Like a human brain, the IPU holds the complete machine-learning model inside the processor and has over 100x more memory bandwidth than other solutions. This results in both lower power consumption and much higher performance. 

During 2017, Graphcore will be releasing the IPU-Appliance, which the company is aiming at data centres, both corporate and in the cloud. It aims to provide an increase in the performance of machine-learning activities by between 10x and 100x compared to today’s fastest systems. The roadmap, discussed only in very broad terms, looks at downward scaling through the IPU-Accelerator, a PCIe card to improve server-based learning applications, and eventually to moving into edge devices to carry out learning at the edge of the IoT.

While Graphcore came out of stealth only last year, CTO Simon Knowles has been working on approaches to machine learning for over five years, and the team began to be assembled over two years ago. In that time, they have talked with many of the AI players, and there are strong hints that serious engagements will follow as soon as the IPU-Appliance is ready for shipping.

Artificial intelligence can be compared to a gold rush, like that in California in the late 1840s. There is considerable investment and much hard work to be done. Some players will be successful and get a great return; others will be left by the wayside. However, many of those who did well out of the gold rush were not the miners, but those who supplied the tools to carry out the prospecting – picks and shovels, in the main. Graphcore’s mission is to supply the picks and shovels for the artificial intelligence gold rush.
