April 22, 2008
Getting Back to a Simpler Life
A New Multicore Communications API is Ratified
Thanks to a late arrival, I walked briskly through the jetway and out into the terminal. Glancing out through the window, I confirmed, to my relief, what the map in the airline magazine had shown: my connection, while technically in another terminal, was in fact quite close. I could see it from here, although I was separated from it by about 40 yards and two layers of glass. But as I hurried in that direction, I realized that you could only get from one terminal to another on some tram-like affair; you couldn’t walk it without going back out through security. With a sigh of annoyance, I quickened my pace past six gates to where the tram station was. Which was when I realized it was one of those trams that circle in one direction along their route. If I was in luck, mine would be the first stop. I wasn’t in luck; the tram went the opposite way, so my stop was the last stop. Aligning all my chi to will the tram faster, I got to my stop, dashed out the door, skipped steps down the escalator, and vaulted towards the gate. Just in time. In 20 minutes I had managed to travel a net 40 yards.
There are many mechanisms for broad travel, connection, or communication that suffer when the full scale of the mechanism isn’t required. In a computing environment that uses multiple processors, inter-process communication (IPC) is required to ensure that all of the processes can communicate necessary status and data with each other. Most such protocols need, for generality’s sake, to account for the fact that the processors may reside in different chips or boxes, or even different cities. The kinds of protocols used for this include TCP/IP, TIPC, and others. There are several assumptions that general-purpose protocols must make:
- The connection between processors may fail, especially if “the cloud” is involved in carrying messages long distances.
- The interconnections between nodes may change during system operation.
- The tasks and workload may change during system operation.
- Bandwidth between processors is limited.
- Memory usage and performance are of less importance than ensuring that the above issues are handled properly.
Protocols like TCP/IP are stuffed full of requirements necessary to ensure that broken connections, topology remapping, and task reallocation can be accommodated reliably in the context of limited bandwidth.
In an embedded multicore application, however, the situation is pretty much exactly the reverse: physical connections will generally break only as a result of a catastrophic chip failure; topology and workload are typically static; bandwidth is high; and memory and performance are limited and precious resources. Trying to use TCP/IP, for example, to do multicore IPC would absolutely overwhelm the system with the overhead required to handle events that will never occur. It’s the equivalent of requiring a Washington DC Metro system to be built in order to handle the transportation needs of a small three-block-square Midwestern town. The situation cries out for a protocol better suited to the extremely tightly coupled arrangement that multicore provides.
The Multicore Association was formed to address issues like these, and the first actively-pursued project was the development of an IPC protocol suitable for embedded multicore. The philosophy was to construct a basic protocol that didn’t require fancy mechanisms but allowed more sophisticated capabilities to be built on top of it. The results of these efforts have recently been ratified as the first release of the Multicore Communications API (MCAPI), which should be publicly available for download in May. The MCAPI is intended to provide a simple low-overhead means of allowing the cores in a multicore processor to communicate with each other. An implementation will provide runtime MCAPI services, allowing an application developer simply to use the API to gain access to those services.
Using the API, you define a multicore topology in terms of nodes and endpoints. The arrangement is defined at compile time and is static, so there’s no name service or discovery in the protocol. A node is essentially an independent thread; it could be a process, core, specific thread, or any other element that essentially has its own program counter. Ultimately, communication happens between nodes. The fact that a node is this general means that applications using MCAPI can be easily ported between different systems; in one system, two nodes may execute within a single physical core, whereas on another system they may execute on different cores. The MCAPI implementation handles the differences; the application simply knows it’s sending a message to another node, regardless of where that node physically resides.
Endpoints provide greater granularity for various types of communication to and from a single node. Each endpoint is associated with a node and can be given attributes like priority and buffer size. A node can have multiple endpoints; for instance, there may be a pair of low-priority endpoints for sending and receiving, respectively, and a pair of high-priority endpoints. A specific endpoint is referred to by its node ID and endpoint ID; this is analogous to specifying an IP address and port number, except that the endpoint is resolved at compile time.
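The addressing scheme can be pictured with a small toy model. This is a sketch in Python purely for illustration; it is not the MCAPI interface, and every name in it (`Endpoint`, `TOPOLOGY`, `lookup`, and the attribute values) is hypothetical. The point is only that an address is a (node ID, endpoint ID) pair resolved from a fixed table rather than discovered at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    node_id: int      # which node owns this endpoint
    endpoint_id: int  # plays the role of a port number on that node
    priority: int     # illustrative attribute value
    buffer_size: int  # illustrative attribute value, in bytes


# Static table standing in for a topology fixed at compile time (assumed layout):
# each node has one low-priority and one high-priority endpoint.
TOPOLOGY = {
    (0, 1): Endpoint(0, 1, priority=0, buffer_size=256),
    (0, 2): Endpoint(0, 2, priority=7, buffer_size=64),
    (1, 1): Endpoint(1, 1, priority=0, buffer_size=256),
    (1, 2): Endpoint(1, 2, priority=7, buffer_size=64),
}


def lookup(node_id: int, endpoint_id: int) -> Endpoint:
    """Resolve an address by plain table lookup; no name service, no discovery."""
    return TOPOLOGY[(node_id, endpoint_id)]
```

Because the application only ever names `(node_id, endpoint_id)`, nothing in it needs to change if two nodes that shared a core on one system land on separate cores on another.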
MCAPI allows you to communicate in one of three ways: connectionless, packet, and scalar. Connectionless communication is intended for delivery of messages that are likely to be infrequent and discrete. Such messages carry higher overhead, since each one has to specify the endpoint and buffer information, and the path from start to end has to be figured out at the time the message is sent. It’s like an individual sending a FedEx package; you have to fill in the shipping information manually and figure out how to pay for it.
For more frequent communication, you can reduce overhead by setting up a channel; the packet and scalar modes use channels. You define a channel once by specifying sending and receiving endpoints. Once a channel is set up, the endpoint and buffer characteristics are known, so that information isn’t needed in the header. The route is also figured out at the time the channel is created, so no route discovery is needed when the message is sent. It’s like having a FedEx account and pre-printed shipping labels, so you don’t have to go through all that work for each shipment. Traffic can go in only one direction; for duplex traffic, two channels in opposite directions would need to be created. The two endpoints must be compatible for a channel to be created; that philosophically means that things like priority need to be the same, but in practice, for the time being, there is no strict definition of compatibility, and this test is left to the MCAPI implementation.
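The difference between a one-off message and a channel can be sketched as follows. This is a toy Python model, not the MCAPI interface; `REGISTRY`, `create_endpoint`, `msg_send`, and `Channel` are hypothetical names, and the "header" here is simply the sender's address tuple.

```python
from collections import deque


class Endpoint:
    def __init__(self, node_id, endpoint_id):
        self.addr = (node_id, endpoint_id)
        self.queue = deque()  # receive queue for this endpoint


REGISTRY = {}  # address -> Endpoint (stand-in for the static topology)


def create_endpoint(node_id, endpoint_id):
    ep = Endpoint(node_id, endpoint_id)
    REGISTRY[ep.addr] = ep
    return ep


def msg_send(src, dest_addr, payload):
    """Connectionless: the destination is resolved on every single send."""
    dest = REGISTRY[dest_addr]              # per-message lookup
    dest.queue.append((src.addr, payload))  # a header travels with the payload


class Channel:
    """One-way channel: both endpoints are resolved once, at creation."""

    def __init__(self, send_ep, recv_ep):
        self.send_ep = send_ep
        self.recv_ep = recv_ep

    def send(self, payload):
        self.recv_ep.queue.append(payload)  # no lookup and no header per send
```

Duplex traffic in this model would take two `Channel` objects, one in each direction, mirroring the one-way rule described above.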
The packet mode allows delivery of arbitrarily-sized messages in a first-in first-out manner for the receiver. The scalar mode allows a single data chunk of fixed size to be delivered, eliminating the need for a header. While it may be convenient to set up and tear down the channels at initialization and termination, respectively, there’s no such requirement; they can be created and destroyed at any time.
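One way to see why fixed-size scalars can dispense with a header is to compare how the two kinds of traffic might be laid out in a byte stream. The sketch below is an assumption-laden illustration in Python: the 4-byte length prefix and the 32-bit scalar width are choices made for the example, not anything the MCAPI spec is known to prescribe, and the function names are hypothetical.

```python
import struct


def pack_packets(packets):
    """Packet mode: sizes vary, so each packet carries a length header."""
    out = bytearray()
    for p in packets:
        out += struct.pack("<I", len(p)) + p
    return bytes(out)


def unpack_packets(stream):
    """Recover the packets in first-in first-out order."""
    packets, pos = [], 0
    while pos < len(stream):
        (n,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        packets.append(stream[pos:pos + n])
        pos += n
    return packets


def pack_scalars(values):
    """Scalar mode: every item is a fixed 32-bit value, so no header is needed."""
    return struct.pack(f"<{len(values)}I", *values)


def unpack_scalars(stream):
    return list(struct.unpack(f"<{len(stream) // 4}I", stream))
```

In this model two packets of 2 and 3 bytes cost 13 bytes on the wire, 8 of them header, while three scalars cost exactly 12 bytes of payload.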
For the time being, given the focus on keeping things simple, all message delivery modes are unicast: one source to one destination. Additional code could be written above an MCAPI implementation to provide multicast (one source to several specific destinations) or broadcast (one source to all possible destinations). This keeps the overhead for such behavior out of the basic protocol itself; it is incurred only in systems that truly want it, and only when it is actually needed. Explicit API support for multicast messages is being considered for a future revision.
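Layering multicast or broadcast over a unicast primitive is little more than a loop. A minimal Python sketch follows, with plain deques standing in for receive queues; `unicast_send`, `multicast_send`, and `broadcast_send` are hypothetical names, not MCAPI calls.

```python
from collections import deque


def unicast_send(dest_queue, payload):
    """The only primitive assumed to exist: one source, one destination."""
    dest_queue.append(payload)


def multicast_send(dest_queues, payload):
    """Multicast built above the primitive: one unicast per chosen destination."""
    for q in dest_queues:
        unicast_send(q, payload)


def broadcast_send(all_queues, payload):
    """Broadcast is multicast to every known destination."""
    multicast_send(all_queues.values(), payload)
```

The cost of the loop is paid only by callers that use it; a plain `unicast_send` is unaffected.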
Of course you’ve got to have space to store the messages for sending and receiving. The API allows you to manage the buffers and queues needed for holding send and receive data. Message buffers are allocated by the sending application; channel buffers are allocated by the MCAPI services. MCAPI copies message or packet contents from the sending buffer to the destination buffer. Any headers required in communication will be marshaled by MCAPI, but the payload of the message is marshaled by the application. Zero-copy operation, where only a pointer to the message or packet is sent (presumably in shared memory), isn’t explicitly provided by the API, but it may be considered in the future. In the meantime, it can be implemented in a layer above MCAPI.
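The practical difference between copying and zero-copy shows up when the sender reuses its buffer. The Python sketch below is only an analogy, with hypothetical function names: `bytes(...)` stands in for the copy into a destination buffer, and a `memoryview` stands in for passing a pointer into shared memory.

```python
def copy_send(queue, send_buffer):
    """Copying send: the receiver gets its own snapshot of the data."""
    queue.append(bytes(send_buffer))


def zero_copy_send(queue, send_buffer):
    """Zero-copy send: only a reference to the sender's buffer is handed over."""
    queue.append(memoryview(send_buffer))


buf = bytearray(b"abc")
copied, shared = [], []
copy_send(copied, buf)
zero_copy_send(shared, buf)

buf[0] = ord("X")  # the sender reuses its buffer after sending

snapshot = copied[0]        # still the original contents
aliased = bytes(shared[0])  # reflects the sender's later write
```

With copying, the sender is free to overwrite its buffer as soon as the send returns; with zero-copy, sender and receiver need some agreement about who owns the buffer and when, which is part of why it is left to a higher layer.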
You can send and receive messages on a blocking or non-blocking basis. A blocking operation means that, for example, if a 64-byte buffer is being sent, when the send instruction is executed in the code, the following instruction will not be executed until all 64 bytes have been sent. The sending call is unblocked when the send is complete, not when the corresponding receive operation is complete. With a non-blocking call, as soon as the send operation begins, code execution proceeds to the next instruction. API calls are provided to test or wait for completion of non-blocking operations, allowing timeouts if something gets hung up.
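The blocking and non-blocking styles can be mimicked with a request handle that is tested or waited on. This is a toy Python sketch using a background thread to simulate the transfer; `Request`, `send_nonblocking`, and `send_blocking` are hypothetical names and do not correspond to actual MCAPI calls.

```python
import threading
import time


class Request:
    """Handle returned by a non-blocking send."""

    def __init__(self):
        self._done = threading.Event()

    def test(self):
        """Poll: has the send completed yet?"""
        return self._done.is_set()

    def wait(self, timeout=None):
        """Block until completion; returns False if the timeout expires first."""
        return self._done.wait(timeout)


def _transfer(dest, payload, request):
    time.sleep(0.05)       # simulated transfer time
    dest.append(payload)   # data now sits in the destination buffer
    request._done.set()    # send is complete, whether or not anyone has read it


def send_nonblocking(dest, payload):
    """Start the send and return immediately with a handle."""
    request = Request()
    threading.Thread(
        target=_transfer, args=(dest, payload, request), daemon=True
    ).start()
    return request


def send_blocking(dest, payload, timeout=None):
    """A blocking send is a non-blocking send followed at once by a wait."""
    return send_nonblocking(dest, payload).wait(timeout)
```

Note that completion here is signaled when the data has been placed in the destination buffer, not when a receiver has consumed it, matching the semantics described above.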
MCAPI doesn’t provide any explicit message delivery acknowledgment. In fact, a successful send operation means only that the data was successfully sent, not that it was successfully received. This is because the returned success codes are seen only by the calling process. So if process A sends data, then as soon as the data is sent, process A sees that it was successful. However, process B receives the data, and may or may not receive it successfully. Process B sees the success code from its own receive call; process A does not. So just because a send was successful from process A does not mean that the data was correctly received. In fact, the semantics are such that a receipt failure due to a missing node or broken connection is not considered a sending failure, but rather a receiving failure, and only the receiving node – if it exists – will know about it. If a receiving node is disconnected or destroyed before all of the data in its queue is consumed, packets or data may be lost, and there will be no notification of that loss to the sender. So in a sense, messages are being tossed over the wall. If that’s a problem, it is possible to layer an acknowledgment mechanism above MCAPI to communicate complete end-to-end success. In the most conservative case, a sending process could issue a send instruction (blocking or non-blocking) followed by a blocking receive instruction intended to receive an acknowledgment.
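The conservative pattern, a send followed by a blocking receive for an acknowledgment, might look like the following. This is a minimal Python sketch in which standard-library queues stand in for two one-way channels; the sequence-number scheme and the names `send_with_ack` and `receiver` are assumptions made for the example.

```python
import queue
import threading


def send_with_ack(data_q, ack_q, seq, payload, timeout=1.0):
    """Send, then block waiting for the receiver to echo the sequence number."""
    data_q.put((seq, payload))  # success here means only "sent"
    try:
        return ack_q.get(timeout=timeout) == seq  # blocking receive for the ack
    except queue.Empty:
        return False  # no acknowledgment arrived: delivery is unknown


def receiver(data_q, ack_q, received):
    """Receiving side: consume one item, then acknowledge it."""
    seq, payload = data_q.get()
    received.append(payload)
    ack_q.put(seq)


# Usage: one receiver running on its own thread.
data_q, ack_q, received = queue.Queue(), queue.Queue(), []
worker = threading.Thread(target=receiver, args=(data_q, ack_q, received))
worker.start()
delivered = send_with_ack(data_q, ack_q, seq=1, payload=b"hi")
worker.join()
```

If no receiver is listening, `send_with_ack` times out and returns `False`, which is exactly the information a bare send cannot give the sender.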
Many protocols provide a flow control mechanism to ensure that the receiving end of a channel can keep up with the data being sent. If the receiver starts to get overwhelmed, then “backpressure” is asserted on the sender to slow down or suspend the sending of data until the receiver can catch up. This can get very complicated, having to take into account such things as buffer space, the amount of data in transit, and the time required for a “Slow Down!” request to get back to the sender. MCAPI has no such explicit capability. The sending endpoint can query the amount of buffer space available in the receiving endpoint before sending data, but anything more sophisticated must be layered on top of the basic MCAPI implementation.
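The simplest layered form of flow control is to check the receiver's free buffer space before each send. Here is a toy Python sketch; `ReceiveEndpoint`, `space_available`, and `send_if_room` are hypothetical names, and counting capacity in payload bytes is an assumption made for the example.

```python
from collections import deque


class ReceiveEndpoint:
    def __init__(self, capacity):
        self.capacity = capacity  # total buffer space, in bytes
        self.queue = deque()

    def space_available(self):
        """What a sender would query before sending."""
        return self.capacity - sum(len(m) for m in self.queue)

    def deliver(self, payload):
        self.queue.append(payload)

    def consume(self):
        """The receiver drains a message, freeing its space."""
        return self.queue.popleft()


def send_if_room(endpoint, payload):
    """Send only if the payload fits; otherwise tell the caller to retry later."""
    if len(payload) > endpoint.space_available():
        return False
    endpoint.deliver(payload)
    return True
```

Anything smarter, such as accounting for data still in transit or slowing down gradually, would be built in the same way: above the basic send, in the application's own layer.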
Further work on the MCAPI spec is anticipated. In addition to zero-copy transport and multicast, debug, statistics, and status calls are also on the docket for consideration. Within the Multicore Association, separate work is also expected on a Resource Management API, which will allow the management of such resources as shared and private memory, allocating and destroying regions, managing semaphores, and registering and locking generic resources. A separate effort is anticipated on task management (creating, destroying, suspending, resuming, allocating, and setting attributes) and debug capabilities.
So now that a spec is available, implementation can begin. The hope is that commercially available MCAPI packages will allow programmers simply to write code using the API, and include MCAPI libraries in the build. The execution of the API calls should be quick and resource-thrifty. Essentially, we can now create more walkable spaces without requiring large useless infrastructural investments. I’ll be able to mosey over to my nearby connecting flight with less stress and effort. The process has started, and we’ll take a look at some early results in a later article. At the same time, MCAPI provides a basis for further abstraction so that, while implementation can be efficient, applications programmers can eventually program without even having to be aware of whether the underlying implementation is tightly or loosely coupled. More thoughts on that later as well.