When I was a student at university, in those dim and distant days we used to call the 1970s, my degree in Control Engineering involved two periods out in industry. I think they call this a co-op course in the USA. We used to refer to this sort of thing as a “sandwich course” in the UK. Some programs were known as “thin sandwiches” because they featured multiple cycles of six weeks at college followed by six weeks in the real world. My course was of the “thick sandwich” variety, in which we spent a year at university, six months in industry, a year back at university, six months in a different industry, and a final year at university.
My first placement in industry was at a Rolls-Royce facility in the town of Filton, which is about five miles north of the city of Bristol. Sir Humphry Davy moved to Bristol to study science in 1797, and he later became renowned for creating laughing gas (a.k.a. nitrous oxide). In the following century, one of the greatest figures of the Industrial Revolution, the Victorian engineer Isambard Kingdom Brunel, became associated with the city, designing the Great Western Railway between Bristol and London Paddington, two pioneering Bristol-built ocean-going steamships (SS Great Britain and SS Great Western), and Bristol’s famous Clifton Suspension Bridge. But we digress…
While at Rolls-Royce, the group of students of which I was part was fortunate enough to take an apprenticeship course that usually had a duration of three years, but that was compressed into six months just for us. This is where I was trained in the use of mills, drills, lathes, and grinders, where I learned how to wield welders (electric arc, argon arc, and oxyacetylene), and where I fought epic battles with pneumatic and hydraulic systems.
My second period in industry was at the research and development facility associated with a major glass company. In this case, I was their only student intern. The first thing I was asked when I arrived was if I knew anything about computers. I remembered a quote by some famous Hollywood actor (Errol Flynn?) that if someone asked if you could ride a horse, you said “Yes” and then quickly learned how to do so before they asked to see you in action. Thus, I did my best to convey that I knew a thing or two about computers without actually telling an untruth that would cause my all-seeing, all-knowing mother to purse her lips and shake her head in disapproval.
You have to remember that this was still the early days of microprocessors and microcomputers. It turned out that someone had ordered a microcomputer based on the 16-bit TMS9900 microprocessor from Texas Instruments, but that person had then accepted a position with a rival company prior to the computer’s arrival. As a result, they now had an embarrassingly expensive piece of equipment on their hands, but — since most of the control systems in glass factories at that time employed analog techniques or relay-based sequencing logic — no one had the faintest idea how to use it or what to do with it.
Die shot of Texas Instruments TMS9900 microprocessor (Image source: Pauli Rautakorpi/Wikipedia)
Thus, my task was to (a) learn how to use the beast and (b) think of something to do with it that would justify them buying the little rascal in the first place. As a result, I had an awesome time bouncing laser beams off liquid glass and detecting them with diode array cameras and suchlike.
All of my programs were created in assembly language. I was already familiar with the concepts of binary and two’s complement numbers, which is the most common method of representing signed integers in computers. However, it was while working with the TMS9900’s instruction set that I really began to understand the power of these little scamps.
If you are a software developer, you may consider the remainder of this column to be “old hat.” Having said this, even some of my cunning coding chums have said, “I never knew that!” with respect to some of the following pertinent points.
What do you think when you see a number like 42? You might observe that it’s the natural number that follows 41 and precedes 43 (thereby revealing yourself to be a tad pedantic). You might also be tempted to note that it’s the answer to the ultimate question of “Life, the Universe, and Everything” (thereby revealing yourself to be a fan of The Hitchhiker’s Guide to the Galaxy). Did you bother to make mention of the fact that it’s a positive value? Probably not, because by convention and by default (or vice versa) we automatically assume that a number without an associated sign is positive.
Bearing this in mind, and just to ensure we are all tap-dancing to the same drum beat, there are two main ways of representing integers that are commonly used inside a computer — unsigned and signed. An unsigned integer can represent only positive values; for example, an 8-bit unsigned integer can represent values in the range 0 to 255. By comparison, an 8-bit signed integer can represent both negative and positive values in the range -128 to 127.
The really clever thing is that, due to the way in which two’s complement numbers work (I’m not going to get into the nitty-gritty details here), operations like addition and subtraction are performed inside the computer in exactly the same way, irrespective of whether we consider the values to be unsigned or signed. If I had you sitting in front of a whiteboard, I could dazzle you with the beauty of the two’s complement format, but we have other fish to fry.
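If you’d like to convince yourself of this, the little sketch below (written in the C language we’ll be meeting shortly, with bit patterns of my own choosing) adds the same two 8-bit patterns twice, once treated as unsigned values and once as signed values, and shows that both additions produce the identical result bits:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t ua = 0xC8, ub = 0x05;      // Treated as unsigned: 200 and 5
    int8_t  sa = (int8_t)0xC8, sb = 5; // Same bits treated as signed: -56 and 5

    uint8_t usum = (uint8_t)(ua + ub); // Unsigned addition
    int8_t  ssum = (int8_t)(sa + sb);  // Two's complement (signed) addition

    // Both additions yield the identical bit pattern 0xCD
    // (205 when read as unsigned, -51 when read as signed)
    printf("unsigned: 0x%02X = %u\n", (unsigned)usum, (unsigned)usum);
    printf("signed:   0x%02X = %d\n", (unsigned)(uint8_t)ssum, (int)ssum);
    return 0;
}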
It was while I was working with the TMS9900’s instruction set that I became aware of some additional ways in which these values can be manipulated. For example, basic shift left instructions at the assembly code level always shift a 0 into the least-significant bit (LSB), while the most-significant bit (MSB) conceptually “falls off the end.” Contrariwise, a logical shift right shifts a 0 into the MSB while the LSB “falls off the end.” The interesting one is the arithmetic shift right, which — whilst performing the shift — takes a copy of the original MSB (the “sign” bit) and replicates this value in the new MSB.
If you consider a value to be an unsigned integer and you wish to shift it to the right, then you will perform a logical shift right (using the TMS9900’s SRL assembly language instruction, for example). By comparison, if you decide to treat the same value as being a signed integer, then you will perform an arithmetic shift right (using the TMS9900’s SRA assembly language instruction, for example).
Once again, I don’t want to get into the nitty-gritty details here. Suffice it to say that the result of using an arithmetic shift right of one bit on a signed value is to divide the value by 2, irrespective of whether it’s a positive or negative value. For example, the signed integer 00111000 in binary (56 in decimal) shifted right one bit is 00011100 in binary (28 in decimal). By comparison, the signed integer 11001000 in binary (-56 in decimal) shifted right one bit is 11100100 in binary (-28 in decimal).
Now, keep in mind that all of the above is the way we see things at the assembly language level. It was much later that I became exposed to the higher-level C programming language, which was developed in the early 1970s by Dennis Ritchie at Bell Laboratories of AT&T (American Telephone & Telegraph).
The C programming language has two bit-shift operators: << shifts a value to the left by the specified number of bits, while >> shifts a value to the right by the specified number of bits.
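By way of a quick sketch of my own (using an unsigned value, for reasons that will become apparent in a moment):

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x0F;   // Binary ...00001111
    unsigned int y = x << 2; // 0x3C, binary ...00111100
    unsigned int z = x >> 1; // 0x07, binary ...00000111

    printf("x = 0x%02X, y = 0x%02X, z = 0x%02X\n", x, y, z);
    return 0;
}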
Also, when you declare an integer value in C, you can specify it as being either signed or unsigned. Personally, I would have expected that if I were to declare an integer value as being unsigned, then the >> operator would implement a logical shift right (i.e., shift 0s into the new MSBs). By comparison, if I were to declare an integer value as being signed, then I would expect the >> operator to perform an arithmetic shift right (i.e., shift copies of the original MSB into the new MSBs). You can only imagine my shock and horror when I discovered that whether the >> operator applied to a signed value performs a logical or an arithmetic shift is undefined.
Actually, it’s more nearly correct to say that the result is “implementation (i.e., compiler) dependent,” which means the result may change when you take your original C program from one machine and recompile it using a different compiler for use on a different machine. Call me “old fashioned” if you will, but I find this to be a tad worrisome.
Of course, professional software developers have a cunning trick they employ to resolve this situation, which is that they never use the >> operator on a signed integer. Although pragmatic, this leaves me somewhat unsated. Would it have killed the keepers of the C specification to simply and unequivocally state that: “Using the >> operator on a signed integer will perform an arithmetic shift”?
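For what it’s worth, here’s a sketch of the sort of thing I mean, along with a hand-rolled arithmetic shift for anyone who wants to see the sign-replication in action (the function names are my own invention):

#include <stdio.h>
#include <stdint.h>

// Guaranteed logical shift right: casting to unsigned first ensures
// that a 0 is shifted into the MSB irrespective of the implementation.
static uint8_t lsr8(int8_t v)
{
    return (uint8_t)((uint8_t)v >> 1);
}

// Hand-rolled arithmetic shift right: take a copy of the original MSB
// (the sign bit) and replicate it in the new MSB.
static int8_t asr8(int8_t v)
{
    uint8_t bits = (uint8_t)v;
    uint8_t msb  = bits & 0x80;         // Remember the sign bit
    return (int8_t)((bits >> 1) | msb); // Shift, then restore the MSB
}

int main(void)
{
    int8_t neg = (int8_t)0xC8; // 11001000 = -56

    printf("LSR: 0x%02X\n", (unsigned)lsr8(neg));                               // 0x64 (100)
    printf("ASR: 0x%02X (%d)\n", (unsigned)(uint8_t)asr8(neg), (int)asr8(neg)); // 0xE4 (-28)
    return 0;
}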
There’s so much more to all of this that I could waffle on for hours. For example, one of C’s basic type specifiers is the int (short for integer); for example:
int Fred; // At least two bytes (16 bits)
However, there’s no formal definition of the size of an int. This will depend on the compiler you are using, which will — in turn — be fine-tuned to make best use of the targeted processor’s data path width and internal architecture. All the C specification says is that an int will be at least two bytes in size. Similarly, there’s a short int whose minimum size is also two bytes, and a long int whose minimum size is four bytes (some compilers also support a long long int, but I fear that way lies madness); for example:
short int Bert; // At least two bytes (16 bits)
long int Tom; // At least four bytes (32 bits)
In the case of the short int and long int declarations, you can omit the int part if you wish (the compiler will know what you mean); for example:
short Bert; // At least two bytes (16 bits)
long Tom; // At least four bytes (32 bits)
By default, int, short, and long are considered to be signed values (like 42 is considered to be positive), but you can explicitly state this if you wish (like adding a ‘+’ sign to a number and writing +42); for example:
signed int Fred; // At least two bytes (16 bits)
signed short int Bert; // At least two bytes (16 bits)
signed long int Tom; // At least four bytes (32 bits)
Once again, you can omit the int part from the short and long declarations if you so desire; for example:
signed int Fred; // At least two bytes (16 bits)
signed short Bert; // At least two bytes (16 bits)
signed long Tom; // At least four bytes (32 bits)
Similarly, in all of the previous examples, you can replace signed with unsigned in order to declare unsigned values:
unsigned int Fred; // At least two bytes (16 bits)
unsigned short Bert; // At least two bytes (16 bits)
unsigned long Tom; // At least four bytes (32 bits)
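If you’re curious about what your particular compiler has decided, you can ask it using the sizeof operator; the following quick sketch will report different numbers on different systems:

#include <stdio.h>

int main(void)
{
    // sizeof reports sizes in bytes; the values printed here depend
    // on the compiler and the target processor's architecture.
    printf("short: %zu bytes\n", sizeof(short));
    printf("int:   %zu bytes\n", sizeof(int));
    printf("long:  %zu bytes\n", sizeof(long));
    return 0;
}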
There is also the char type, which is a one-byte (almost invariably 8-bit) quantity that is intended (and predominantly used) to represent ASCII characters; for example:
char Gina;
char Jane;
The thing is that it’s not uncommon to perform mathematical operations on variables of type char. If so, will the compiler treat these variables as being signed or unsigned? Would you believe it? This is another case where the folks in charge of the C specification left things “implementation dependent,” which means the average user doesn’t have a clue. Happily, you can also prepend the char type with a signed or unsigned qualifier; for example:
signed char Gina;
unsigned char Jane;
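The classic way this bites people is when a char ends up holding a value whose MSB is set. Here’s a little sketch of my own; the first line of output depends on your compiler, while the other two are predictable:

#include <stdio.h>

int main(void)
{
    char          plain = (char)0xC8;          // Signed or unsigned? It depends!
    signed char   s     = (signed char)0xC8;   // Always treated as -56
    unsigned char u     = (unsigned char)0xC8; // Always treated as 200

    printf("plain char:    %d\n", (int)plain); // -56 or 200, compiler's choice
    printf("signed char:   %d\n", (int)s);     // -56
    printf("unsigned char: %d\n", (int)u);     // 200
    return 0;
}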
Learning that the sizes of the int, short, and long data types can vary from computer to computer may be disconcerting to some. If you know you absolutely want an 8-bit signed integer called James and an 8-bit unsigned integer called Harry, then you can use the fixed-width integer types that have been available since the C99 release (by way of the <stdint.h> header); for example:
int8_t James; // Signed and definitely 8 bits (one byte)
uint8_t Harry; // Unsigned and unequivocally 8 bits (one byte)
There are also 16-bit (int16_t and uint16_t), 32-bit (int32_t and uint32_t), and 64-bit (int64_t and uint64_t) equivalents.
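Since these fixed-width types are declared in the <stdint.h> header, a compilable version of the above might look something like the following sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t   James = -56;        // Signed and definitely 8 bits
    uint8_t  Harry = 200;        // Unsigned and unequivocally 8 bits
    int16_t  wider = -1234;      // One of the 16-bit equivalents
    uint32_t color = 0x00FF7F00; // For example, a 32-bit color value

    printf("%d %u %d 0x%08lX\n",
           (int)James, (unsigned)Harry, (int)wider, (unsigned long)color);
    return 0;
}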
In the past, I’ve used a lot of 8-bit microcontrollers like the Arduino Uno for my hobby projects. More recently, I’ve started using 32-bit microcontrollers, like the Teensy 3.2 and the Seeeduino XIAO (see also Say Hello to the Seeeduino XIAO).
I come from the days when every byte of memory and every clock cycle were considered to be precious commodities. Some embedded systems designers still work with extremely limited resources. So, here’s a question for you. Should you spend a lot of time working out the maximum values you expect to store in your variables and then use the smallest fixed-width integer types that will satisfy the requirements of each variable? Surprisingly, the answer is “No” (or “Maybe” depending on whom you ask).
The problem is that, unless you fully understand the way in which your compiler works, and unless you are fully conversant with the implications of the underlying processor architecture, then — if you aren’t careful — using a smaller variable with the desire to save memory may negatively impact the program’s execution speed (don’t ask).
In some cases, it may be mandatory to use a fixed-width type, such as a uint32_t for NeoPixel color values. Generally speaking, however, it’s typically best to use int (or short or long) and let the compiler perform its optimization magic based on its knowledge of the target processor’s architecture.
Having said this… for those embedded systems designers riding the cutting edge of the performance and memory utilization wave, C99-compliant compilers also support minimum-width and fastest minimum-width types; for example:
int8_t Fred; // Fixed width signed 8-bit integer
int_least8_t Bert; // Minimum width signed 8-bit integer
int_fast8_t Tom; // Fastest minimum width signed 8-bit integer
In this case, int8_t instructs the compiler to give you a signed integer of exactly 8 bits, irrespective of the underlying processor architecture; int_least8_t instructs the compiler to give you the smallest type of signed integer that has at least 8 bits, thereby optimizing for memory consumption on the target architecture; and int_fast8_t instructs the compiler to give you a signed integer of at least 8 bits, but to use a larger type if it will make your program faster (due to alignment considerations, for example), thereby optimizing for speed. (Yes, of course there are 16-, 32-, and 64-bit equivalents.)
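If you’d like to see what choices your compiler has made, you can print the sizes with a quick sketch like the one below. The fixed-width size is guaranteed; the least and fast sizes are the implementation’s call, so your results may differ from mine (on many 32- and 64-bit systems, for example, int_fast16_t turns out to be wider than two bytes):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("int8_t:       %zu byte(s)\n", sizeof(int8_t));       // Always 1
    printf("int_least8_t: %zu byte(s)\n", sizeof(int_least8_t)); // At least 1
    printf("int_fast8_t:  %zu byte(s)\n", sizeof(int_fast8_t));  // Compiler's choice
    printf("int_fast16_t: %zu byte(s)\n", sizeof(int_fast16_t)); // Often > 2
    return 0;
}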
Phew! I don’t know about you, but my head hurts. As one final note, as I’ve mentioned before, I’m a hardware design engineer by trade, so when it comes to software, you shouldn’t believe a word I say (sorry). Thus, if you are one who dons the undergarments of a software guru and strides the coding corridors of power, it would be “Awesome Possum” (as the Americans amusingly say) if you were moved to expound, explicate, and elucidate on any of the above in the comments below.
Implementation dependency and my own laziness have been the ingredients of some miserable times, often solved only with a debugger or a printout from the code, when I have learned that what I thought was in the variable was very different from what was actually there. The pattern to this misery is one of first blaming the hardware, then the language, and only eventually, when all the innocents have been acquitted, myself. I have found that with interrupts added to the above sins, the opportunities to get really weird stuff are compounded many-fold. The only blessing is that when things are solved I feel I have done something useful, until I remind myself that any fool can make work for themselves.
I know what you mean. The joy on my face when I finally track a problem down is soon mirrored by the scowl that comes when I realize the problem was of my own making LOL
FYI I just started reading a secondhand book from 1988 that was recommended to me — “C Traps and Pitfalls” by Andrew Koenig — all I can say is that every page is a gem!
Multiplying integers is worse. The product of two integer factors has as many bits as the sum of the number of bits in the two factors. C assumes that the product is the same size as the factors, so it returns just the lower half of the product. In assembly, the product of two integers is returned in two registers, one for each half of the product. To get the full 64-bit product of two 32-bit integers, it may be faster to cast the factors to 64-bit integers, which gives you the full result as the lower half of the (conceptually 128-bit) product in a single register.
That’s what I typically do — if I’m working with 8-bit integer values, for example, then I cast them to 16-bit values when multiplying and store the result in a 16-bit variable. (It had never struck me until you mentioned it that the system will actually generate a 32-bit result but only return the least-significant 16 bits — hmmm, I learn something new every day LOL)
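In case anyone wants to see this widening business in action, here’s a quick sketch (using 32-bit factors and values of my own choosing; casting just one factor is enough, because the other is converted automatically):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint32_t a = 100000u, b = 100000u;

    // Multiplying as 32-bit values keeps only the lower half of the
    // conceptual 64-bit product (unsigned arithmetic wraps around).
    uint32_t low = a * b;            // 1410065408

    // Casting one factor up first preserves the full 64-bit result.
    uint64_t full = (uint64_t)a * b; // 10000000000

    printf("truncated: %" PRIu32 "\n", low);
    printf("full:      %" PRIu64 "\n", full);
    return 0;
}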