What is Compiler Optimization in Embedded Systems?
In this article, we will take a deep dive into what compiler optimization is and how it works in embedded systems. Along the way, you will find answers to many common questions, such as “Can I rely on gcc compiler optimization for my embedded project?” …and more.
The first part of this article focuses on explaining what compiler optimization is and the basics of how it works. The second part discusses some commonly encountered problems when compiler optimization gets applied to embedded software development.
I’ve received a lot of positive feedback from people saying that they really like this approach, so I hope you enjoy it too! 🙂
What is Compiler Optimization?
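Software optimization in programming languages such as C, C++, and assembly refers to a group of techniques that change the way instructions are executed in order to meet specific goals. Compiler optimization is the part of this work that the compiler performs automatically when it translates source code into machine instructions. The main goal is usually to maximize performance, i.e., to minimize execution time, or to reduce resource usage such as code size and memory.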
Before Compiler Optimization: Source Code
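The following example shows what the code for a small function looks like before any compiler optimization is applied.
If we wanted to add two numbers together and return the result, the C/C++ source would be a single return statement. Built for a 32-bit ARM target with optimization disabled, the compiler emits something like the following (an illustrative listing, not verbatim compiler output):

    _Z4addiii:
        push {r4-r7}      @ save the callee-saved registers used below
        mov  r4, r0       @ copy the first argument into r4
        mov  r5, r1       @ copy the second argument into r5
        add  r6, r4, r5   @ r6 = first argument + second argument
        mov  r0, r6       @ move the result into r0, the return register
        pop  {r4-r7}      @ restore the saved registers
        bx   lr           @ return to the caller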
The above example shows how assembly instructions map onto source language statements. The push and pop instructions save and restore registers r4-r7, which the function body overwrites; the ARM procedure call standard (AAPCS) requires a function to preserve these registers so that the caller's values survive the call. The final bx lr returns to the caller.
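For reference, the source behind a listing like this can be very small. The following is a minimal sketch in C, assuming a plain two-argument function; the name add, the file name, and the build commands in the comment are illustrative assumptions, not the exact setup used for the listing above.

```c
/* add.c: a minimal sketch of the kind of source discussed above.
 * The function name and file name are illustrative assumptions.
 *
 * To inspect the generated assembly with a GCC ARM cross-compiler
 * (assuming arm-none-eabi-gcc is installed):
 *   arm-none-eabi-gcc -O0 -S add.c -o add_O0.s    (unoptimized)
 *   arm-none-eabi-gcc -O2 -S add.c -o add_O2.s    (optimized)
 */
int add(int a, int b)
{
    return a + b;   /* the only real work: one addition */
}
```

Comparing the two .s files is usually the quickest way to see what an optimization level changes for a given toolchain.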
Another thing worth mentioning here is the additional work involved in building larger applications. For example, if you had two files in your application, one named foo.c and another named bar.c, each would be compiled separately, and the compiler would allocate registers independently for every function. Functions in both files can end up using the same register, such as r8, and the save and restore sequences shown above are what keep them from overwriting each other's values.
In hand-written assembly, you would have to keep track of these register assignments yourself. In C/C++ you normally leave this to the compiler. When a particular register really is needed, GCC provides explicit register variables as an extension, and a preprocessor macro can give such a variable a descriptive name.
Without compiler optimization: Assembly code
The following example shows how assembly instructions map to C/C++ statements with no optimization enabled (again an illustrative listing for a function that adds its two arguments):

    _Z4addiii:
        push {r4-r7}      @ function entry: save callee-saved registers
        mov  r4, r0       @ first parameter: copy it out of r0
        mov  r5, r1       @ second parameter: copy it out of r1
        add  r6, r4, r5   @ the addition itself
        mov  r0, r6       @ return statement: place the sum in r0
        pop  {r4-r7}      @ function exit: restore the saved registers
        bx   lr           @ return to the caller
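As you can see here, the unoptimized code does extra work. Very few of the instructions perform the actual addition; the rest is overhead. In general, unoptimized code allocates stack space for variables that are only used locally inside functions, and it saves and restores registers on function entry and exit. This overhead appears because, without optimization, the compiler essentially translates each C/C++ statement into assembly instructions on its own, without looking at the surrounding code.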
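The overhead described above is easiest to see on a function with local variables. The sketch below is only an illustration (the function name scale_and_offset is hypothetical): at -O0, GCC typically gives each local its own stack slot and reloads it for every use, while at -O1 or higher the locals usually live only in registers or disappear entirely.

```c
/* Hypothetical example: every name here is illustrative. */
int scale_and_offset(int x)
{
    int scaled = x * 4;        /* at -O0: typically stored to a stack slot */
    int offset = scaled + 10;  /* at -O0: 'scaled' is reloaded from the stack first */
    return offset;             /* with optimization: usually a couple of register operations */
}
```

The exact instructions depend on the compiler version and target, so it is worth checking the generated assembly on your own toolchain rather than relying on this description.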
It’s worth noting that many compilers perform some level of optimization even with simple code like this, but that’s beyond the scope of this article. If you want to take a look at how GCC works, then you should check out the following resources:
What is Compiler Optimization in Detail?
This section covers some of the benefits gained from compiler optimization by using higher-level languages such as C/C++. The next two subsections are at a very high level, so if you are not familiar with these concepts, then I suggest studying them first before moving on.
Higher-Level Programming Languages & Compiler Optimization
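The main reason why compilers optimize higher-level programming languages is that they have no choice! When writing assembly, the programmer picks every instruction, so there is nothing left for a compiler to decide. In higher-level languages, the source code describes what should happen rather than which instructions to use, and there are often many different ways of performing the same task. The compiler has to pick one of them.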
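Even a trivial operation shows this. The sketch below writes "multiply by 8" in three ways; the function names are illustrative assumptions. All three return the same value for every unsigned input, so an optimizing compiler is free to emit whichever instruction sequence is cheapest on the target, and it will often generate the same code for the first two.

```c
/* Three ways to multiply an unsigned value by 8. All names are illustrative. */
unsigned times8_mul(unsigned x)
{
    return x * 8u;             /* written as a multiplication */
}

unsigned times8_shift(unsigned x)
{
    return x << 3;             /* written as a left shift by 3 bits */
}

unsigned times8_add(unsigned x)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += x;              /* written as eight additions */
    }
    return sum;
}
```

Unsigned types are used here on purpose: unsigned arithmetic wraps around on overflow, so the three versions agree even for very large inputs.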
To give a simple example, a byte could be read from a stream using any of these methods:
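    /* Fragments only: process() and fd are assumed to be declared elsewhere.
       Needs <stdio.h> for getchar/fopen/fgetc and <unistd.h> for read. */

    /* Method 1: one character at a time from standard input */
    int c;                               /* int, not char, so EOF can be detected */
    while ((c = getchar()) != EOF) {
        process((char)c);
    }

    /* Method 2: up to 10 characters from a file via stdio */
    FILE *fp = fopen("input_file", "r");
    if (fp != NULL) {
        int numBytesRead = 0;
        while (numBytesRead < 10 && (c = fgetc(fp)) != EOF) {
            process((char)c);
            ++numBytesRead;
        }
        fclose(fp);
    }

    /* Method 3: 5 bytes in a single POSIX read() call */
    char buf[10];
    if (read(fd, buf, 5) == 5) {
        for (int i = 0; i < 5; ++i) {
            process(buf[i]);
        }
    }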
As you can see, these methods are all different at the source level. However, they all describe broadly the same task to the compiler: fetch bytes from an input and pass each one to process(). To choose good machine code for any of them, the compiler needs certain types of information about how variables are being used in each method. These can include things like:
Is a variable read before it is modified?
How many times is it written to?
Which instructions write to it after reading from it?
Is a variable initialized first before being read or written to later?
Information like this, usually called data-flow information, tells the compiler which values are worth keeping in registers, which registers need saving and restoring around function calls, and which reads or writes can be removed altogether. It also helps the compiler decide between multiple possible paths through the code.
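The questions above can be made concrete with a small sketch. The function below is hypothetical and exists only to illustrate them; the comments mark what a compiler could conclude from how each variable is read and written.

```c
/* Hypothetical function used to illustrate the questions above. */
int clamp_to_byte(int value)
{
    int result = 0;          /* written here, but always overwritten before it
                                is read: a "dead store" the compiler may remove */
    int unused = value * 3;  /* never read at all: may be removed entirely */
    (void)unused;            /* silences unused-variable warnings */

    if (value < 0) {
        result = 0;
    } else if (value > 255) {
        result = 255;
    } else {
        result = value;      /* 'value' is only ever read, never modified */
    }
    return result;           /* the single read of 'result' */
}
```

Whether a particular compiler actually removes the dead store or the unused multiplication depends on the optimization level, so treat the comments as what is permitted rather than what is guaranteed.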
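An example of this kind of analysis can be seen in the following function (the fixed-width types come from <stdint.h>):

    int32_t loadFloat(const int16_t *ptr)
    {
        int32_t result = 0;
        /* Compare as unsigned: a signed 16-bit value can never equal 0x8000. */
        while ((uint16_t)*ptr != 0x8000) {
            result += *ptr++ * 256;   /* same as << 8, but defined for negative values */
        }
        return result;
    }

In this code, the variable ptr is both read from and written to: every iteration dereferences it and then increments it. The variable result is also read and written on every iteration. That means the compiler has to keep the changing value of ptr in a register (or, without optimization, in a stack slot) for the whole loop. Compare that with code like this:

    uint32_t loadFloatNoOptim(const uint16_t *ptr)
    {
        uint32_t result = 0;
        /* ptr is never modified; only the index i changes. */
        for (uint32_t i = 0; ptr[i] != 0x8000; ++i) {
            result += (uint32_t)ptr[i] << 8;
        }
        return result;
    }

In this version, the variable ptr is only read from; the index i is the only thing that changes. Because ptr is never modified, the compiler knows its value stays the same for the whole loop and can keep it in a single register without ever updating it. An optimizing compiler will often generate very similar instructions for both versions, but it can only do so because it has tracked how each variable is read and written.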
Optimizing Compilers & High-Level Languages
If you look at the assembly instructions generated without optimization for functions written in higher-level languages, you’ll notice that they are far from optimal compared to hand-written code! Even after the basic optimizations described above have been applied, there is still more work for the compiler to do. There are often many further things that could be done to these functions to make them faster or smaller.
For example, we saw earlier how variables were stored in registers (and memory locations) depending on how they were used. There is a sense of “correctness” about which variables should be stored in which registers and memory locations. If you go back and look at the sections about data types, you’ll notice that certain data types can be stored in specific registers and memory locations based on their sizes. Intuitively speaking, this information about variable usage and memory placement forms part of the “function signature” for the compiler: it tells the compiler what kind of values will come in, where values are stored, and how long they need to be kept before being retrieved again later.