How a CPU Works
Out-Of-Order Execution (OOO)
Remember that we said modern CPUs have several execution units working in parallel? We also said that there are different kinds of execution units, such as the ALU, which is a generic execution unit, and the FPU, which is a math execution unit. As a generic example to help understand the problem, let’s say that a given CPU has six execution units: four “generic” units (ALUs) and two FPUs. Let’s also say that, at a given moment, the program has the following instruction flow:
1. generic instruction
2. generic instruction
3. generic instruction
4. generic instruction
5. generic instruction
6. generic instruction
7. math instruction
8. generic instruction
9. generic instruction
10. math instruction
What will happen? The schedule/dispatch unit will send the first four instructions to the four ALUs. At the fifth instruction, however, a CPU that executes strictly in program order would have to wait for one of its ALUs to become free before it could continue, since all four generic execution units are busy. That’s not good, because the two math units (FPUs) are still available and sitting idle. So, a CPU with out-of-order execution (virtually all modern high-performance CPUs have this feature) will look at the next instruction to see if it can be sent to one of the idle units. In our example, it can’t, because the sixth instruction also needs an ALU. The out-of-order engine continues its search and finds that the seventh instruction is a math instruction, which can be executed in one of the available FPUs. Since the other FPU is still available, the engine keeps going down the program looking for another math instruction. In our example, it will skip the eighth and ninth instructions and load the tenth.
So, in our example, the execution units will be processing, at the same time, the first, the second, the third, the fourth, the seventh and the tenth instructions.
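The walk-through above can be reproduced with a small simulation. This is only a sketch of the idea, not how real hardware works: the function name `first_cycle_issue` and the `"generic"`/`"math"` labels are made up for this illustration, and the model ignores data dependencies between instructions, which a real out-of-order engine must also respect.

```python
def first_cycle_issue(program, free_units, out_of_order=True):
    """Return the 1-based positions of the instructions issued at the same time.

    program      -- list of instruction kinds, in program order
    free_units   -- dict mapping each kind to its number of idle execution units
    out_of_order -- if False, stop at the first instruction that cannot be placed
    """
    free = dict(free_units)  # work on a copy so the caller's dict is untouched
    issued = []
    for position, kind in enumerate(program, start=1):
        if not any(free.values()):
            break  # every execution unit is busy, nothing more can be issued
        if free.get(kind, 0) > 0:
            free[kind] -= 1  # occupy one unit of the matching kind
            issued.append(position)
        elif not out_of_order:
            break  # an in-order CPU waits here instead of looking further
        # otherwise: skip this instruction and keep looking down the program
    return issued


# The instruction flow and the six execution units from the example.
PROGRAM = ["generic"] * 6 + ["math"] + ["generic"] * 2 + ["math"]
UNITS = {"generic": 4, "math": 2}

print(first_cycle_issue(PROGRAM, UNITS))                      # [1, 2, 3, 4, 7, 10]
print(first_cycle_issue(PROGRAM, UNITS, out_of_order=False))  # [1, 2, 3, 4]
```

With out-of-order execution all six units are busy; without it, only the four ALUs are used and both FPUs stay idle.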
The name out-of-order comes from the fact that the CPU doesn’t need to wait; it can pull an instruction from further down the program and process it before the instructions above it are processed. Of course, the out-of-order engine cannot keep searching forever if it cannot find a suitable instruction. Every out-of-order engine has a limit on how many instructions ahead it can look (a typical value would be 512).
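To see why the search limit matters, here is a minimal sketch that applies a limit to the ten-instruction example with four generic units and two math units. The name `issue_within_window` and the `window` parameter are hypothetical, the limit is counted from the first instruction for simplicity, and data dependencies are again ignored.

```python
def issue_within_window(program, free_units, window):
    """Issue instructions out of order, looking at most `window` instructions ahead."""
    free = dict(free_units)  # copy, so the caller's dict is not modified
    issued = []
    # Only the first `window` instructions are visible to the engine.
    for position, kind in enumerate(program[:window], start=1):
        if free.get(kind, 0) > 0:
            free[kind] -= 1  # occupy one idle unit of the matching kind
            issued.append(position)
    return issued


PROGRAM = ["generic"] * 6 + ["math"] + ["generic"] * 2 + ["math"]
UNITS = {"generic": 4, "math": 2}

print(issue_within_window(PROGRAM, UNITS, window=6))    # [1, 2, 3, 4]
print(issue_within_window(PROGRAM, UNITS, window=8))    # [1, 2, 3, 4, 7]
print(issue_within_window(PROGRAM, UNITS, window=512))  # [1, 2, 3, 4, 7, 10]
```

In this toy model, a limit of 6 never reaches a math instruction, a limit of 8 finds only the seventh instruction, and any limit of 10 or more keeps both FPUs busy.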
