Since the previous generation (the 6th generation), Intel processors have used a hybrid CISC/RISC architecture. The processor must accept CISC instructions, also known as x86 instructions, since all software available today is written using this kind of instruction. A RISC-only CPU couldn’t be created for the PC because it wouldn’t run the software we have available today, like Windows and Office.
So, the solution used by all processors available on the market today, from both Intel and AMD, is a CISC/RISC decoder. Internally, the CPU processes RISC-like instructions, but its front end accepts only CISC x86 instructions.
CISC x86 instructions are referred to as “instructions,” while the internal RISC instructions are referred to as “microinstructions” or “µops”.
These RISC microinstructions, however, cannot be accessed directly, so we couldn’t create software based on these instructions to bypass the decoder. Also, each CPU uses its own RISC instructions, which are not publicly documented and are incompatible with microinstructions from other CPUs. For example, Pentium III microinstructions are different from Pentium 4 microinstructions, which are different from Athlon 64 microinstructions.
Depending on its complexity, an x86 instruction may have to be converted into several RISC microinstructions.
The Pentium 4 decoder can decode one x86 instruction per clock cycle, as long as the instruction decodes into no more than four microinstructions. If the x86 instruction to be decoded is complex and will be translated into more than four microinstructions, it is routed to a ROM (“Microcode ROM” in Figure 3) that holds a list of all complex instructions and how they should be translated. This ROM is also called MIS (Microcode Instruction Sequencer).
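The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the decision as the text states it, not Intel's implementation; the names `MAX_DECODER_UOPS` and `route_instruction` are hypothetical, as are the microinstruction counts used in the example that follows.

```python
# Illustrative sketch only: models the routing rule described in the text.
# All names here are hypothetical, not Intel terminology.

MAX_DECODER_UOPS = 4  # per the text: the decoder handles up to four microinstructions


def route_instruction(uop_count: int) -> str:
    """Return which unit translates an x86 instruction needing `uop_count` µops."""
    if uop_count < 1:
        raise ValueError("an instruction decodes into at least one microinstruction")
    if uop_count <= MAX_DECODER_UOPS:
        return "decoder"        # simple enough for the hardware decoder
    return "microcode_rom"      # complex: translation is looked up in the ROM
```

For instance, `route_instruction(2)` returns `"decoder"`, while `route_instruction(7)` returns `"microcode_rom"`.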
As we said earlier, after being decoded, microinstructions are sent to the trace cache, and from there they go to a microinstruction queue. The trace cache can put up to three microinstructions on the queue per clock cycle; however, Intel doesn’t disclose the depth (size) of this queue.
From there, the microinstructions go to the Allocator and Register Renamer. The queue can also deliver up to three microinstructions per clock cycle to the allocator.
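To get a feel for what a three-per-cycle limit on both sides of the queue means, here is a toy Python model of the path from trace cache to queue to allocator. It is a rough sketch under stated assumptions, not a description of the real hardware: the default `queue_depth` of 16 is an arbitrary placeholder (Intel doesn't disclose the real depth), the model assumes a microinstruction spends at least one cycle in the queue, and the function and constant names are hypothetical.

```python
from collections import deque

TRACE_CACHE_WIDTH = 3  # µops per cycle, trace cache -> queue (from the text)
QUEUE_WIDTH = 3        # µops per cycle, queue -> allocator (from the text)


def cycles_to_drain(total_uops: int, queue_depth: int = 16) -> int:
    """Count clock cycles until `total_uops` have all reached the allocator.

    `queue_depth` is an assumed value; the real depth is not disclosed.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    queue = deque()
    pending = total_uops  # µops still waiting in the trace cache
    delivered = 0         # µops that have reached the allocator
    cycles = 0
    while delivered < total_uops:
        cycles += 1
        # Queue -> allocator: up to three µops leave the queue this cycle.
        for _ in range(min(QUEUE_WIDTH, len(queue))):
            queue.popleft()
            delivered += 1
        # Trace cache -> queue: up to three µops enter, limited by free space.
        free = queue_depth - len(queue)
        incoming = min(TRACE_CACHE_WIDTH, pending, free)
        queue.extend(range(incoming))
        pending -= incoming
    return cycles
```

Under these assumptions, `cycles_to_drain(6)` returns 3: the first cycle only fills the queue, and the next two cycles each hand three microinstructions to the allocator.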