Inside Pentium M Architecture
Memory Cache and Fetch Unit
As we mentioned, the Pentium M’s L2 memory cache is either 1 MB (130 nm models, a.k.a. “Banias” core) or 2 MB (90 nm models, a.k.a. “Dothan” core). It also has two L1 memory caches: one of 32 KB for instructions and another of 32 KB for data.
The fetch unit is divided into three stages, as we explained on the previous page. In Figure 2, you can see how the Pentium M’s fetch unit works.
As we mentioned before, the fetch unit loads one line (32 bytes = 256 bits) into its Instruction Streaming Buffer. Then the Instruction Length Decoder identifies the instruction boundaries within 16 bytes (128 bits). Since x86 instructions don’t have a fixed length, this stage marks where each instruction starts and ends within the loaded 128 bits. If there is a branch instruction within these 128 bits, its address is stored in the Branch Target Buffer (BTB), so the CPU can later use this information in its branch prediction circuit. The BTB has 512 entries.
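To make the boundary-marking step more concrete, here is a minimal sketch in Python. It is only a toy model: the function names are hypothetical, and the length rule in toy_length is an invented stand-in, since real x86 length decoding depends on prefixes, opcodes, and addressing bytes. The only detail taken from the text is the 16-byte (128-bit) window.

```python
# Toy model of the Instruction Length Decoder stage: walk a 16-byte window
# and mark where each instruction starts and ends.

WINDOW_BYTES = 16  # 128 bits, as described in the text


def toy_length(window, offset):
    """Hypothetical length rule, NOT real x86 decoding.

    The low 3 bits of the first byte, plus one, give the length (1 to 8 bytes).
    """
    return (window[offset] & 0x07) + 1


def mark_boundaries(window, length_of=toy_length):
    """Return (start, end) byte offsets of each complete instruction.

    'end' is exclusive. An instruction that would run past the window is
    left for the next window and is not reported.
    """
    if len(window) != WINDOW_BYTES:
        raise ValueError("expected a 16-byte window")
    boundaries = []
    offset = 0
    while offset < WINDOW_BYTES:
        end = offset + length_of(window, offset)
        if end > WINDOW_BYTES:
            break  # instruction spills into the next window
        boundaries.append((offset, end))
        offset = end
    return boundaries
```

The point of the sketch is that each boundary depends on the previous one: the start of an instruction is only known once the length of the one before it has been determined, which is why this work gets its own stage before decoding.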
Then the Decoder Alignment Stage marks which instruction decoder unit each instruction must be sent to. There are three different instruction decoder units, as we will explain on the next page.
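The steering done by this stage can be pictured with a short Python sketch. It rests on an assumption that this page does not state: that decoder 0 accepts any instruction while decoders 1 and 2 accept only simple ones. The function name and the per-cycle grouping are likewise hypothetical, chosen only for illustration.

```python
# Toy model of the Decoder Alignment Stage: steer each marked instruction to
# one of three decoders.
# ASSUMPTION (not stated in this section): decoder 0 handles any instruction,
# decoders 1 and 2 handle only "simple" ones.

NUM_DECODERS = 3


def assign_decoders(is_complex):
    """Group instructions into per-cycle bundles of (decoder, instr_index).

    is_complex: list of booleans, one per instruction in program order.
    """
    cycles = []
    current = []
    for index, complex_instr in enumerate(is_complex):
        slot = len(current)
        # Start a new cycle when all decoders are taken, or when a complex
        # instruction would otherwise land on decoder 1 or 2.
        if slot == NUM_DECODERS or (complex_instr and slot != 0):
            cycles.append(current)
            current = []
            slot = 0
        current.append((slot, index))
    if current:
        cycles.append(current)
    return cycles
```

Under that assumption, a run of simple instructions fills all three decoders in each cycle, while back-to-back complex instructions are decoded one per cycle because each has to wait for decoder 0.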

