Memory Cache Organization
The memory cache is divided internally into lines, each one holding from 16 to 128 bytes, depending on the CPU. On the majority of current CPUs the memory cache is organized in 64-byte lines (512 bits), so we will be considering a memory cache using 64-byte lines in our examples throughout this tutorial. On the last page we will present the main memory cache specs for all CPUs currently found on the market.
So a 512 KB L2 memory cache is divided into 8,192 lines. Keep in mind that 1 KB is 2^10 or 1,024 bytes and not 1,000 bytes, so the math is 524,288 / 64 = 8,192. We will be considering a single-core CPU with 512 KB L2 memory cache in our examples. In Figure 5 we illustrate this memory cache internal organization.
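The line count above follows from simple division; a quick Python check using the sizes from this example:

```python
CACHE_SIZE = 512 * 1024   # 512 KB L2 cache, remembering that 1 KB = 1,024 bytes
LINE_SIZE = 64            # bytes per cache line

num_lines = CACHE_SIZE // LINE_SIZE
print(num_lines)  # 8192
```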
The memory cache can work under three different configurations: direct mapping, fully associative and set associative (a.k.a. n-way set associative). The latter is the most used configuration nowadays, but let’s take a look at how these three configurations work.
Direct mapping is the simplest way of creating a memory cache. In this configuration the main RAM memory is divided into the same number of blocks as there are lines inside the memory cache. If we have a system with 1 GB of RAM, this 1 GB will be divided into 8,192 blocks (assuming the memory cache uses the configuration we described above), each one with 128 KB (1,073,741,824 / 8,192 = 131,072 – keep in mind that 1 GB is 2^30 bytes, 1 MB is 2^20 bytes and 1 KB is 2^10 bytes). If our system had 512 MB the memory would also be divided into 8,192 blocks, but this time each one would have 64 KB. And so on. We illustrate this organization in Figure 6.
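The block sizes above come from dividing the RAM size by the number of cache lines. A minimal sketch using the figures from this example:

```python
NUM_LINES = 8192        # a 512 KB cache divided into 64-byte lines

GB = 2**30
MB = 2**20
KB = 2**10

block_1gb = (1 * GB) // NUM_LINES      # block size with 1 GB of RAM
block_512mb = (512 * MB) // NUM_LINES  # block size with 512 MB of RAM

print(block_1gb // KB, block_512mb // KB)  # 128 64 (in KB)
```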
The main advantage of direct mapping is that it is the easiest configuration to implement.
When the CPU asks for a given address from the RAM memory (e.g., address 1,000), the cache controller will load a line (64 bytes) from the RAM memory and store this line in the memory cache (i.e., addresses 1,000 through 1,063 – we are using simple decimal byte addresses just to keep our examples easy to follow). So if the CPU asks for the contents of this address again, or for any of the next few addresses right after it (i.e., any address from 1,000 to 1,063), they will already be inside the cache.
The problem is that if the CPU needs two addresses that are mapped to the same cache line, a cache miss will occur (this problem is called a collision or conflict). Continuing our example, if the CPU asks for address 1,000 and then asks for address 2,000, a cache miss will occur, because these two addresses are inside the same block (the first 128 KB), and what was inside the cache was a line starting at address 1,000. So the cache controller will load the line from address 2,000 and store it on the first line of the memory cache, erasing the previous contents, in our case the line from address 1,000.
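Under this mapping, the cache line an address falls into is simply its RAM block number. The sketch below (a simplified model of the scheme described above, not real hardware) shows why addresses 1,000 and 2,000 collide:

```python
RAM_SIZE = 2**30                    # 1 GB of RAM
NUM_LINES = 8192                    # lines in the 512 KB cache
BLOCK_SIZE = RAM_SIZE // NUM_LINES  # 131,072 bytes = 128 KB per block

def cache_line(address):
    """Direct mapping: RAM block k always goes to cache line k."""
    return address // BLOCK_SIZE

# Both addresses fall inside the first 128 KB block, so they
# compete for the very same cache line:
print(cache_line(1000), cache_line(2000))  # 0 0
```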
The problem goes on. If the program has a loop that spans more than 64 bytes, cache misses will occur throughout the loop’s execution.
For example, if the loop goes from address 1,000 to address 1,100, the CPU will have to load all instructions directly from the RAM memory for the duration of the loop. This happens because the cache will hold the contents of addresses 1,000 through 1,063, and when the CPU asks for the contents of address 1,100 it will have to go to the RAM memory, and the cache controller will load addresses 1,100 through 1,163. When the CPU asks for address 1,000 again it will have to go back to the RAM memory, as the cache doesn’t have the contents of address 1,000 anymore. If this loop is executed 1,000 times, the CPU will have to go to the RAM memory on every single iteration.
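This ping-pong behavior can be reproduced with a toy model of a single cache line (following the simplified example above, where the loaded line starts at the requested address):

```python
LINE_SIZE = 64
cached_start = None   # start address of the 64-byte line currently cached
ram_trips = 0

def access(addr):
    global cached_start, ram_trips
    # Hit only if addr falls inside the line currently in the cache
    if cached_start is None or not (cached_start <= addr < cached_start + LINE_SIZE):
        ram_trips += 1        # miss: the controller fetches a new line from RAM
        cached_start = addr   # as in the example, the line starts at the address asked

for _ in range(1000):         # the loop body touches addresses 1,000 and 1,100
    access(1000)
    access(1100)

print(ram_trips)  # 2000: every single access during the loop goes to RAM
```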
That is why direct mapping is the least efficient cache configuration and is not used anymore – at least on PCs.
In the fully associative configuration, on the other hand, there is no hard link between the lines of the memory cache and the RAM memory locations. The cache controller can store any memory line in any cache line, so the problems described above don’t happen. This configuration is the most efficient one (i.e., the one with the highest hit rate).
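A minimal sketch of this behavior (a dict standing in for the cache; the replacement policy is deliberately not modeled): because any memory line may occupy any cache line, the two conflicting addresses from the earlier example can now coexist in the cache.

```python
LINE_SIZE = 64
NUM_LINES = 8192
cache = {}   # line start address -> cached; any RAM line may use any free slot

def access(addr):
    start = addr - addr % LINE_SIZE   # start of the 64-byte line holding addr
    if start in cache:
        return "hit"
    if len(cache) >= NUM_LINES:
        cache.pop(next(iter(cache)))  # cache full: evict some line (policy omitted)
    cache[start] = True
    return "miss"

print(access(1000), access(2000), access(1000))  # miss miss hit
```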
On the other hand, the control circuit is far more complex, as it needs to keep track of which memory locations are loaded inside the memory cache. That is why a hybrid solution – called set associative – is the most used one nowadays.
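As a quick preview of the hybrid idea (a generic textbook-style model, not any specific CPU): the memory line number selects a set, and within that set the controller searches associatively, like a small fully associative cache with only a few entries to check.

```python
LINE_SIZE = 64
NUM_LINES = 8192
WAYS = 4                        # a hypothetical 4-way set associative cache
NUM_SETS = NUM_LINES // WAYS    # 2,048 sets of 4 lines each

def set_index(addr):
    """Each memory line maps to one set, but may use any of that set's ways."""
    return (addr // LINE_SIZE) % NUM_SETS

print(set_index(1000))  # 15: memory line 15 goes into set 15
```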