Inside Intel Nehalem Microarchitecture
Introduction
Nehalem is the codename of the new Intel CPU architecture with an integrated memory controller. The first CPUs based on it will reach the market next month under the name Core i7; the architecture will also be used on CPUs targeted at servers (Xeon) and, a few years from now, on entry-level CPUs. CPUs based on this architecture will have an embedded memory controller supporting three DDR3 channels, three cache levels, the return of Hyper-Threading technology, a new external bus called QuickPath, and more. In this tutorial we will explain what’s new in this architecture.
Below is a summary of Nehalem’s main features; we will explain what they mean in the next pages:
- Based on Intel Core microarchitecture.
- Two to eight cores.
- Integrated DDR3 triple-channel memory controller.
- Individual 256 KB L2 memory caches for each core.
- 8 MB L3 memory cache.
- New SSE 4.2 instruction set (seven new instructions).
- Hyper-Threading technology.
- Turbo mode (auto overclocking).
- Enhancements to the microarchitecture (support for macro-fusion in 64-bit mode, improved Loop Stream Detector, six dispatch ports, etc.).
- Enhancements on the prediction unit, with the addition of a second Branch Target Buffer (BTB).
- A second 512-entry Translation Look-aside Buffer (TLB).
- Optimized for unaligned SSE instructions.
- Improved virtualization performance (60% improvement on round-trip virtualization latency compared to 65-nm Core 2 CPUs and 20% improvement compared to 45-nm Core 2 CPUs, according to Intel).
- New QuickPath Interconnect external bus.
- New power control unit.
- 45 nm manufacturing technology at launch, with future models at 32 nm (CPUs codenamed “Westmere”).
- New socket with 1366 pins.
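Of the features listed above, the SSE 4.2 instruction set is the one most directly visible to programmers. One of its seven instructions, CRC32, accumulates a CRC-32C (Castagnoli) checksum in hardware. As a rough illustration of the work that instruction replaces, the minimal C sketch below computes the same checksum in software, one bit at a time. The function names `crc32c_update` and `crc32c` are our own, chosen for illustration; they are not part of any Intel library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise software model of CRC-32C (Castagnoli), the checksum that the
 * SSE 4.2 CRC32 instruction accumulates in hardware.
 * 0x82F63B78 is the bit-reflected form of the polynomial 0x1EDC6F41.
 * Function names are hypothetical, for illustration only. */
uint32_t crc32c_update(uint32_t crc, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc;
}

/* Conventional framing: start from all ones and invert the final value. */
uint32_t crc32c(const unsigned char *data, size_t len)
{
    return ~crc32c_update(0xFFFFFFFFu, data, len);
}
```

For the standard nine-byte test string "123456789", `crc32c` returns 0xE3069283. On a CPU with SSE 4.2, compilers expose the hardware instruction through intrinsics such as `_mm_crc32_u8` in `<nmmintrin.h>`, which does the job of the inner eight-iteration loop in a single instruction; the initial value and final inversion remain the caller's responsibility, as in `crc32c` here.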
It is important to remember that Core 2 CPUs manufactured under 45-nm technology have extra features compared to the Core 2 CPUs manufactured under 65-nm technology. All these features are present on Nehalem-based CPUs, and the most significant ones are:
- SSE4.1 instruction set (47 new SSE instructions).
- Deep Power Down Technology (only on mobile CPUs, also known as C6 state).
- Enhanced Intel Dynamic Acceleration Technology (only on mobile CPUs).
- Fast Radix-16 Divider (FPU enhancement).
- Super Shuffle engine (FPU enhancement).
- Enhanced Virtualization Technology (between 25% and 75% performance improvement on virtual machine transition time).
Now let’s discuss in detail the most significant differences introduced by this new architecture.
