[nextpage title=”Introduction”]

Penryn is the codename for the core that will be used by Core 2 and Xeon CPUs based on the Core microarchitecture and manufactured using the 45 nm process. Besides the new manufacturing process, this new core will bring some new features that we will explain in this article. We will also show you the latest Intel roadmap, covering what Intel will release up to 2010, and talk briefly about the next microarchitecture from Intel, codenamed Nehalem.

All those names may be confusing, so let’s break them down for you. The latest Intel microarchitecture is called Core and is used on Core 2 and on the latest Xeon CPUs. Read our Inside Intel Core Microarchitecture tutorial to learn more about it. Currently the CPUs using this microarchitecture are manufactured using the 65 nm process.

Intel will start using the 45 nm manufacturing process in the second half of this year, when it will release a new CPU core based on the Core microarchitecture, codenamed Penryn. Please read our Details on Intel’s Forthcoming 45 nm Manufacturing Technology article to learn what is new in this manufacturing technology. Intel also revealed yesterday that this new core will bring some new features, which we will explain in this article.

In 2008 Intel will release a new microarchitecture, codenamed Nehalem. The first CPUs using this new microarchitecture will also be manufactured using the 45 nm process. Intel revealed one of the new features of this microarchitecture, which we will explain later.

For 2010 (or by the end of 2009) Intel plans to release a new CPU core based on Nehalem microarchitecture but manufactured using the 32 nm process. This new core is codenamed Westmere.

Also in 2010 Intel will release a new microarchitecture, codenamed Gesher. The first CPUs using this microarchitecture will be manufactured using the 32 nm process.

In Figure 1 we present a slide summarizing this roadmap.

Figure 1: Roadmap for the next Intel cores and microarchitectures.

As you can see, Intel is now committed to delivering a new CPU microarchitecture every two years, in even years, with an enhanced version of each new microarchitecture released the following year, i.e., in odd years.

[nextpage title=”Penryn Core”]

Besides the new manufacturing process, which we have already explained in our Details on Intel’s Forthcoming 45 nm Manufacturing Technology article, the Penryn core updates the Core microarchitecture, bringing some new features:

  • Larger L2 memory cache (up to 6 MB for dual-core CPUs and up to 12 MB for quad-core CPUs).
  • Split load caches.
  • Faster buses (up to 1,600 MHz).
  • New SSE4 instruction set (which brings 47 new SSE instructions to the CPU).
  • Deep Power Down Technology (only on mobile CPUs).
  • Enhanced Intel Dynamic Acceleration Technology (only on mobile CPUs).
  • Fast Radix-16 Divider (FPU enhancement).
  • Super Shuffle engine (FPU enhancement).
  • Enhanced Virtualization Technology (between 25% and 75% performance improvement on virtual machine transition time).

Figure 2: New microarchitecture enhancements brought with Penryn core.

In Figures 3 and 4 you can see a summary of the CPUs that will be launched for the mobile, desktop and server markets based on the new Penryn core.

Figure 3: Mobile and desktop CPUs based on Penryn core.

Figure 4: Server CPUs based on Penryn core.

Let’s now talk about some of the new features brought by the new Penryn core.

[nextpage title=”Enhancements for Mobile CPUs”]

The Penryn core brings two new features for mobile CPUs: Deep Power Down Technology and Enhanced Dynamic Acceleration Technology.

Deep Power Down Technology

To save power, Intel mobile CPUs can reduce their voltage, disable their clock and even disable their memory cache when they are idle. This is particularly useful in laptops, where any power savings translate into longer battery life.

There are three basic power-saving modes, called Halt (or C1), Stop Clock (or C2) and Deep Sleep (or C3). When no power saving mode is being used, the CPU is fully active and it is said to be in its C0 mode. These power-saving modes are generically called C-states.

With the first dual-core CPU based on the Pentium M core, called Core Duo, Intel allowed these modes to be configured on a per-core basis, meaning that if one of the CPU cores is idle, the CPU can reduce the voltage and turn off the clock for this core while the other core remains fully active. So the two cores can be in different C-states. Also with this CPU Intel introduced two new C-states, Deeper Sleep (C4) and Enhanced Deeper Sleep (DC4), which can only be activated for the two cores at the same time.

The Penryn core brings a new C-state, called Deep Power Down (C6). When the CPU enters this state, the CPU voltage is reduced further than in any other C-state, the clock signals are disabled and both memory caches are turned off. This state saves more power (i.e., battery) than all other C-states available to date, but on the other hand the CPU takes longer to return to work at full speed.

In Figure 5, you can see a comparison of the new Deep Power Down state to the other C-states currently available on Core Duo and Core 2 CPUs.

Figure 5: Deep Power Down mode compared to other C-modes.

This new C-state is only available on mobile CPUs.
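The trade-off described above, where deeper C-states save more power but take longer to wake from, can be sketched in a few lines of Python. This is only an illustrative model: the `CState` class, the `deepest_allowed` function and every power and latency number below are assumptions made up for this example, not figures published by Intel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    relative_power: float    # 1.0 = fully active; hypothetical value
    wake_latency_us: float   # time to return to C0; hypothetical value


# Ordered from shallowest to deepest. All numbers are made up for illustration;
# only the ordering (deeper = less power, slower wake-up) follows the text.
C_STATES = [
    CState("C0 (active)", 1.00, 0.0),
    CState("C1 (Halt)", 0.60, 1.0),
    CState("C2 (Stop Clock)", 0.40, 10.0),
    CState("C3 (Deep Sleep)", 0.25, 50.0),
    CState("C4 (Deeper Sleep)", 0.10, 100.0),
    CState("C6 (Deep Power Down)", 0.02, 200.0),
]


def deepest_allowed(states, max_wake_latency_us):
    """Return the lowest-power state whose wake latency fits the budget."""
    candidates = [s for s in states if s.wake_latency_us <= max_wake_latency_us]
    if not candidates:
        raise ValueError("no C-state satisfies the latency budget")
    return min(candidates, key=lambda s: s.relative_power)


if __name__ == "__main__":
    for budget in (0.0, 20.0, 500.0):
        print(budget, "->", deepest_allowed(C_STATES, budget).name)
```

With these made-up numbers, a system that can tolerate a long wake-up delay ends up in Deep Power Down, while one that must react quickly stays in a shallower state.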

Enhanced Dynamic Acceleration Technology

When one of the cores enters one of the deeper power-saving states (C3 or deeper), the new Penryn core allows the other core to increase its clock (i.e., to overclock itself) while keeping the CPU inside its TDP (Thermal Design Power). Since the inactive core is consuming less power, the active core can consume more energy and dissipate more heat while the CPU as a whole stays inside its thermal and power envelope: the whole CPU will still be consuming the same amount of energy (or less) and dissipating the same amount of heat (or less).

Figure 6: Enhanced Dynamic Acceleration Technology.

Like Deep Power Down, this new feature is only available on mobile CPUs.
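The power-budget reasoning behind Enhanced Dynamic Acceleration can be written as simple arithmetic. The sketch below is an assumption-laden illustration: the function name `boost_headroom_watts`, the `uncore_watts` parameter and all wattage figures are hypothetical, and the real mechanism Intel uses to decide when and how far to raise the clock is not described in this article.

```python
def boost_headroom_watts(tdp_watts, active_core_watts, idle_core_watts,
                         uncore_watts=0.0):
    """Extra power the active core could draw while the package stays within TDP.

    All inputs are hypothetical figures supplied by the caller.
    """
    used = active_core_watts + idle_core_watts + uncore_watts
    return max(0.0, tdp_watts - used)


if __name__ == "__main__":
    # Hypothetical 35 W mobile CPU: 15 W per core at nominal clock, 5 W for the rest.
    print(boost_headroom_watts(35.0, 15.0, 15.0, 5.0))  # both cores busy
    print(boost_headroom_watts(35.0, 15.0, 2.0, 5.0))   # second core in a deep C-state
```

In this made-up example there is no headroom while both cores are busy, but once the second core drops to 2 W the active core has 13 W of budget it can spend on a higher clock without the package exceeding its TDP.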

[nextpage title=”FPU Enhancements”]

The new Penryn core brings two enhancements to the CPU floating-point unit (FPU), one for its divider engine and another for its shuffle engine.

Fast Radix-16 Divider

This is an enhancement to the way that the CPU floating-point unit (FPU) handles division operations. On current Core 2 CPUs, division operations process two bits per clock cycle. The new divider circuit implemented on Penryn is able to process four bits per clock cycle (hence the name radix-16, since four bits encode 16 values), meaning it is two times faster on division operations than current Core 2 CPUs.

In Figure 7, you can see a comparison between the FPU of the Core 2 Duo CPU and the FPU of the new Penryn core. The “y” axis represents clock cycles, so the lower the bars, the better (less time is spent processing an instruction). On the “x” axis you can see the several division instructions selected for this comparison.
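To see why processing four bits per step instead of two halves the number of steps, here is a minimal Python sketch of a digit-by-digit unsigned integer division. It is a simplified model written for this explanation, not Intel's circuit: real hardware dividers use more elaborate digit-selection schemes, and the function name `divide_digit_recurrence` and its parameters are assumptions.

```python
def divide_digit_recurrence(dividend, divisor, bits_per_step, width=32):
    """Unsigned division that retires `bits_per_step` quotient bits per iteration.

    bits_per_step=2 models a radix-4 divider, bits_per_step=4 a radix-16 one.
    Returns (quotient, remainder, number_of_iterations).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")

    mask = (1 << bits_per_step) - 1
    quotient = 0
    remainder = 0
    steps = 0
    # Walk the dividend from its most significant group of bits to the least.
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & mask)
        # Find the next quotient digit (0 .. radix-1) by repeated subtraction.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
        steps += 1
    return quotient, remainder, steps


if __name__ == "__main__":
    print(divide_digit_recurrence(1000, 7, bits_per_step=2))  # radix-4
    print(divide_digit_recurrence(1000, 7, bits_per_step=4))  # radix-16
```

Both calls produce the same quotient and remainder, but for a 32-bit operand the radix-4 version needs 16 iterations while the radix-16 version needs only 8.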

Here is a small glossary for understanding Figure 7 if you are not familiar with CPU instructions:

  • int = Integer
  • SP = Single Precision (32-bit numbers)
  • DP = Double Precision (64-bit numbers)
  • EP = Double Extended Precision (80-bit numbers)

Figure 7: Performance comparison of the new divider engine used on Penryn Core.

Super Shuffle Engine

This is an enhancement to the way the CPU floating-point unit (FPU) handles shuffle operations used by SSE data formatting instructions, allowing Penryn-based CPUs to perform some instructions in fewer clock cycles compared to the core currently used by Core 2 Duo processors (Merom).

In Figure 8, you can see a comparison between the number of clock cycles these two cores take to perform each one of these instructions. The smaller the bars, the better: fewer clock cycles mean less time spent, thus higher speed.

As you can see, several 128-bit SSE instructions that took more than one clock cycle to be processed are now processed in just one clock cycle, improving SSE performance. SSE (Streaming SIMD Extensions) instructions are used mainly by multimedia applications written to take advantage of them.

Figure 8: Performance comparison of the new shuffle engine used on Penryn Core.
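For readers unfamiliar with shuffle operations, the following Python sketch shows what such an instruction does to the data: it rearranges the elements of a 128-bit register, here treated as four 32-bit lanes. The control-byte encoding is modeled on the SSE2 PSHUFD instruction, which we picked as an example; the function name `shuffle_lanes` is an assumption, and the sketch says nothing about how the Penryn hardware implements the operation.

```python
def shuffle_lanes(lanes, control):
    """Rearrange four lanes according to an 8-bit control value.

    Output lane i copies the input lane selected by bits [2*i+1 : 2*i]
    of `control`, as in the PSHUFD immediate encoding.
    """
    if len(lanes) != 4:
        raise ValueError("expected exactly four lanes")
    if not 0 <= control <= 0xFF:
        raise ValueError("control must fit in 8 bits")
    return [lanes[(control >> (2 * i)) & 0b11] for i in range(4)]


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(shuffle_lanes(data, 0b11100100))  # identity: each lane selects itself
    print(shuffle_lanes(data, 0b00011011))  # reverses the four lanes
    print(shuffle_lanes(data, 0b00000000))  # broadcasts lane 0 to all lanes
```

Multimedia code uses this kind of rearrangement constantly to put data into the layout that the arithmetic SSE instructions expect, which is why performing it in a single clock cycle matters.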

[nextpage title=”Nehalem Microarchitecture”]

As for the next Intel microarchitecture, Intel revealed some key points of the forthcoming Nehalem microarchitecture.

Here is what we know so far:

  • It will be based on Core microarchitecture.
  • Capacity of up to eight cores per CPU.
  • It will bring Hyper-Threading to the Core microarchitecture. As Hyper-Threading is a feature available only on the Netburst microarchitecture (used on the Pentium 4), it was renamed to Simultaneous Multi-Threading (we need to wait and see if this will be the final name they will use). This technology simulates two cores for each CPU core. So a quad-core CPU would be recognized as an eight-core CPU by the operating system.
  • Some models will have an integrated video controller. Yes, on-board video controlled by the CPU. It should be a lot faster than the current on-board video solutions controlled by the chipset. With the acquisition of ATI by AMD, we should expect something similar coming from AMD’s side as well. This isn’t a new idea, by the way. Cyrix used this idea back in 1997 on their MediaGX CPU, which had an embedded video and memory controller. The Cyrix division in charge of MediaGX was sold to National, which developed its Geode processor line based on MediaGX. Even more curiously, AMD bought the Geode line from National in 2003.
  • It will have the north bridge chip embedded in the CPU, just like what happens with AMD64 CPUs.
  • Multi-level cache, whatever this is. Probably each core will have its own L2 memory cache plus an L3 cache that will be shared by all cores. This is just speculation; we need to wait for more details.

Intel listed several generic and very obvious features that simply mean nothing to us, so we will need to wait until they disclose more about the Nehalem microarchitecture to be able to translate what they mean by this:

  • “Dynamically scalable for leadership performance on demand with energy efficiency”.
  • “Leadership system and memory bandwidth”
  • “Performance enhanced dynamic power management”
  • “New system architecture for next-generation Intel processors and platforms”
  • “Dynamically managed cores, threads, cache, interfaces and power”