The Nehalem is based on the architecture of the 45 nm Core 2 models, which themselves had some significant improvements over the earlier 65 nm C2 models. Now, I'm not a tech, and know little about the intricate inner workings of processors, but it seems to me that a more accurate description is that the i7 is "loosely" based on the C2, considering the notable differences between the two.
Probably the most significant difference to us lay people is the addition of an integrated memory controller, which drives triple-channel DDR3 memory. The CPU no longer depends on a single external bus, the FSB, for both memory and I/O requests; there are now separate buses for memory and for communication with the rest of the motherboard. This improves performance because memory and I/O traffic each get their own datapath, and because the CPU no longer has to go through an external controller to reach memory.
The external bus is called "QuickPath Interconnect" or QPI, and provides two datapaths, one for incoming data and one for outgoing data, similar to AMD's HyperTransport (HT) bus. The first generation of QPI runs at 3.2 GHz and transfers 16 bits of data twice per clock tick, which works out to a theoretical transfer rate of 12.8 GB/s in each direction.
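For anyone who wants to check the arithmetic, here is a minimal Python sketch of the QPI calculation. It only uses the figures quoted above (3.2 GHz, two transfers per clock tick, 16 bits per transfer); the function name and parameters are my own, and it works in decimal gigabytes.

```python
def link_bandwidth_gb_s(clock_ghz: float, transfers_per_clock: int,
                        bits_per_transfer: int) -> float:
    """Theoretical bandwidth of one direction of a link, in GB/s (decimal)."""
    bytes_per_transfer = bits_per_transfer / 8
    return clock_ghz * transfers_per_clock * bytes_per_transfer


# Figures quoted for first-generation QPI.
qpi_each_direction = link_bandwidth_gb_s(
    clock_ghz=3.2, transfers_per_clock=2, bits_per_transfer=16
)
print(f"QPI, each direction: {qpi_each_direction:.1f} GB/s")  # 12.8 GB/s
```

Since there is one datapath per direction, the same figure applies separately to incoming and outgoing traffic.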
The memory controller has three channels running in parallel; in other words, it accesses three DDR3 memory modules at one time, providing 50% more available bandwidth than a dual-channel architecture running at the same memory speed.
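The 50% figure falls straight out of the channel count. The sketch below assumes DDR3-1066 (1066 million transfers per second) and the standard 64-bit DDR3 channel width; the memory speed is my assumption for illustration, not something stated above, and the ratio is the same at any speed.

```python
def memory_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                          bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Assumed memory speed: DDR3-1066.
dual = memory_bandwidth_gb_s(channels=2, mega_transfers_per_s=1066)
triple = memory_bandwidth_gb_s(channels=3, mega_transfers_per_s=1066)

print(f"dual channel:   {dual:.1f} GB/s")
print(f"triple channel: {triple:.1f} GB/s")
print(f"increase:       {(triple / dual - 1) * 100:.0f}%")
```

These are theoretical peaks; real-world gains depend on how much of that bandwidth an application can actually use.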
The Core 2 Quad has two levels of cache. Each core has its own L1 cache with 32 KB for instructions and 32 KB for data (32 KB + 32 KB). There are two L2 caches, one shared by each pair of cores.
The Core i7 has the same L1 cache, but there is a separate 256 KB L2 cache for each of the four cores. In addition, there is an 8 MB L3 cache shared by all four cores.
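To put the i7 numbers side by side, here is a small Python sketch that adds up the cache described above. It assumes each core has its own 32 KB + 32 KB L1, and the class and field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_kb: int
    copies: int  # number of separate caches of this kind; 1 means shared

    def total_kb(self) -> int:
        return self.size_kb * self.copies


CORES = 4

core_i7 = [
    CacheLevel("L1 instruction", 32, CORES),   # assumed one per core
    CacheLevel("L1 data", 32, CORES),          # assumed one per core
    CacheLevel("L2", 256, CORES),              # one per core
    CacheLevel("L3", 8 * 1024, 1),             # shared by all four cores
]

for level in core_i7:
    print(f"{level.name:15} {level.copies} x {level.size_kb} KB = {level.total_kb()} KB")

total_kb = sum(level.total_kb() for level in core_i7)
print(f"Total on-chip cache: {total_kb} KB")  # 9472 KB
```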
Improved CPU Pipeline
Without going into a lot of detail, Intel has made some significant improvements in the way the i7 processes instructions compared to the Core 2. Programs are written using "x86 instructions" that the CPU execution units do not understand directly; they must first be decoded into "microinstructions" or "micro-ops". The Core 2 introduced "macro-fusion", which translates certain pairs of x86 instructions into a single micro-op. For those pairs the CPU in theory does half as much work, which increases performance and lowers power consumption and heat. One drawback was that macro-fusion could only be performed when the CPU was working in 32-bit mode.
The Nehalem improves upon this in a couple of ways. First, it adds macro-fusion support for several more branching instructions, so more instruction pairs can be combined. Also, macro-fusion is now performed in both 32-bit and 64-bit mode.
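A toy Python model may make macro-fusion easier to picture. It is only a sketch under loud assumptions: every instruction is treated as one micro-op unless fused, the only fusible pairs are a compare (`cmp` or `test`) followed directly by a conditional jump, and the `decode` function and its flags are hypothetical names of mine. Real processors have more detailed rules about which pairs qualify.

```python
COMPARES = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    # Toy rule: any jump mnemonic other than the unconditional "jmp".
    return mnemonic.startswith("j") and mnemonic != "jmp"


def decode(instructions, fuse_in_64bit, mode_64bit=False):
    """Return the micro-ops for a list of mnemonics; a fused pair shows as 'cmp+jne'."""
    can_fuse = fuse_in_64bit or not mode_64bit
    micro_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (can_fuse and current in COMPARES
                and following is not None and is_conditional_jump(following)):
            micro_ops.append(f"{current}+{following}")  # two instructions, one micro-op
            i += 2
        else:
            micro_ops.append(current)
            i += 1
    return micro_ops


loop_body = ["mov", "add", "cmp", "jne"]

# Core 2-like behaviour: no fusion in 64-bit mode.
print(decode(loop_body, fuse_in_64bit=False, mode_64bit=True))  # ['mov', 'add', 'cmp', 'jne']
# Nehalem-like behaviour: fusion also works in 64-bit mode.
print(decode(loop_body, fuse_in_64bit=True, mode_64bit=True))   # ['mov', 'add', 'cmp+jne']
```

In this toy loop the fused version needs three micro-ops instead of four, which is where the savings come from.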
Improved Loop Stream Detector
The Core 2 has a small 18-instruction cache between the fetch and decode units. When the CPU is running a loop (a series of instructions that is repeated several times), it doesn't have to fetch the instructions again from the L1 instruction cache. The CPU actually turns off the fetch and branch prediction units while running a loop to save power.
The i7 improves on this by moving this small loop cache to after the decoder and increasing its size to hold 28 micro-ops. Now the cache holds more entries and they are already decoded, allowing the decoder to also be turned off while running a loop.
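Here is a rough Python sketch of the difference between the two loop caches, using only the sizes and behaviour described above. The `LoopBuffer` class and its method are hypothetical names for illustration. Note that the Core 2 capacity counts x86 instructions while the i7 capacity counts micro-ops, so the two are not directly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopBuffer:
    name: str
    capacity: int        # entries: x86 instructions (Core 2) or micro-ops (i7)
    holds_decoded: bool  # True if the entries are already-decoded micro-ops

    def units_idle_during_loop(self, loop_length: int) -> list[str]:
        """Front-end units that can be switched off while the loop runs."""
        if loop_length > self.capacity:
            return []  # loop does not fit, so nothing can be switched off
        idle = ["fetch", "branch prediction"]
        if self.holds_decoded:
            idle.append("decode")
        return idle


core2 = LoopBuffer("Core 2", capacity=18, holds_decoded=False)
nehalem = LoopBuffer("Core i7", capacity=28, holds_decoded=True)

for cpu in (core2, nehalem):
    print(cpu.name, "->", cpu.units_idle_during_loop(loop_length=16))
```

For a 16-entry loop, both chips can idle the fetch and branch prediction units, but only the i7 model can idle the decoder as well.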
In addition, the Nehalem architecture adds an extra dispatch port and has 12 execution units, allowing the i7 to execute more microinstructions at one time than earlier processors. Intel has also added a second-level Translation Look-aside Buffer (TLB) and a second-level Branch Target Buffer (BTB), both of which help CPU performance. There is also a new power control unit.
Intel Turbo Boost Technology
The i7 has a new feature called Intel Turbo Boost. Keep in mind that not all applications will use all four cores. When applications are using only two or three cores, the remaining core(s) can enter a sleep state. Turbo Boost frees up the unused power from those cores and uses it to overclock the working cores. Turbo Boost is part of Enhanced Intel SpeedStep and requires no additional software; you merely enable it in the BIOS. Isn't that cool?
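The idea of handing unused power to the working cores can be sketched as a toy Python model. Every number below (base clock, watts per core, watts per speed step, step size) is made up for illustration and is not an Intel specification; the function name is hypothetical too. As far as I understand it, the real feature also takes things like temperature into account, which this sketch ignores.

```python
def boosted_clock_mhz(base_mhz: int, total_cores: int, active_cores: int,
                      watts_per_core: int, watts_per_step: int,
                      step_mhz: int) -> int:
    """Toy model: power freed by sleeping cores is split among the active ones."""
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    freed_watts = (total_cores - active_cores) * watts_per_core
    extra_watts_per_active_core = freed_watts / active_cores
    steps = int(extra_watts_per_active_core // watts_per_step)
    return base_mhz + steps * step_mhz


# Hypothetical chip: 3000 MHz base, 20 W per core, 10 W buys one 100 MHz step.
for active in (4, 3, 2, 1):
    clock = boosted_clock_mhz(base_mhz=3000, total_cores=4, active_cores=active,
                              watts_per_core=20, watts_per_step=10, step_mhz=100)
    print(f"{active} active core(s): {clock} MHz")
```

The fewer cores are busy, the more headroom each working core gets in this model, which matches the general behaviour described above.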