Author: William Henning
Editor: Howard Ha
Publish Date: Sunday, November 2nd, 2008
Originally Published on Neoseeker (http://www.neoseeker.com)
Article Link: http://www.neoseeker.com/Articles/Hardware/s/nehalem_core_i7_review/
Copyright Neo Era Media, Inc. - please do not redistribute or use for commercial purposes.
Today we get to publish our review of the long-awaited Core i7 "Nehalem" 920, 940 and 965 Extreme, along with Intel's DX58SO motherboard based on the new X58 Northbridge and ICH10 Southbridge.
Before talking about the Core i7 family (formerly known as Nehalem), I'll try to pre-empt some of your questions:
The above processor pricing is in 1000 piece OEM quantities.
The Core i7 is quite a beast, made up of 731 million transistors on a 263mm^2 die.
Nehalem is the 'tock' following the 'tick' of the Penryn (Core 2), and will itself be followed by the 'tick' of Westmere in 2009 and 'tock' of Sandy Bridge in 2010. Both Westmere and Sandy Bridge are planned to be built with a 32nm process, with Westmere being a "mere" shrink and optimization of Nehalem, and Sandy Bridge being a new architecture. But enough of the future for now.
For Nehalem, Intel's design philosophy consisted of an enhanced core with new performance features and the re-introduction of multi-threading. Of course a high performance processor core - never mind four multi-threaded ones - needs a lot of data to keep busy, therefore Intel designed a new cache hierarchy, and made a whole new platform based on its Quick Path Interconnect (hereafter QPI) and a new triple channel on-processor memory controller. While they were at it, they also added some new instructions and improved support for virtualization.
The following information on the Core i7 (Nehalem) architecture was taken from the Intel Developer Forum papers.
Core i7 Architecture
In order to optimize performance - and reduce power consumption - Nehalem takes power management to new heights for Intel. Not only can it run on just one underclocked core when the computer is not loaded, it can automatically overclock from one to four cores as the performance is needed! Intel hopes to use the same Nehalem design to address the needs of servers, mobile computing, and desktop/workstation computers.
You read right, Core i7 chips automatically overclock themselves to a certain degree when needed. Mind you, the overclock is not much at this time - just an increase by one step of the multiplier when all four cores are busy - however it can potentially overclock fewer cores even higher, as long as the processor as a whole stays within its rated TDP envelope.
The Core i7 keeps all of the performance features of the Core 2:
and adds new advances of its own:
Each core has 32KB of instruction cache, 32KB of data cache, and a private unified 256KB L2 cache - and the four cores share a massive 8MB of L3 cache.
At the "front end" of a core, the 4 instruction wide decoder is followed by a "macro fusion" unit and a loop stream detector.
The macro fusion unit can combine a TEST/CMP instruction followed by a branch into a single operation, thus improving throughput and effectively executing more instructions per unit time.
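As a rough illustration of the idea, the sketch below walks a list of instruction mnemonics and merges a TEST or CMP that is immediately followed by a conditional branch into one fused operation. The mnemonic sets and the `fuse` function are hypothetical names chosen for this example; they are not a description of Intel's actual decoder rules.

```python
# Toy model of macro fusion: a compare/test followed directly by a
# conditional jump is issued as one fused operation.
# The mnemonic sets below are illustrative assumptions, not Intel's rules.
COMPARES = {"cmp", "test"}
COND_BRANCHES = {"je", "jne", "jl", "jle", "jg", "jge", "jb", "jae"}


def fuse(instructions):
    """Return the operation stream after pairing compare+branch."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if current in COMPARES and nxt in COND_BRANCHES:
            # Two decoded instructions become a single operation.
            fused.append(f"{current}+{nxt}")
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


loop_body = ["mov", "add", "cmp", "jne"]
print(fuse(loop_body))
```

In this toy loop body, four instructions turn into three operations, which is where the throughput gain comes from.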
The loop stream detector allows unneeded gates to be disabled while a loop runs, as there is no need to keep fetching and decoding the same instructions repeatedly, nor to predict the branches - this leads to higher performance and lower power consumption. Nehalem also improves on branch prediction by going to a multi-level approach.
At the "Execution Unit", Nehalem uses a "unified reservation station" to schedule work among the six potential execution units - and it can potentially execute six operations per clock cycle:
While Penryn (Core 2) already has a similar Reservation Station scheme, Nehalem significantly improves on it.
Due to the addition of SMT, Nehalem has 36 reservation stations instead of Penryn's 32; load buffers increase from 32 to 48, and store buffers from Penryn's 20 to 32.
These changes also help Simultaneous Multi Threading keep busy execution units that would otherwise sit idle, increasing performance instead of wasting power.
Core i7 Architecture - Continued
Nehalem also greatly improves on the memory architecture - massively increasing the bandwidth available to the cores not only from the caches but also from the external main memory. Careful attention was paid to reducing latency in all the levels of the memory hierarchy - L1, L2 and L3 caches as well as the new on-chip triple channel memory controller.
Intel considers the Nehalem to be divisible into two areas - the "Core" area, consisting of a number (currently four) of processor cores with individual L1 and L2 caches, and the "Uncore" area, consisting of a shared L3 cache, an Integrated Memory Controller (currently with three channels), a number of Quick Path Interconnects, and a Power&Clock section.
For servers and desktops in 2008-2009, Intel intends to differentiate its offerings by varying the:
Intel is increasing cache performance by moving to per-core low latency L1 and L2 caches backed by a single shared L3 cache. The L3 cache is inclusive, that is, anything present in a core's L1 or L2 cache must be present in the L3 cache as well.
This presents some advantages, as Intel has added a "present in core's L2 cache" bit to each cache line in the L3, significantly reducing cache snooping and cache coherency traffic, as if the data being requested is not available in the L3 cache, it is guaranteed not to be in the L1/L2 caches of the other cores. If the L3 cache line is present in another core, the other core must be snooped to see if it has modified the cache line. The relatively small sizes of the L1 and L2 caches allow them to be built with very low latency - and help minimize cache coherency checks - and also allow for reducing the size of the L3 cache to as little as 1MB in a four core processor.
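The snoop filtering described above can be sketched as a toy model. The class and method names here are hypothetical, and the model tracks only which cores may hold a line, not the real cache states; it is meant to show the decision, not Intel's implementation.

```python
# Toy model of an inclusive L3 with per-line "present in core" bits.
# Names are hypothetical; this is not Intel's implementation.
class InclusiveL3:
    def __init__(self, num_cores=4):
        self.num_cores = num_cores
        self.lines = {}  # address -> set of core ids that may hold the line

    def fill(self, address, core):
        """Record that `core` pulled `address` into its private caches."""
        self.lines.setdefault(address, set()).add(core)

    def cores_to_snoop(self, address, requester):
        """Return the cores that must be snooped for a read by `requester`."""
        if address not in self.lines:
            # Inclusive cache: an L3 miss means no core holds the line.
            return set()
        # Only cores flagged as holding the line need to be checked.
        return self.lines[address] - {requester}


l3 = InclusiveL3()
l3.fill(0x1000, core=2)
print(l3.cores_to_snoop(0x2000, requester=0))  # L3 miss: nothing to snoop
print(l3.cores_to_snoop(0x1000, requester=0))  # only core 2 is snooped
```

The point of the sketch is that a miss in the L3 answers the coherency question immediately, and a hit narrows the snoop to the flagged cores only.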
Nehalem increases the scalability of multi-processor systems significantly by having the memory controller on the processor; thus adding a socket also adds another three memory channels that may be populated. Adding processors, with associated memory channels, will increase the memory bandwidth available to servers, thus greatly improving scalability.
Currently Nehalem officially supports up to DDR3-1333, but as you will see, we were able to exceed that in our tests. Having the memory controller - with a potential peak 32GB/sec bandwidth - on the processor allows for hitherto unseen (on Intel platforms) low memory latencies. Nehalem will also support RDIMM and UDIMM memories.
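The 32GB/sec peak figure follows from simple arithmetic: transfers per second, times the width of a channel, times the number of channels. The sketch below assumes the standard 64-bit (8 byte) DDR3 channel and counts 1GB as 10^9 bytes; the function name is made up for this example.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels=3, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes a 64-bit (8 byte) data bus per DDR3 channel.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Three channels of DDR3-1333, the officially supported maximum.
print(round(peak_bandwidth_gb_s(1333), 1))
# A single channel of DDR3-1066 for comparison.
print(round(peak_bandwidth_gb_s(1066, channels=1), 1))
```

These are theoretical peaks; measured bandwidth will always be lower.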
Using the Quick Path Interconnect, Intel adds NUMA capability (Non-Uniform Memory Access) needed to access the memory attached to other processors in the system. The memory local to any processor socket will always be faster to access than memory attached to another processor, however QPI will make non-local memory available at data rates comparable to, and in some cases faster than, current Intel FSB designs.
The combination of the triple channel memory controller and QPI is likely to erode the current advantage AMD enjoys in multi-socket servers; thus allowing Intel inroads in the only market where it currently arguably runs second place to AMD.
The improved virtualization support not only reduces the time cost of entering/leaving a virtual machine, it also reduces the number of virtual transitions by implementing extended virtual page tables to translate guest to host physical addresses, removing the #1 cause of having to leave the virtual machine and allowing virtual guests full control over their own page tables. A virtual processor ID also helps reduce the frequency of TLB entry invalidations.
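To picture the two-stage translation, here is a minimal sketch: the guest's own page table maps guest virtual pages to guest physical pages, and the extended page table then maps guest physical pages to host physical pages. The dictionaries, page numbers and `translate` helper are all invented for illustration; real page tables are multi-level structures walked by hardware.

```python
PAGE_SIZE = 4096  # 4KB pages, assumed for this example


def translate(address, page_table):
    """Map an address through a flat page-number lookup table."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical mappings, chosen only to make the example concrete.
guest_page_table = {0: 5}       # guest virtual page 0 -> guest physical page 5
extended_page_table = {5: 42}   # guest physical page 5 -> host physical page 42

# Stage 1 is fully under the guest's control; stage 2 belongs to the host.
guest_physical = translate(0x123, guest_page_table)
host_physical = translate(guest_physical, extended_page_table)
print(hex(guest_physical), hex(host_physical))
```

Because the guest can edit its own table without touching the second stage, the hypervisor does not need to step in on every guest page table update.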
Intel is also updating its optimizing compilers, and so is Microsoft - the new Visual Studio 2008 will support SSE4.2 fully.
Core i7 920 - the value part
The new Intel Core i7 920 processor is the lowest cost of the three Nehalem based processors that are being launched this month.
With an OEM 1000 unit price of $284 per processor it is priced competitively, and should perform well - later in this review you will be able to compare its performance to the current highest speed Intel processor, the Core 2 Quad QX9770 Extreme.
The Core i7 920 features:
Here is an image of the Core i7 920 Intel provided us with:
And here it is with its stock heatsink:
The Core i7 920 should retail for around $300 and it represents the "value" segment of the current i7 line, and it will be very interesting to see how it performs compared to the current QX9770.
The i7 920 we received ran at a default 1.17V core voltage, and surprisingly had a 150W TDP programmed into the processor.
Core i7 940 - mid range part
The new Intel Core i7 940 processor is the middle priced processor of the three Nehalem based processors that are being launched this month.
At $562 per processor in OEM quantities, it is priced at almost twice the price of the i7 920, and each core runs 266MHz faster than on the i7 920.
The Core i7 940 features:
Here is an image of the Core i7 965 Intel provided us with - we downclocked the processor to 940 speeds per Intel's directions:
Here is an i7 920 with its stock heatsink; presumably the 940 will ship with the same heatsink:
The Core i7 940 should retail for around $580 and it represents the "mainstream" segment of the current i7 line. It will be very interesting to see how it performs compared to the i7 920, 965 and the current champion, the QX9770.
The i7 965 we received ran at a default 1.15V core voltage, and had a 130W TDP programmed into the processor.
Core i7 965 Extreme - high end part
The new Intel Core i7 965 Extreme processor is the new top of the line Nehalem based processor being launched this month. It replaces the Core 2 Quad QX9770 as the highest performance part being sold by Intel for the consumer marketplace, and it is supposed to be a performance monster.
At $999 per processor in OEM quantities, it is priced at almost twice the price of the 940 and about three and a half times the price of the 920, and the 965 runs at a blistering 3.2GHz, significantly faster than the 2.66GHz of the 920 and the 2.93GHz of the 940.
The Core i7 965 features:
Here is an image of the Core i7 965 Intel provided us with:
We don't know what heatsink Intel will ship with the Core i7 965 Extreme - it is conceivable that it will ship with one similar to the one shown below that is packaged with the 920, or it might be something more similar to the Thermalright Intel provided us for testing the 965.
The Core i7 965 Extreme should retail for just over $1000 and it represents the "performance" segment of the current i7 line.
I can't wait to see how it will perform, and how high it will overclock!
The i7 965 we received ran at a default 1.15V core voltage, and had a 130W TDP programmed into the processor.
X58 & ICH10 North & South bridge for the Core i7
The new Core i7 processors needed a new chipset to support them. Unlike previous Intel chipsets, there was no need to provide a memory controller in the Northbridge, and the connection to the processor now was via a QPI interconnect instead of a front side bus.
The HyperTransport-like interconnect consists of two sixteen bit unidirectional links, one for each direction. By having separate lanes for each direction of transfer there is no need to waste time switching directions, and it also eliminates the possibility of conflict from the processor and chipset trying to drive a bus at the same time.
The Core i7 920 and 940 run the QPI at 4.8GT/sec, which translates to potentially 9.6GB/sec in each direction for an aggregate potential bandwidth of 19.2GB/sec per QPI link. The Core i7 965 Extreme runs the QPI paths at 6.4GT/sec, with a potential for a staggering 25.6GB/sec of theoretical maximum aggregate bandwidth per link - MUCH faster than the old FSB design of Core 2 and older generations.
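The QPI figures above come from multiplying the transfer rate by the 16 bits (2 bytes) moved per transfer in each direction, then doubling for the two directions. A minimal sketch of that arithmetic, with a made-up function name:

```python
def qpi_bandwidth_gb_s(giga_transfers_per_s, payload_bytes=2):
    """Return (per-direction, aggregate) theoretical QPI bandwidth in GB/s.

    payload_bytes=2 reflects the sixteen data bits carried per transfer.
    """
    per_direction = giga_transfers_per_s * payload_bytes
    return per_direction, 2 * per_direction


for rate in (4.8, 6.4):  # Core i7 920/940, then Core i7 965 Extreme
    one_way, aggregate = qpi_bandwidth_gb_s(rate)
    print(f"{rate} GT/s: {one_way:.1f} GB/s per direction, "
          f"{aggregate:.1f} GB/s aggregate")
```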
As you can see on the diagram below, a Core i7 processor connects to three channels of DDR3 memory (at up to 8.5GB/sec per channel with DDR3-1067) and has a QPI link to the X58 Northbridge (now called an IOH for I/O Hub).
The X58 provides up to 36 lanes of PCIe 2.0, which can be used to support multiple different configurations of slots - however I suspect one popular configuration will be two PCIe 16x slots and several 1x slots.
The X58 also connects to the new ICH10 or ICH10R Southbridge over a 2GB/sec DMI link.
Here is the diagram:
Of course, you will find the X58 on Intel's new DX58SO motherboard, which we will describe in more detail later.
Thermalright Socket 1366 Cooler
The Thermalright Ultra-120 eXtreme cooler is a great heatsink; it kept the Core i7 965 and Core i7 920 at decent temperatures through my overclocking torture of the chips.
It looks cool, and it is cool.
Stock Intel Heatsink and Fan
And here we have a (larger than previous Socket 775 versions) stock Intel heatsink. I am sure it would work with the Core i7 920 at stock speeds, however I was bad and used the Thermalright for all the tests.
Intel X25-M SSD
Intel's X25-M SSD is small but packs one hell of a punch.
You will see the charts later, but I tell you now that it averaged at least 212MB/sec with a 0.1ms seek time across the benchmarks with both HDTach and HDTune!
The speedup in Business and especially Creative Winstone is noticeable, along with a 25% speedup in video transcoding.
This is one sweet drive, and it deserves a "Performance Award".
The Qimonda DDR3-1066 modules supplied by Intel were totally problem free and far exceeded their specifications, easily running at 1200MHz with 7-7-7-20 timings and a 2T command rate, and at 1394MHz with 8-8-8-24 timings - I was impressed! If this were a review of those sticks, they'd get a "Value Award" from me!
Intel Extreme Motherboard DX58SO
Intel has a whole black-and-blue theme going with the DX58SO... the board is meant to look baaaddd.
One of the first things I noticed about the motherboard was the small heatsinks - far smaller than the monstrosities we have become accustomed to on Socket 775 motherboards. This is a good indication that Intel thinks that the thermal envelope of the chipset is under control.
The next thing I noticed was the lack of "legacy" connectors - while I've become used to the lack of a serial or parallel port, it was a bit shocking to not see at least one PS/2 port, and there was no sign of a floppy or IDE connector on the motherboard!
Looking at the back of the board we see that Intel is serious about the mounting of not only the processor heatsink, but also that of the X58 Northbridge.
Look at those nice solid state caps and coils.. and the heatsinks on the voltage regulators.
One of the few things I did not like about the motherboard is that only one of the three memory channels supported two DIMMs. Come on, with such a high end board and processor let me stuff 12GB or more memory in there!
Here we have a pretty big heatsink for the X58, and a standard 24 pin power connector.
The six Sata-II ports are nicely spaced out, making cabling more convenient, and note the small heatsink on the Southbridge.
As you can see, there are two PCIe 16x connectors, one PCIe 4x connector, two PCIe 1x connectors and a lone PCI slot.
Here is another angle on the slots:
The IO panel is a bit sparse looking - two eSATA ports take the space normally taken by PS/2 connectors; eight USB2.0 ports, one FireWire, one Gigabit Ethernet and many audio connectors - including an optical out - complete the back panel.
Here is the stuff that came in the Intel evaluation kit:
The Intel BIOS splash screen is pretty plain - but fairly elegant, and would be even more so without the second Intel logo on white.
The main BIOS page allows controlling how many cores are used, and allows you to disable HyperThreading, and lets you view more system information.
The Advanced menu gives access to the usual chipset control, hardware monitoring and similar functions - however Intel does add a nice touch with a BIOS based error log.
I will leave the good stuff - the performance page with its overclocking options - for the next page in the article :-)
Here are the rest of the conventional pages:
Now on to the good stuff....
What am I seeing? Officially supported overclocking on an Intel board?
Almost as rare as hen's teeth, but it does occasionally happen - and fortunately this is one of those occasions.
The "Failsafe Watchdog" works quite well, usually recovering after three beeps and a bit of a wait if you've pushed things too far; I only had to clear the CMOS on two failed overclocking attempts of many.
The first thing OC'rs will do is go to Manual on the "Host Clock Properties Override" - which will allow you to change the nominally 133MHz of the "base clock" that is multiplied out for processor, QPI and memory speeds.
Intel is warning everyone not to exceed 1.65V for the total processor voltage - that is, the sum of the Static CPU Voltage Override and the Dynamic CPU Voltage Offset.
Given that we only have one i7 965, I obeyed this rule.
The Maximum Non-Turbo Ratio is the default multiplier; under load, the processor will use the X-Core Ratio Limits, where X is the number of active cores. Note that this is unfortunately only available with the i7 965 :-(
The Memory Override lets you enter your own timings; for the tests where I exceeded 1066MHz memory speed, I upped the command rate to 2.
The QPI screen is very important for clock speed based overclocks (instead of multiplier based ones). The i7 will only exceed 6.4GT/s by so much... very similar to how HT speed must be limited on Athlons.
With this article, we re-tested the previous highest performance Intel Extreme series processor, the Core 2 Quad QX9770. The retests were done with the SSD provided in our review kit and the same video card - an HD 4870 - that we used to test the Core i7 processors, in order to have a fair comparison between the previous Intel top of the line processor and the new Core i7 processors. We had to update to Sandra IX, and we added a number of high resolution gaming tests.
Hardware used for testing the motherboard:
Benchmarks Used
For now, here is a listing of the tests performed:
For the additional gaming tests we used
Video drivers used were the latest Catalyst drivers 8.10.
To make the tests interesting, the Core 2 Quad QX9770 was re-benchmarked with the solid state drive and video card used for the Core i7 tests, both at stock speed, and at 4.05GHz - which was the maximum overclock we got on the Core i7 965.
For Business Winstone we get a bit of a surprise - the stock speed Core 2 Quad QX9770 beats the stock speed Core i7 920, 940 and 965 by a small margin! And when overclocked, the QX9770 again takes top spot from the Core i7 965 - mind you, by a tiny margin.
Content Creation presents a somewhat different picture. The previous champion, the Core 2 Quad QX9770 turns in the lowest result at stock speeds, beaten even by the lowest speed Core i7 920! (Mind you, by a tiny margin.)
When overclocked, the picture stays the same, and the QX9770 cannot stand up to the overclocked Core i7 processors - at least not when running at 4.05GHz.
The Core 2 Quad QX9770 is SLAUGHTERED for WinRAR by all of the Core i7 920 / 940 / 965 processors at stock speed, even when the QX9770 is overclocked to 4.05GHz.
The 3.2GHz Core i7 965 has almost twice the WinRAR-MT score of the 3.2GHz QX9770!
The average seek time for the SSD was 0.1ms.
The Intel SSD smokes the Seagate 10.7 7200rpm IDE drive it was tested against with at least 212MB/sec vs. 61MB/sec average read speed, and the results are largely independent of clock rate or processor. A VERY nice and fast drive!
HDTune confirms the smoking hot speed of the Intel X25-M SSD, however please note that the scale exaggerates the minor differences in these average read speeds. Again, the average seek time for the SSD was 0.1ms.
The Core i7 965 - and its 940 and 920 brothers - flex their muscles at Sandra CPU.
The Core i7 920 at 2.66GHz is faster than the Core 2 Quad QX9770 at 4.05GHz.
At 3.2GHz, the Core i7 965 is 71.2% faster than the QX9770 for Dhrystone (integer test), and 68.6% faster for Whetstone (floating point test).
For MMX, the 8x integer test shows the Core i7 965 at 3.2GHz to be 26.1% faster than the QX9770, and 78.5% faster for the 4x MMX floating point test.
The Core 2 Quad QX9770 is simply slaughtered here, basically by a factor of three at the same clock speed as the Core i7 965.
It's not even close.
The Sandra latency is better for the Core i7 than for the QX9770, but not by a large margin.
I will let the figures speak for themselves. The Core i7 965, 940 and 920 destroy the Core 2 Quad QX9770 for RightMark Read.
Surprisingly, while the Core i7 965 and 940 hold a healthy lead, the 920 is not that much faster for RightMark Write. Only one GB/sec or so :-)
The other results are roughly 50% faster in favor of the i7's.
RightMark Bandwidth is NOT the QX9770's friend; all the i7's kill it. Badly. The stock Core i7 965 gets almost 4x as much bandwidth as the stock QX9770.
On-processor memory controllers and large caches sure help the Core i7 here... 2.5x-3x lower latency than a Core 2 Quad QX9770 with an X48 chipset.
For dual threaded MP3 encoding the Core i7 965 is about 6% faster than the QX9770.
Looks like TMPGEnc gives about a 13% advantage to the stock Core i7 965 over the stock QX9770.
At the same stock speed of 3.2GHz, the QX9770 is about 17% slower than the Core i7 965, and just a tad slower than a Core i7 920.
The Core i7 965 is 34.8% faster than the same clock speed QX9770, and even the Core i7 920 beats the QX9770 by 13%.
And just look at those overclocked i7 numbers!
For Doom 3, the 2.66GHz Core i7 920 is about one FPS faster than the 3.2GHz Core 2 Quad QX9770; and the 3.2GHz Core i7 965 smokes the QX9770 by 18.8%!
When overclocked to the max, the i7 965 got over 401FPS at 640x480!
Quake 4 is a bit more friendly to the QX9770, which actually beats the stock Core i7 920, but is beaten by almost 20FPS by the i7 965.
At the same 3.2GHz, the Core i7 965 gets 21FPS more (a bit over 10%) than the QX9770. Need I say more?
Again, a big difference.
And once more. It always seems to be around 10% at low res.
Now where have I seen results like this before? Oh yeah, on the last page.
Call of Duty
An interesting twist here - the 4.05GHz QX9770 handily beats the Core i7 965 at 4.05GHz.
I can only think that COD is VERY sensitive to memory speed.
World In Conflict
As we have seen before, the higher the resolution and higher the AA and texture quality, the less difference the processor makes (given that you have a decent CPU).
Here are the actual results in numeric form:
|World In Conflict||1024 LO||1024 HI||1280 HI||1600 HI|
|Core i7 920 20x133 1066||162||62||56||49|
|Core i7 920 20x166 1333||198||77||62||53|
|Core i7 920 20x174 1394||205||74||65||54|
|Core i7 940 22x133 1066||167||71||62||54|
|Core i7 965 24x133 1066||174||65||61||52|
|Core i7 965 27x150 1200||221||82||66||53|
|Core i7 965 30x133 1066||218||89||72||62|
|QX9770 10x405 1620||182||76||65||54|
|QX9770 8x400 1066||150||61||58||49|
Crysis is obviously very GPU bound.
Here are the raw numbers:
|Crysis||1024 LO||1024 HI||1280 HI||1600 HI|
|Core i7 920 20x133 1066||66||56.1||44.5||32.6|
|Core i7 920 20x166 1333||72.6||60.2||45||32.7|
|Core i7 920 20x174 1394||73.7||61.2||45||32.7|
|Core i7 940 22x133 1066||70.2||58.6||44.7||32.5|
|Core i7 965 24x133 1066||72.1||60||45||32.8|
|Core i7 965 27x150 1200||74.1||61.5||44.9||32.7|
|Core i7 965 30x133 1066||73.5||61.4||45.1||32.8|
|QX9770 10x405 1620||75.5||62.2||45.5||33.4|
|QX9770 8x400 1066||69||58.1||45.4||33.4|
Devil May Cry 4
Devil May Cry 4 is also very GPU bound with the eye candy enabled.
More numbers for you:
|Devil May Cry 4||1024 LO||1024 HI||1280 HI||1600 HI|
|Core i7 920 20x133 1066||243.1075||177.025||137.47||108.8|
|Core i7 920 20x166 1333||232.1975||184.8825||142.7075||112.795|
|Core i7 920 20x174 1394||249.785||182.5525||143.77||114.1925|
|Core i7 940 22x133 1066||244.8875||180.8325||144.225||110.935|
|Core i7 965 24x133 1066||248.865||190.425||140.555||111.4425|
|Core i7 965 27x150 1200||248.7075||186.35||140.7575||110.5025|
|Core i7 965 30x133 1066||248.5275||185.1275||137.785||109.4025|
|QX9770 10x405 1620||258.6025||191.1875||140.65||110.5575|
|QX9770 8x400 1066||253.835||189.85||144.1025||110.9725|
Dynasty Warriors 6 Benchmark
Well, what do you know? At the "Low" setting, Dynasty Warriors is a decent CPU benchmark, but it is GPU bound on the "Hi" setting.
More numbers for you:
|Core i7 920 20x133 1066||118.7||151.1|
|QX9770 8x400 1066||133||173|
|Core i7 940 22x133 1066||135.3||171.2|
|Core i7 965 24x133 1066||137.1||167.9|
|Core i7 920 20x166 1333||147.8||177.2|
|Core i7 920 20x174 1394||155||196.4|
|QX9770 10x405 1620||161.6||210.4|
|Core i7 965 27x150 1200||177||401|
|Core i7 965 30x133 1066||184||331|
Overclocking the Core i7 processors was an interesting experience, one very reminiscent of overclocking AMD's Phenoms.
Intel's QPI "Quick Path Interconnect" is extremely similar to AMD's HT "HyperTransport", and since both the Phenom and the i7's also have on-board memory controllers, it should not be surprising that overclocking the parts is very similar in some ways.
Core i7 965 Extreme Overclocking
This is the new high-end "Extreme" series processor from Intel, and boy does it show it (in performance).
The Core i7 965 derives its nominal 3.2GHz clock rate by multiplying its 133MHz base clock by 24.
Intel however introduced its "Turbo" mode, which automatically increases the multiplier to 25 when more than one core is loaded, or to 26 when only one core is loaded - so the nominally 3.2GHz processor runs at 3.33GHz under load, and up to 3.46GHz when only one core is loaded - without any overclocking on the user's part.
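The Turbo behaviour described here boils down to a small lookup plus a multiplication. The sketch below is my own model of that description, not Intel's algorithm: the function names are invented, the base clock is assumed to be 133.33MHz (the exact value behind the nominal "133MHz"), and the real processor also takes power and temperature into account.

```python
def turbo_multiplier(loaded_cores, default_multiplier=24):
    """Multiplier for a Core i7 965 as described in the text.

    +2 steps with one loaded core, +1 step with more than one,
    and the default multiplier when nothing is loaded.
    """
    if loaded_cores <= 0:
        return default_multiplier
    if loaded_cores == 1:
        return default_multiplier + 2
    return default_multiplier + 1


def core_clock_mhz(multiplier, base_clock_mhz=133.33):
    """Core clock in MHz; 133.33MHz base clock is an assumption."""
    return multiplier * base_clock_mhz


for cores in (0, 1, 4):
    mult = turbo_multiplier(cores)
    print(f"{cores} loaded core(s): x{mult} = {core_clock_mhz(mult):.0f}MHz")
```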
There are two ways of overclocking the Core i7 965 - increasing the base clock rate, which also increases the memory speed and the QPI speed, or just increasing the "Turbo" mode multipliers. Modest overclocks can be achieved without raising the core voltage, but as I wanted to find out the air cooled limits of the i7 965, I went up all the way to the maximum 1.65V that Intel tells everyone NOT to pass.
The i7 965 we have had a 130W TDP, and a default Vcore of 1.15V - so I bumped it to 1.4V with a 250mV dynamic voltage override.
I was then able to enjoy the Core i7 965 running at a quite decent 4.05GHz - totally stable.
I was able to go higher, but it was unstable, and I did not want to exceed 1.65V as I did not want to take a chance on burning out our only 965.
I also wanted to see how far I could push the QPI, so I backed down to a multiplier of 24, and increased the base clock until the system was unstable, then backed down. The system was stable up to a base clock of 150MHz, for a 7.2GT/s QPI rate with the DDR3 memory running at 1200MHz with 7-7-7-20-2T timing. Pretty impressive.
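Since the processor, QPI and memory speeds are all multiplied out from the base clock, raising it moves all three at once. The sketch below shows the arithmetic; the effective QPI multiplier of 48 is my inference from 6.4GT/s at a 133MHz base clock, and the function name is hypothetical.

```python
def derived_rates(base_clock_mhz, cpu_mult=24, qpi_mult=48, mem_mult=8):
    """Speeds derived from the base clock.

    qpi_mult=48 is inferred from 6.4GT/s at 133MHz (an assumption);
    mem_mult=8 gives DDR3-1066 at the stock base clock.
    """
    return {
        "cpu_mhz": cpu_mult * base_clock_mhz,
        "qpi_mt_s": qpi_mult * base_clock_mhz,
        "memory_mt_s": mem_mult * base_clock_mhz,
    }


# The 150MHz base clock from the QPI test above.
print(derived_rates(150))
```

At 150MHz this gives a 3.6GHz core clock, 7200MT/s (7.2GT/s) on the QPI and 1200MHz memory, matching the figures reported above.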
Core i7 940 Overclocking
Based on what I learned from the i7 965 overclocking, I suspect that an i7 940 would overclock to at least 3.3GHz, perhaps far more - see my i7 920 results :-)
Core i7 920 Overclocking
Unfortunately the Core i7 920 has a maximum 20x "normal" and 21x "Turbo" multiplier, which cannot be adjusted in the motherboard's BIOS, so I knew I'd be QPI and memory speed limited.
From my i7 965 testing, I knew that I could reach at least a 7.2GT/s QPI data rate, and at least a 1200MHz DDR3 rate - however given the low multiplier, I had to try to go as far beyond a 133MHz base clock rate as I could.
Learning as I went, I had to adjust the QPI rate down to the "official" 4.8GT/s, and the memory timing to 8-8-8-24 - but after a lot of re-boots, and complaints from my co-workers about the "beep-beep-beep" the motherboard made on failed overclocks, I finally was able to reach the highest base clock I could get stable - 174MHz, a 30.8% increase over the stock 133MHz.
One problem I had was with the memory multiplier - even though the BIOS claims to support a 6x memory multiplier, it did not work for me, so I had to stay with an 8x memory multiplier; which in turn, based on the 1.65V maximum memory voltage, limited how far I could push the base clock. Frankly, with the 6x multiplier working, I would not be surprised if I could have reached a 200MHz base clock!
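To see why the memory multiplier caps the base clock, divide the fastest memory speed that runs stable by the multiplier. The sketch below uses 1394MHz, the highest memory speed reported in this review, as the limit; the function name is made up, and the result is only an upper bound from the memory side, since QPI and the processor itself impose their own limits.

```python
def max_base_clock_mhz(memory_limit_mhz, mem_mult):
    """Highest base clock the memory allows for a given multiplier."""
    return memory_limit_mhz / mem_mult


MEMORY_LIMIT_MHZ = 1394  # highest stable memory speed seen in this review

for mem_mult in (8, 6):
    ceiling = max_base_clock_mhz(MEMORY_LIMIT_MHZ, mem_mult)
    # 20x is the fixed non-Turbo multiplier of the Core i7 920.
    print(f"x{mem_mult} memory multiplier: base clock up to {ceiling:.1f}MHz, "
          f"core clock up to {20 * ceiling:.0f}MHz")
```

With the 8x multiplier the ceiling lands right at the 174MHz reached here, which is why a working 6x multiplier would leave room for a higher base clock.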
I can't wait to get my hands on some nice third party overclocking boards; I think 4GHz ought to be quite reachable with an i7 920.
All that CPU power does not come without a price - at idle, the Core i7 draws a bit less juice than the QX9770, but it draws a fair bit more under load. All those transistors need to be fed!
But let's be real - someone wanting to buy a Core i7 - regardless of it being a 920, 940 or 965 - is not really worried about power consumption, but about raw power.
And the i7 has that in spades.
And the raw data:
|Configuration||Idle||Load|
|QX9770 8x400 1066||169||231|
|Core i7 940 22x133 1066||159||241|
|Core i7 920 20x133 1066||157||243|
|Core i7 965 24x133 1066||162||267|
|Core i7 965HD 24x133 1066||164||272|
|Core i7 965 30x133 1066||184||331|
|QX9770 10x405 1620||218||339|
|Core i7 965 27x150 1200||177||401|
|Core i7 920 20x166 1333||204||409|
|Core i7 920 20x174 1394||206||414|
The Intel Core i7 processors are a significant evolutionary step forward from their Core 2 predecessors.
In all but a couple of tests, the 2.66GHz Core i7 920 handily beats the 3.2GHz Core 2 Quad QX9770 - and in some tests by an embarrassing margin. Needless to say, the Core i7 965 tends to wipe the floor with the QX9770 in memory or processor bound applications, and I think that AMD is in for a nasty surprise on the server side - the memory bandwidth improvements and QPI will allow Intel to finally build multi-socket servers that outperform current AMD Opteron offerings - but who knows what Shanghai will bring to the table?
Overclocking will become more difficult with the i7 parts than the Core 2's - in the past, one could buy a Core 2 meant for a 1066MHz bus and get crazy overclocks by cranking the FSB to over 1600MHz - 50%-60% overclocks were quite doable with good cooling.
With the Core i7, except for the Extreme parts, the issue has become more complicated, as the balancing act between QPI, base clock and memory speed - with the limited QPI and memory multipliers - makes it tougher to achieve spectacular overclocks on the lower end parts. Overclockers may find that it will be necessary to migrate to mid-range parts at least in order to get high overclocks.
Fortunately, for gamers it does not really matter.
In high-resolution, high AA/AF and high texture resolution games, we will be GPU limited for the foreseeable future. I added a lot of high resolution eye-candy game benchmarks to this review in order to see what difference it makes - and the fact is, it makes very little difference at the high resolutions, with high AA/AF. As long as you have a decent processor - and the Core i7 920, the lowest end i7, certainly qualifies as one - or even an E8500 - get a great GPU and frag away.
The situation is VERY different for multi-media professionals.
The transcoding and rendering speed of the i7's has to be seen to be believed. The rendering times blew me away - and the increase in speed to be had for transcoding from high performance SSD drives such as the Intel X25-M is also well worth considering. For multi-media, a Core i7 940 or Core i7 965 has the potential for quickly paying for itself.
Servers will also likely benefit greatly from using an i7 - the memory bandwidth is simply insane. Later, when I can test some low voltage higher speed DDR3's I expect to see significantly greater memory bandwidth - but even now, with DDR3-1066, you can get roughly three times the memory bandwidth that you could get from the previous champion, an X48 DDR3 setup. SSD's will also be big for servers, just watch.
I was not sure Intel could do it again - have another step-like increase in performance like it did with the Pentium D to Core 2 transition - but they pulled it off. The Core i7's are insanely fast for integer, floating point and MMX computation, and they have the memory bandwidth to keep those hungry cores fed.
Well done Intel.