Author: William Henning
Editor: Howard Ha
Publish Date: Wednesday, November 1st, 2006
Originally Published on Neoseeker (http://www.neoseeker.com)
Article Link: http://www.neoseeker.com/Articles/Hardware/s/core2quad_qx6700/
Copyright Neo Era Media, Inc. - please do not redistribute or use for commercial purposes.
QX6700 Extreme: Quad Core is HERE!
Finally we get to publish the "straight scoop" on Intel's new Kentsfield Core 2 Quad processor. We spent the last few weeks with Intel's QX6700 and today we're reporting on our findings. But first, let's talk a little about Core 2 Quad in general.
If you're not already familiar with Kentsfield, it is Intel's newest processor sporting 4 cores, aka "Core 2 Quad". The outstanding Core 2 Duo, codenamed Conroe, is of course a 2 core part, and Kentsfield is its quad core sibling.
As you know, the Core 2 Duo is a true dual core CPU in that both cores are on the same physical die (leading to the "Core 2 Duo" moniker). The Core 2 Quad takes it one step further by placing two dual core dies into the same physical processor package - that is, two Core 2 Duo processor dies are placed side by side and connected internally, with both chips sharing the contacts of the LGA 775 package.
Now why would Intel do this?
This was the quickest way of getting a four core product to market. The clock speed race has been sidetracked into a race for more cores as a way of increasing total system performance, so it was a smart temporary move until Intel manages to get four cores onto the same die.
The rendered image of the Kentsfield above shows how it all goes together - the heat spreader on top of the two dies, and the two dies themselves, each measuring 143 square millimeters and holding 291 million transistors in that small area. Each of the two dies has 4MB of L2 cache shared between its two cores, so there is actually 8MB of L2 cache in the package. (For more information about the Core 2 Duo architecture, see our Core 2 Duo launch article.)
The codename for this chip was "Kentsfield", and the first member of the Core 2 Quad family is the 2.66GHz QX6700 - which is being launched today.
(we would like to thank Intel for providing us with a QX6700 and 975X motherboard in time for us to do a launch review)
We've had the QX6700 Quad core chip in the lab for a few weeks, and have been running benchmarks essentially non-stop. Originally the launch date of Kentsfield was November 14th, but the powers that be decided to push the schedule up to Nov 1st, 9:00PM PST, so here we are. Had the original launch date of November 14th been retained, we'd have run even more tests :-) but as it is, I think you will be happy.
Not only did we test the QX6700 at its stock speed of 2.66GHz with a 1066MHz FSB - we also tested it at some of the speed grades that we can expect from Intel in the near future, as well as in a maximum stable overclock series of tests.
The speeds we tested at are:
Note 1: Intel has not released or endorsed these names or specifications, they are educated guesses based on the Core 2 Duo naming scheme.
Note 2: It would not make sense to label a 3.33GHz processor with the same number as a 3.19GHz processor, so I cut the difference in half.
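Every speed grade is just the base clock multiplied by the CPU multiplier, and the "1066MHz" FSB is a 266.67MHz base clock carrying four transfers per clock. Below is a minimal Python sketch built on our own assumptions. The hypothetical grades are reached either by raising the multiplier at the stock base clock, or by moving to a 333.33MHz ("1333MHz") base clock at the stock 10x multiplier. Clock speeds are truncated to 10MHz, the same way the 2666.67MHz stock part is sold as 2.66GHz. These are illustrations, not Intel specifications.

```python
from fractions import Fraction

# Core 2's FSB is quad-pumped: a "1066MHz" FSB is a 266.67MHz base clock
# doing four transfers per clock. Fractions keep 800/3 exact.
STOCK_BASE_MHZ = Fraction(800, 3)   # 266.67MHz -> "1066MHz" FSB
FAST_BASE_MHZ = Fraction(1000, 3)   # 333.33MHz -> "1333MHz" FSB (assumed)


def fsb_rating(base_mhz: Fraction) -> int:
    """Marketed FSB rating: four transfers per base clock, truncated."""
    return int(base_mhz * 4)


def core_clock_label(base_mhz: Fraction, multiplier: int) -> str:
    """Core clock = base clock x multiplier, truncated to 10MHz
    (2666.67MHz is labelled 2.66GHz)."""
    tens_of_mhz = int(base_mhz * multiplier) // 10
    return f"{tens_of_mhz / 100:.2f}GHz"


# Hypothetical speed grades, not Intel specifications.
GRADES = [(STOCK_BASE_MHZ, 10), (STOCK_BASE_MHZ, 11), (FAST_BASE_MHZ, 10)]

if __name__ == "__main__":
    for base, mult in GRADES:
        print(f"{fsb_rating(base)}MHz FSB x {mult} = {core_clock_label(base, mult)}")
```

Run as-is, this gives 2.66GHz and 2.93GHz at the stock 1066MHz FSB, and 3.33GHz with a 1333MHz FSB.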
During the recent Intel Developer Forum, Intel had a presentation on Kentsfield - a quick review of some of the slides from that presentation will yield some insight into this new chip.
As you can see, they are making basically the same points as they did during the Core 2 Duo launch - this makes sense, as Kentsfield is simply two Core 2 Duo processors packaged in one LGA (Land Grid Array) processor package.
Intel's slides promise a massive 70% greater performance over the X6800 Extreme processor, and today we're here to explore just what Kentsfield means for the enthusiast and corporate users and figure out whether or not Intel can deliver on this promise.
Frankly, it's not possible for them to deliver 70% across the board. It is logically impossible.
If you have a single threaded application, and it is the only process running, the X6800 will be up to 10% faster, simply based on its 2.93GHz clock rate versus the QX6700's 2.66GHz clock rate. There is no magic to this, simply logic.
For performance improvement, we have to look at multi-threaded applications and, to a lesser extent, the "snappiness" of the system under load, when several applications are running. Unfortunately, "snappiness" is difficult to quantify, as it is a subjective experience, so we will basically stick to benchmarking software, where we can measure the speed increases (if any).
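The argument above is essentially Amdahl's law. That label is ours; Intel's slides do not use it. Below is a minimal Python sketch comparing the two chips as the parallel share of the work grows. It assumes run time scales inversely with clock speed, and it ignores memory, FSB and I/O effects entirely.

```python
def amdahl_time(parallel_fraction: float, cores: int) -> float:
    """Run time relative to a single core at the same clock (Amdahl's law)."""
    return (1.0 - parallel_fraction) + parallel_fraction / cores


def qx6700_vs_x6800(parallel_fraction: float) -> float:
    """How many times faster a 4-core 2.66GHz QX6700 is than a 2-core
    2.93GHz X6800, assuming time scales inversely with clock and that
    nothing else (memory, FSB, I/O) differs."""
    x6800_time = amdahl_time(parallel_fraction, 2) / 2.93
    qx6700_time = amdahl_time(parallel_fraction, 4) / 2.66
    return x6800_time / qx6700_time


def fraction_needed(target: float) -> float:
    """Parallel fraction at which the QX6700's advantage reaches `target`,
    found by bisection (the advantage rises monotonically with the fraction)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if qx6700_vs_x6800(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"{p:.0%} parallel: QX6700 runs at {qx6700_vs_x6800(p):.2f}x the X6800")
    print(f"A 1.70x lead needs about {fraction_needed(1.70):.1%} parallel work")
```

With no parallel work, the QX6700 trails by roughly 9%. The 70% lead only appears once about 96% of the run time is parallel, and even perfect scaling tops out near 82%.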
Intel realizes this - and provided a slide showing some of the vendors of applications that can take advantage of the Core 2 Quad.
Intel DID, however, deliver on their promise to have Kentsfield launched in November, and the part is expected to be available for sale around the middle of the month (Intel says on the 14th).
Before we get to the testing and technical details, I'd like to discuss why you should care about quad core processors.
I mean, let's be real - dual core processors have only really become popular in the last year, and already we have a quad core offering from Intel, with AMD trying to launch a quad core device by Q2 2007... why should we care?
Reason #1: SPEED!
Reason #2: MARKETING
Reason #3: MULTIMEDIA
Reason #4: VIRTUALIZATION
Reason #5: POWER SAVINGS
Reason #6: CHEAPER DUAL CORE PROCESSORS!
OK, now that we've listed some reasons why you should care about quad core processors, let's take a quick look at the approach Intel took with their first offering, and take a sneak peek at what they will offer in the future.
First, you should understand that the QX6700 is based on *EXACTLY* the same core as the Core 2 Duo E6700; it's just that Intel put two of them into a single processor package. This is the same thing they did when they made the Pentium D 900 series of dual core 65nm processors.
Instead of regurgitating the lower level architecture of Core 2 Duo processors, we will be concentrating on the differences between the QX6700 and the E6700 - however if you are interested in such architectural details, don't worry, you can read about them in our original E6700 article.
This approach that Intel took has both advantages and disadvantages, for both Intel and consumers.
Ok, that's great, we have a list of advantages and disadvantages. But what does it REALLY all mean?
Well, for a regular user, all it means is that they get a Quad core processor sooner than if they had to wait for a single die solution.
For servers, it means that the solution is less than ideal, and we can expect a small but measurable performance hit compared to a "true" quad core die solution. Nevertheless, it will still be significantly faster than a "plain" dual core chip.
For heavy multimedia producers, it means they can get their hands on a big performance boost without having to wait for quad cores on one die - and if they make money producing said multimedia, they will be lining up to buy Quad core processors, as it will give them a significant capability boost for rendering and post production.
For gamers... well, the gaming houses still have to multi-thread their engines.
If you take a look at the Intel roadmap below, you can take a peek into the near future of Intel processors for the Extreme and Performance/Mainstream segments.
We can bid a fond farewell to the 800 series of dual core processors at the end of the year - the 805 D was a sweet bargain overclocker for a long time; but I suspect that remaining stocks will show up in the Value segment until everyone's warehouses are empty.
The 900 series will apparently leave the channel around mid next year, with the 955 and 965 Extremes losing their Extreme status about now. The chip we are looking at, the QX6700, will lose its Extremeness after 2007, and the Core 2 Duo 65nm chips will be around for a while yet :-).
Intel wants to make a big push to recover the market share it lost to AMD Opterons in server space. The problem is that until Intel gets away from having a shared FSB for its processors, it is inherently hobbled compared to AMD's HyperTransport interconnection scheme; something that is especially evident in systems with more than two processors.
Intel Marketing has put out a slide showing the new quad core server chips as being 50% better than the existing Xeon 5100 series, which are actually Core 2 Duos in a 771 pin LGA package. Overall, given moderate to high server loads, 50% may actually be a reasonable estimate of the speedup you would get from going to quad core processors per socket on an Intel platform.
Intel also put in a line claiming 150% better performance than the competition, and stated that the estimate was based on SPECint compared to a dual core Opteron. I can actually believe this, as it compares a quad core Intel processor to a dual core AMD processor on an integer benchmark that is not a memory bandwidth hog.
The situation would be different comparing a four processor 5100 based system to a four processor Opteron system with a workload that was memory bandwidth hungry.
Intel is quite aware of the memory bandwidth limitations of the front side bus - and did something clever about it with the "Bensley Platform" - their dual FSB quad memory channel chipset for servers.
To put it simply, the chipset has TWO front side busses, one for each processor socket, and four memory channels.
This increases the potential memory bandwidth to 21GB/sec. It also allows up to 21GB/sec of traffic (10.5GB/sec per socket) to the processors, in order to work around the FSB bottleneck. This was a simpler and quicker solution than replacing the FSB with something like CSI (Intel's upcoming alternative to HyperTransport), leading to faster time to market.
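The numbers come from simple multiplication: both the FSB and each DDR2 channel are 64 bits wide, so peak bandwidth is the transfer rate times 8 bytes. Below is a minimal Python sketch. The Bensley configuration in it is our assumption, since the text does not give clock rates: two 1333MT/s front side busses and four DDR2-667 FB-DIMM channels.

```python
BUS_BYTES_PER_TRANSFER = 8  # the FSB and each DDR2 channel are 64 bits wide


def peak_gbs(mega_transfers_per_s: float, buses: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) across `buses` buses."""
    return mega_transfers_per_s * 1e6 * BUS_BYTES_PER_TRANSFER * buses / 1e9


# Assumed Bensley configuration: two 1333MT/s FSBs, four DDR2-667 FB-DIMM channels.
fsb_total = peak_gbs(4000 / 3, buses=2)
fsb_per_socket = peak_gbs(4000 / 3)
memory_total = peak_gbs(2000 / 3, buses=4)

if __name__ == "__main__":
    print(f"FSB: {fsb_total:.1f}GB/s total, {fsb_per_socket:.1f}GB/s per socket")
    print(f"Memory: {memory_total:.1f}GB/s over four channels")
```

Under these assumptions, both the processor side and the memory side come to about 21.3GB/sec, so neither starves the other. The per-socket figure lands at about 10.7GB/sec, slightly above the rounded 10.5GB/sec quoted above.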
The downside is that I don't even want to think about how many pins the Northbridge must have...
Four socket platforms will still be crippled by the FSB, and if you notice the chart above, the bandwidth to the processors is given in a smaller font size than before - why?
The answer is simple. It looks like Intel has to drop the FSB to 800MHz for four processor socket systems, and two 800MHz FSBs will give you the 12.8GB/sec theoretical bandwidth indicated above. They had to reduce the bus speed to accommodate the four loads placed on each bus by the two dual die sockets. Also note that cache coherency traffic between all the dies will go over the shared FSB as well, reducing the bandwidth available for memory traffic.
No, it's not a mistake - a two socket Intel motherboard will give two quad core processors almost twice the potential bandwidth that a four socket motherboard makes available to four quad core processors. By comparison, a quad socket Opteron motherboard will have 12.8GB/sec * 4 = 51.2GB/sec of memory bandwidth available, PLUS significant additional bandwidth from the HyperTransport links... but it will be limited to just 8 cores over the four sockets (until K8L shows up), compared to Intel packing 16 cores into four sockets.
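To make the comparison concrete, below is a small Python sketch that divides each platform's peak processor bandwidth by its core count. The Intel figures follow from 8 bytes per FSB transfer. The Opteron figure is the 12.8GB/sec per socket quoted above. Treating the Bensley FSBs as 1333MT/s is our assumption. HyperTransport bandwidth between Opterons is deliberately left out, so if anything the Opteron side is understated.

```python
def peak_gbs(mega_transfers_per_s: float, buses: int = 1) -> float:
    """Peak bandwidth in GB/s of `buses` 64-bit (8-byte) buses."""
    return mega_transfers_per_s * 8 * buses / 1000


# (total GB/s reaching the processors, number of cores), using the figures
# in the text; the 1333MT/s Bensley FSB speed is an assumption.
# HyperTransport links between Opterons are not counted.
PLATFORMS = {
    "2-socket Intel, quad core": (peak_gbs(4000 / 3, buses=2), 8),
    "4-socket Intel, quad core, 800MT/s FSBs": (peak_gbs(800, buses=2), 16),
    "4-socket Opteron, dual core": (12.8 * 4, 8),
}

per_core = {name: total / cores for name, (total, cores) in PLATFORMS.items()}

if __name__ == "__main__":
    for name, (total, cores) in PLATFORMS.items():
        print(f"{name}: {total:.1f}GB/s total, {per_core[name]:.2f}GB/s per core")
```

Per core, the four socket Intel system gets roughly 0.8GB/sec. That compares with about 2.7GB/sec on the two socket board and 6.4GB/sec per Opteron core.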
Intel showed some of the big software vendors who are building multi-core friendly software that can take advantage of quad core processors.
The server roadmap shows us a taste of things to come...
Intel has not given up on Itanium, and will be bringing out Montvale in roughly mid 2007, followed by Tukwila and later Poulson. Frankly, I wish they would use those resources to improve their x86 lineup by introducing CSI and on-chip memory controllers.
In the 7000 series, we get Tigerton mid next year, followed by Dunnington; and it's a pretty safe bet that Intel will be concentrating on increasing the number of cores and improving their chipsets.
The 5000 series gets the Quad Core 5300's about now, and "Future Product" after 2007. I'd guess 8 cores in 2008.
For their UP platform, Quad Core 3200's will show up around January, with future products in 2008 and beyond.
Because quad core processors are such a step forward as far as testing requirements go, we did something a bit different than in a lot of our other reviews. Intel was kind enough to send a list of "recommended benchmarks" for showing the power of the QX6700. We carefully reviewed the list; however, we noticed that most of the software listed had newer versions available on the net, and in some cases the tests required files that were not available on the net.
So we decided to update the list of software; downloading newer trial software, and where possible, using test case files that are available to the public at large - as well as running the benchmarks we normally use for processor reviews.
We also ran our "standard" processor benchmarking suite on the motherboard supplied by Intel, but we later switched to our excellent Asus P5W DH for our overclocking tests as we know the boundaries of that board and did not have time to fully explore the Intel board.
Software used during testing consisted of the following:
In order to keep the testing as fair as possible, we will use the following test platform:
For the FX-62:
As has been our custom for a while now, we discuss our overclocking adventures at the end of the article. However, in the results on the following pages we include overclocked benchmarks, to show you what gains you might get if you achieved similar overclocks. Our test systems were all stable at the settings shown with air cooling. The chart labels incorporate a lot of information about the test configuration. The first line shows the socket type and the model of the processor. Since every processor shown is either the quad core QX6700 or a dual core device, we did not specify the core count on the charts.
The second line shows the "FSB/HT clock rate" x "CPU multiplier" and the effective DDR memory speed. Please note that all DDR2 tests were run at 4-4-4-12 timings where possible.
Ok, enough talking about the basics... let's get on with the testing!
Notes about our charts:
Please note that to help you digest the chart information we use a different colour to mark the labels and the bars/lines so that the stock speed products will stand out and be easier to identify. Just look for the dark red text and darker coloured bars - for this review, the Core 2 Duo E6700, Core 2 Quad QX6700 and Athlon FX-62 running at stock speeds will be highlighted. This differs from how we normally highlight the product we review because we have such a large sample of data for this review.
For some of the charts we have omitted some less important single threaded results in order to save time as those results are of little interest when reviewing a quad core processor.
Business Winstone is an old benchmark, yet still an appropriate one for evaluating the speed of modern processors for standard office usage.
Here we see that the AMD Athlon FX-62 outperforms the Core 2 Duo and the Core 2 Quad. What does this tell us? Simple. For standard office use - that is, browsing the net, email, word processing, and spreadsheeting, a QX6700 is WAY overkill.
The Multimedia Winstone is yet another oldie but goodie... and again, the Athlon FX-62 outperforms the Core 2 Duo and Core 2 Quad - this time even when the Core 2 Quad is overclocked two speed grades beyond the launch product!
Mind you, we should take this test with a grain of salt, as more modern multimedia applications would be capable of taking better advantage of threading, and of the four cores... but until we find a more appropriate and up to date business multimedia benchmark, we are stuck with one which cannot appreciably take advantage of 4 or more cores.
Photoshop Elements 5
Intel suggested using Photoshop Elements 4.0, and batch converting some supplied images. Adobe is one of the more pioneering developers in that they have been supporting multi-threading since the original dual core Intel processors were released, but we haven't seen huge performance leaps from their multi-threaded applications.
Given that Photoshop Elements 5 is now available, I downloaded the trial edition and used that for the test. As you can see, at the stock settings the QX6700 beat the E6700 by only ONE second, so even if there is multi-threading going on, it must not be using more than two threads.
ABBYY FineReader 8.0 Professional
Intel also suggested using ABBYY FineReader 8.0 Professional as one of the benchmarks. It is a popular OCR (optical character recognition) package for getting text out of images. I ran it against one of the Intel Developer Forum PDF files (an 82 page document).
At stock speeds, the QX6700 beat the E6700 by a measly five seconds, less than 1.5% - so if ABBYY is multi-threading, it is again limiting itself to two threads. Pity.
Sandra CPU Bench
I have been resisting moving to Sandra 2007 simply because of the vast database of Sandra 2004 results we have; however, it looks like I no longer have the option of staying with 2004, as the readings are sometimes unbelievable.
The AMD Athlon FX-62 is left in the dust by even the stock Core 2 Duo E6700, never mind the Core 2 Quad.
Remember, Sandra CPU is a pure processor benchmark, and you cannot expect to correlate the differences in speed you see here with real world performance for anything except other pure processor benchmarks :-)
Sandra Memory Bandwidth
Sandra Memory Bandwidth tells a different tale. Here the FX-62's on-chip memory controller and low latency allow it to dominate the chart, besting even the overclocked Core 2 Quad results. It's literally no contest.
WinRAR has the FX-62 at the bottom of the heap, and as we would expect on a single core test, the four core QX6700 and the dual core E6700 score the same at the same clock speed.
Oh my does Kentsfield shine here!
The multi-threaded version of WinRAR definitely makes decent use of the additional cores, getting a 27% advantage over the dual core E6700 at the same clock speed.
We did not get any surprises here... the on-chip memory controller of the AMD Athlon FX-62 allows it to beat the Intel Core 2 Quad QX6700 at stock speeds by 21%. If your applications are VERY memory read oriented, the FX-62 is a better way to go - but the vast majority of applications are not that memory bound.
As we also expected, there is no real speed difference between the QX6700 and the E6700.
As we expected, the FX-62 dominates here, by an amazing 67%.
There was no significant difference between the dual and quad core Intel parts here.
Umm... the on-chip memory controller of the FX-62 wins here BIG time. The Intel parts are 230% slower for latency. Ouch.
Again, dual vs. quad core makes no difference here, as we are really measuring the 975X chipset's memory controller performance with this test (and the DIMMs, of course).
I find this result really puzzling, and I can come up with only two possible interpretations:
1) The way memory bandwidth is calculated is wonky - remember, the FX-62's on-chip controller won each of the individual read / write / latency tests
2) The memory prefetch logic in the 975X does an amazing job
I really don't know which of the two reasons above applies - perhaps both - however the FX-62 loses out by a small margin, and the dual and quad core devices score effectively the same.
The dual core E6700 and the quad core QX6700 score exactly the same here; this is to be expected, after all, this version of LAME is not multi-threaded, and the cores are running at the same speed.
All of the Intel results best the AMD FX-62 results by 26%.
I was disappointed to see that the multi-threaded LAME test only uses two threads, so it cannot show an advantage for a four core processor over a two core processor. Perhaps the next version will allow us to control how many threads it spawns?
Since only two threads are used, the E6700 and QX6700 perform the same at the same clock speed.
Intel suggested that the WAV to MP3 conversion function would be a good test for Kentsfield, so I gave it a try.
iTunes disappointed me. While it showed a small 3 second (3.5%) advantage for the four core processor, it's pretty obvious that it also only uses two threads.
When will people learn? To write properly multi-threaded apps, you don't design for the lowest number of threads!
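Sizing the work to the machine rather than to a fixed two threads is not complicated. In the minimal Python sketch below, the program asks the OS how many cores it has and splits the job that many ways. `encode_chunk` is a hypothetical stand-in for real encoding work. We use `ProcessPoolExecutor` rather than threads because CPython threads do not execute Python code in parallel.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_count(reserve: int = 0) -> int:
    """Size the pool from the cores the OS reports instead of hard-coding 2.
    os.cpu_count() can return None, so fall back to 1."""
    return max(1, (os.cpu_count() or 1) - reserve)


def split_evenly(items: list, parts: int) -> list:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def encode_chunk(samples: list) -> int:
    """Hypothetical stand-in for encoding one block of audio (CPU-bound work)."""
    return sum(s * s for s in samples) % 65521


def encode_all(samples: list) -> list:
    """One chunk per core, processed in parallel by separate worker processes."""
    workers = worker_count()
    chunks = split_evenly(samples, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_chunk, chunks))


if __name__ == "__main__":
    print(f"{worker_count()} workers:", encode_all(list(range(100_000))))
```

On a QX6700, `worker_count()` returns 4, and on an E6700 it returns 2. The same code uses whatever cores are present.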
TMPGEnc gave very interesting results. So much so, that I will be investigating them further in a future article; suffice it to say that it was obvious that the benchmark was using all four cores - see the performance monitor screen capture below - but the quad core QX6700 was only 15% faster than the dual core E6700... I suspect that the reason we did not see a bigger increase was due to waiting for disk I/O to complete.
The AMD Athlon FX-62 finished dead last, taking 20% longer to complete the multi-threaded test.
During an un-recorded test of TMPGEnc, I took a look at the processor usage, and found something VERY interesting:
TMPGEnc was using all four cores! But none of them were 100% busy.
I am guessing TMPGEnc was waiting for the OS to process disk I/O.
3D Studio Max
Ok, I will make this simple.
If you are a heavy 3D Studio Max user, you need a Core 2 Quad. Period.
Look at the chart below!!!!
At stock speeds, the Core 2 Duo E6700 takes 88% longer to render the test image.
That's right, the QX6700 is almost twice as fast.
3D Mark 2006
We see almost the same thing here as we saw with 3D Studio Max, but this time the QX6700 was "merely" 70% faster than the E6700.
If you are involved with heavy duty rendering... four cores are for you.
Now why am I not surprised that we got a similar speedup here?
The E6700 took 74% longer to render the test scene than the QX6700... four cores are the way to go for rendering.
The comparison FX-62 result is pleasantly close to the E6700, but is left in the dust by the QX6700.
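The rendering results can be turned around to ask how much of each job actually runs in parallel. The Python sketch below applies Amdahl's law to the E6700 vs. QX6700 results quoted above, which were both measured at the same 2.66GHz clock. Amdahl's law is our choice of model: it treats the non-parallel part as purely serial and ignores memory and I/O effects. "X% longer" and "X% faster" are read as a time ratio of 1 + X/100.

```python
def implied_parallel_fraction(speedup_2_to_4: float) -> float:
    """Parallel fraction p implied by a 2-core -> 4-core speedup S at equal clocks.

    Amdahl's law gives S = (1 - p/2) / (1 - 3p/4); solving for p yields
    p = 4(S - 1) / (3S - 2). Only meaningful for 1 <= S <= 2."""
    if not 1.0 <= speedup_2_to_4 <= 2.0:
        raise ValueError("a 2 -> 4 core speedup must lie between 1 and 2")
    return 4 * (speedup_2_to_4 - 1) / (3 * speedup_2_to_4 - 2)


# E6700 time / QX6700 time, read from the percentages in the text.
RESULTS = {"3D Studio Max": 1.88, "3D Mark 2006": 1.70, "74% render": 1.74}

if __name__ == "__main__":
    for name, s in RESULTS.items():
        print(f"{name}: {s:.2f}x -> about {implied_parallel_fraction(s):.0%} parallel")
```

All three results imply that roughly 90% or more of the work is parallel, which is why these are the tests where four cores pay off.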
Ok, here the Core 2 Duo E6700 and the Core 2 Quad QX6700 perform the same.
Unfortunately it looks like POVRay only launches two threads, and as such, does not take advantage of all the cores in Kentsfield.
I hope this will change in the next version!
A note about our gaming tests
We are interested in how a processor might affect gaming tests with an "infinitely fast" video card.
Unfortunately, a ludicrous speed video card does not exist, so we approximate one as closely as possible by running the tests at 640x480 with no or minimal AF and AA. This removes GPU speed from the equation as much as possible.
This does tend to emphasize gaming performance differences in our charts FAR more than you will ever see in real life. If the tests were run at 1280x1024 with 8xAA and 4xAF, the differences would be minimal - but that would defeat our purpose, which is to see which processor would be best with the fastest video card imaginable. For example, when the next generation NVIDIA G80 and ATI R600 are released, the results we get in testing now will be more reflective of how performance scales with those cards.
Ah, our old friend, Doom 3...
Unfortunately Doom 3 does not benefit from the additional cores in Kentsfield, so the performance of the QX6700 is basically identical to the E6700. Both beat the Athlon FX-62.
Another older game that scales nicely with clock speed, but again is not multi-threaded.
We need some good heavily multi-threaded games to test... is any software publisher listening?
The additional cores of course did not make a difference here, and again, the FX-62 loses.
We see an almost 10% increase from the dual core E6700 to the quad core QX6700 on the same motherboard, with all other components being identical. Since the game is not multi-threaded, the difference must be due to drivers taking some advantage of multi cores.
Surprisingly, the FX-62 was almost as fast as the E6700 here.
Call of Duty
As Call of Duty was not written with multi-threading in mind, I am not surprised that there is basically no difference between the dual core E6700 and the quad core QX6700. The FX-62 is left seriously behind.
Again, there is basically no difference between the dual and quad core processors, and again, AMD is left lagging.
Unreal Tournament 2004
Good thing this is the last gaming chart - I was running out of ways of paraphrasing "no difference between dual core and quad core for this test". The FX-62 loses - again.
The overclocking potential of the Kentsfield is limited by two main factors:
Intel has a nice X-ray style image that shows an illustration of the two dies within the same package - illustrating how the Kentsfield works.
As an "Extreme Edition" processor, the multiplier is unlocked both up and down... so I took the opportunity to test the increased multiplier based overclocking limits of the QX6700 on the original "BadAxe" Intel motherboard that was shipped to us for the review with the processor.
For FSB based overclocking, I moved the processor to our Asus P5W DH, which is our current in-house overclocking champion. It did not disappoint us, and we got some excellent results, as you saw in our charts.
As always, there were four parts to the successful overclock:
Well, five if you consider using sufficient cooling for the processor.
We already knew how fast our Asus P5W DH could go. Given the capacitance and thermal characteristics of the QX6700, we did not expect to reach our current in-house record for Core 2 Duos (3.64GHz with a Core 2 Duo E6400), or even to match our E6700 record of 3.5GHz.
I was pleasantly surprised to be able to get the QX6700 to be stable at 3.45GHz at a 345MHz FSB with a 10x multiplier - frankly, I did not expect to get much higher than 3.33GHz, but this shows you what a combination of excellent parts can sometimes achieve.
Oh sure, if I'd been willing to risk the processor and go for exotic cooling, I might have been able to go somewhat higher - but frankly, I did not think the risk was worth it.
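For reference, the arithmetic behind that overclock is shown in the small Python sketch below. The core clock is the base clock times the multiplier, and the FSB rating is four transfers per base clock. The gain is measured against the stock 266.67MHz x 10 setting.

```python
from fractions import Fraction

STOCK_BASE_MHZ = Fraction(800, 3)  # 266.67MHz base clock, sold as a "1066MHz" FSB
STOCK_MULTIPLIER = 10


def overclock_summary(base_mhz: int, multiplier: int) -> dict:
    """Core clock, quad-pumped FSB rate and gain over the stock 2.66GHz setting."""
    base = Fraction(base_mhz)
    core_mhz = base * multiplier
    stock_mhz = STOCK_BASE_MHZ * STOCK_MULTIPLIER
    return {
        "core_ghz": float(core_mhz / 1000),
        "fsb_mts": float(base * 4),
        "gain_pct": float((core_mhz / stock_mhz - 1) * 100),
    }


if __name__ == "__main__":
    print(overclock_summary(345, 10))
```

At 345MHz x 10, the result is a 1380MHz effective FSB and a roughly 29% increase over stock, with the multiplier left at its stock 10x.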
In order to run at 3.45GHz we:
Ok, I admit, that was a lot of test results to go through.
But what is my overall opinion?
Kentsfield, the QX6700, is a very interesting product, which will greatly benefit those that use highly multithreaded applications that can make good use of the four cores.
Intel is working hard to promote multi-threading to software houses, and is producing quality tools to identify critical paths and allow for better load balancing. I am skeptical about automatic loop parallelization, as most loops are too fine grained to make effective use of multiple cores given the overhead of launching and retiring threads.
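A crude cost model shows why fine-grained loops are a problem. In the Python sketch below, every parallel task pays a fixed overhead to launch and retire. All costs are hypothetical, picked only to illustrate the shape of the trade-off. Splitting a loop one iteration per task makes it about 100 times slower than running it serially, while a few coarse chunks per core beat the serial loop.

```python
import math


def loop_time_us(iterations: int, cost_per_iter_us: float, chunk: int,
                 cores: int, overhead_us: float) -> float:
    """Rough run time of a loop split into tasks of `chunk` iterations.

    Each task pays `overhead_us` to launch and retire, tasks run `cores` at a
    time, and every task is costed as full-sized, so this is an upper bound."""
    tasks = math.ceil(iterations / chunk)
    waves = math.ceil(tasks / cores)
    return waves * (overhead_us + chunk * cost_per_iter_us)


# Hypothetical costs: 0.05us of work per iteration, 20us thread overhead per task.
ITERS, COST, OVERHEAD, CORES = 1000, 0.05, 20.0, 4
SERIAL_US = ITERS * COST

if __name__ == "__main__":
    print(f"serial: {SERIAL_US:.1f}us")
    for chunk in (1, 10, 250):
        t = loop_time_us(ITERS, COST, chunk, CORES, OVERHEAD)
        print(f"chunk={chunk:>3}: {t:.1f}us")
```

If the loop is shrunk to 100 iterations, no chunk size beats the serial version at all, since the 20us overhead alone exceeds the 5us of work.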
That is a pretty impressive list showing the buildup of companies working on multi-threading their applications, however until the products are on the shelves and ready for purchase, most of the advantages of four cores will not be available to the average user.
For the average home user, it is overkill, but then again, the average home user is not going to spend $1000 on a processor either; so that is not the market it is aimed at.
For gamers, while Kentsfield does not hurt, an X6800 will beat it in most games as the software houses STILL have not significantly multi-threaded their game engines yet. But when they do... watch out!
For media professionals who need to transcode videos, render animation, and encode music and video, the QX6700 is a winner. It can almost double the performance of a system for these applications. If you are a production house where time is money, that can pay back the cost of upgrading to Kentsfield lickety-split.
For developers, VT support with four cores means being able to run quite a few virtual machines at high speeds; a capability that will also be most appreciated for server use.
I have not tested it for server use yet. I suspect it would perform fairly well, but not as well as a two socket Opteron solution, simply due to FSB vs. HyperTransport and on-chip memory controllers. It would scale very well for applications that are more compute bound than memory bound, so it is still an interesting product. Intel will eventually go to a native quad core solution (all four cores on the same die, with out of band cache coherency control) with an integrated memory controller and CSI. At that point AMD will have some tough competition to face in the server market; until then, however, I expect Opterons to dominate.
Mind you, AMD has quite a bit of catching up to do on the home/small business performance front. Core 2 Duo bests the FX-62, and Kentsfield won't have any real competition until K8L arrives - although 4x4 might surprise us, as it has far greater memory and I/O bandwidth.
Would I buy a Kentsfield?
If I was making a living with video, audio, or rendering, in a minute.
Otherwise, I'd go with a nice Core 2 Duo, or even an AM2 X2 based system!