Author: Carl Poirier
Editor: Howard Ha
Publish Date: Wednesday, October 12th, 2011
Originally Published on Neoseeker (http://www.neoseeker.com)
Article Link: http://www.neoseeker.com/Articles/Hardware/s/amd_fx-8150/
Copyright Neo Era Media, Inc. - please do not redistribute or use for commercial purposes.
The wait is finally over! No more delays, no more teasing, no more slow-rolls. AMD's Bulldozer is here, and the eagerness of Neoseeker's readers to know how it performs can be felt throughout the network. Why has the wait been so long? AMD designed Bulldozer from the ground up, so it is not simply a follow-up to the K10 core, as Llano is. In contrast, Intel's Sandy Bridge had been in the works since 2005, but turned out to be a mere evolution of the previous series. Six years for a simple upgrade!
The new Bulldozer architecture sounds rather promising based on what AMD has already said about it. The question is, does it actually deliver? In 2006, Intel abandoned the NetBurst architecture it had used since the beginning of the new millennium, releasing the Core architecture, which was derived from 1995's Pentium Pro instead. Will Bulldozer be doomed to a similar fate? Will it have heat and power consumption problems? Neoseeker's review aims to find out how the first iteration of the new architecture stands, and whether it has a viable future.
What's interesting to note is that the Bulldozer architecture is being launched for both the server and desktop markets. In fact, the same dies are being used to manufacture the Zambezi, Valencia and Interlagos processors, which are the desktop, 1-2 socket and 2-4 socket platforms respectively. Can Bulldozer really cater to both markets, though? Its performance in consumer applications will be the deciding factor.
Looking at the specifications, the FX-8150 chip reviewed today seems rather well equipped; this newcomer is the first eight-core processor in its target market. AMD says it has been designed for higher frequencies, and the numbers seen here speak for themselves. The large 16MB of combined L2 and L3 cache is also quite promising; previous articles made it clear that the Phenom II's 6MB of L3 cache gave it the edge over the Athlon II. It is also important to remember that with Bulldozer, the basic building block is now a module composed of two cores. On previous processors, each individual core could be disabled and later unlocked. Now cores are disabled in pairs, one module at a time.
|Model Number & Core Frequency||FX-8150 / 3.6GHz (4.2GHz Turbo)|
|L1 Cache Sizes||64K of L1 I-cache per module and 16K of L1 D-cache per core (384KB total L1 per CPU)|
|L2 Cache Sizes||2MB of L2 data cache per module (8MB total L2 per processor)|
|L3 Cache Size||8MB (shared)|
|Total Cache (L2+L3)||16MB|
|Memory Controller Type||Integrated 144-bit wide memory controller, configurable in dual 72-bit channels|
|Memory Controller Speed||Up to 2.2GHz with Dual Dynamic Power Management|
|Types of Memory Supported||Unregistered DIMMs up to PC3-15000 (DDR3-1866MHz)|
|HyperTransport 3.0 Specification||One 16-bit/16-bit link @ up to 5.2GHz full duplex (2.6GHz x2)|
|Total Processor-to-System Bandwidth||Up to 37.3GB/s total bandwidth [Up to 21.3 GB/s memory bandwidth (DDR3-1333) + 16.0GB/s (HT3)]|
|Packaging||Socket AM3+ 942-pin organic micro pin grid array (micro-PGA)|
|Fab location||GLOBALFOUNDRIES Fab 1 module 1 in Dresden, Germany (formerly AMD Fab 36)|
|Process Technology||32-nanometer DSL SOI (silicon-on-insulator) technology|
|Approximate Die Size||315mm²|
|Approximate Transistor count||~2 billion|
|Max TDP||125 Watts|
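The bandwidth total in the table is straightforward arithmetic. The Python sketch below reproduces it under two assumptions. First, each 72-bit channel carries 8 bytes of data per transfer, with the other 8 bits used for ECC. Second, the 16.0GB/s HyperTransport figure is taken as given. The function name is hypothetical.

```python
# Peak-bandwidth arithmetic behind the "Total Processor-to-System Bandwidth" row.
# Assumption: each 72-bit channel moves 64 data bits (8 bytes) per transfer plus 8 ECC bits,
# and "DDR3-1333" means 1333.33 million transfers per second.

def ddr3_peak_gbps(mega_transfers: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9

HT3_GBPS = 16.0  # HyperTransport figure quoted in the table, taken as given

memory = ddr3_peak_gbps(4000 / 3, channels=2)  # DDR3-1333 in dual-channel mode
total = memory + HT3_GBPS
print(f"memory {memory:.1f} GB/s, total {total:.1f} GB/s")  # memory 21.3 GB/s, total 37.3 GB/s
print(f"DDR3-1866: {ddr3_peak_gbps(5600 / 3, channels=2):.1f} GB/s")  # DDR3-1866: 29.9 GB/s
```

The same formula shows how much headroom the officially supported DDR3-1866 adds over the DDR3-1333 figure AMD uses in the table.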
The FX-8150 is not the only model being launched; as with earlier chips, AMD has varied the number of cores and the frequency. It is not yet known whether the disabled modules can be unlocked.
|Model||CPU Base||Turbo Core||Max Turbo||TDP||Cores||L2 Cache||NB||MSRP|
Specifications and diagrams are courtesy of AMD.
As mentioned earlier, Bulldozer is a brand new architecture from AMD; it shares nothing with the Phenom II. So how was it designed? The engineers did not start by reinventing the 4004; decades of CPU design cannot be thrown away so easily, after all. Instead, AMD started from the general idea of what a modern processor looks like. Simply put, a processor core is composed of:
- instruction fetch and decode stages
- floating-point and integer execution units
- some cache
- a link to a northbridge, which handles memory access and further I/O

Single-core processors have become quite rare, except on low-power platforms such as the entry-level AMD Fusion processors, the Intel Atom or the VIA Nano, so most chips contain at least two cores.
This basic concept has been improved over time, but it may have reached its limits, and drastic changes are needed to keep moving forward. AMD's focus here was to maximize instructions per watt while offering more cores, which is the best way to increase the overall throughput of a server or cluster. Obviously, the smaller a core is, the more of them fit on a chip. In this field of engineering, a well-known principle says that the common case must be favored. Floating-point units fit poorly with that principle. According to AMD, floating-point operations account for only 20% of CPU usage, compared to 80% for integer operations. Yet they are much more complex and require lots of die space. To save some of that space, AMD started off with the idea of sharing one FPU between two cores. However, many computing applications make intensive use of floating-point math, and newer instruction sets have recently appeared to boost its performance, such as the 256-bit Advanced Vector Extensions (AVX) featured on Intel Sandy Bridge processors. The FPU in Bulldozer supports AVX as well, but when 256-bit instructions are not in use, each core in the module has access to half of the pipelines for 128-bit calculations.
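The principle mentioned above is usually quantified with Amdahl's law. If a fraction f of the execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The Python sketch below assumes AMD's 80/20 split can be read as a share of execution time, which is a simplification. Under that assumption, it shows why integer throughput was the better place to spend die space.

```python
# Amdahl's law: overall speedup when a fraction `f` of execution time runs `s` times faster.
# Illustrative assumption: AMD's 80/20 integer/floating-point split is treated as a
# share of execution time.

def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup if fraction f of execution time is accelerated by factor s."""
    if not 0.0 <= f <= 1.0 or s <= 0:
        raise ValueError("f must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - f) + f / s)

INTEGER_SHARE = 0.80
FP_SHARE = 0.20

# Doubling integer throughput vs. doubling floating-point throughput
print(f"integer 2x: {amdahl_speedup(INTEGER_SHARE, 2.0):.2f}x overall")  # 1.67x
print(f"FP 2x:      {amdahl_speedup(FP_SHARE, 2.0):.2f}x overall")       # 1.11x
# Even an infinitely fast FPU caps the overall gain at 1 / 0.8 = 1.25x
print(f"FP inf:     {amdahl_speedup(FP_SHARE, float('inf')):.2f}x overall")  # 1.25x
```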
The saved die space has been spent on aggressive features that benefit both cores, namely prefetching. The shared front end prefetches instructions dynamically, based on the branch destination addresses stored in the two levels of the branch target buffer. These levels hold 512 and 5K entries, respectively. For those unfamiliar with branch prediction, these buffers essentially store the memory address of the instruction located at each branch's destination. A rule of thumb states that a processor spends 90% of its time in 10% of the code. When a branch is taken, it is likely to be taken again very soon, so keeping its previous target at hand saves precious cycles compared to waiting for the target to be computed. The prediction pipeline is free to run ahead as long as its queue (one per thread) is not full. By looking at the predicted instruction pointers (RIPs) in that queue, the instruction fetch pipeline can then anticipate future cache misses. As for the 64KB instruction cache, the two threads compete for it dynamically.
There is a slight problem, though: in some specific cases, there can be an excessive number of instruction cache invalidations, forcing the instructions to be fetched again. Back in July, AMD engineers and other kernel developers, including Linus Torvalds himself, discussed patching the Linux kernel to prevent this; at the time of writing, the patch has not been applied. Supposedly the issue costs about 3% in performance, but rumors suggest the penalty is higher on Windows. This bug doesn't affect the viability of the system the way the Intel P67's premature SATA degradation or AMD's own TLB bug did, though. Fixing it in the next core revision would of course be better than software workarounds, and would allow for a measurable performance boost. Finally, another big difference from Phenom II is the addition of a fourth instruction decoder, which puts Bulldozer on par with Sandy Bridge.
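The branch target buffer described above can be illustrated in Python with a toy two-level structure. A small first level is backed by a larger second level, and each maps a branch's address to its last known destination. This sketch makes simplifying assumptions and is not AMD's design. Both levels here are fully associative with least-recently-used replacement, and their tiny sizes are hypothetical and far smaller than the real ones.

```python
from collections import OrderedDict
from typing import Optional

class LruTable:
    """Fixed-capacity map from branch address to target, evicting the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, pc: int) -> Optional[int]:
        if pc not in self.entries:
            return None
        self.entries.move_to_end(pc)  # mark as most recently used
        return self.entries[pc]

    def insert(self, pc: int, target: int) -> None:
        self.entries[pc] = target
        self.entries.move_to_end(pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry

class TwoLevelBTB:
    """Small fast level backed by a larger slow level (sizes are hypothetical)."""

    def __init__(self, l1_size: int = 4, l2_size: int = 40):
        self.l1 = LruTable(l1_size)
        self.l2 = LruTable(l2_size)
        self.stats = {"l1_hit": 0, "l2_hit": 0, "miss": 0}

    def predict(self, pc: int) -> Optional[int]:
        """Return the remembered target for the branch at `pc`, or None on a miss."""
        target = self.l1.lookup(pc)
        if target is not None:
            self.stats["l1_hit"] += 1
            return target
        target = self.l2.lookup(pc)
        if target is not None:
            self.stats["l2_hit"] += 1
            self.l1.insert(pc, target)  # promote to the fast level
            return target
        self.stats["miss"] += 1
        return None

    def update(self, pc: int, actual_target: int) -> None:
        """Record the resolved target once the branch has actually executed."""
        self.l1.insert(pc, actual_target)
        self.l2.insert(pc, actual_target)

# A loop-closing branch at 0x400 that jumps back to 0x3F0, executed ten times
btb = TwoLevelBTB()
for _ in range(10):
    if btb.predict(0x400) != 0x3F0:  # miss or wrong target: record the real one
        btb.update(0x400, 0x3F0)
print(btb.stats)  # {'l1_hit': 9, 'l2_hit': 0, 'miss': 1}
```

Only the first iteration has to wait for the branch to resolve. Every later one finds its target immediately, which is the "90% of the time in 10% of the code" effect at work.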
All of these improvements should help maximize the use of the execution units.
Each integer cluster has its own scheduler. One can see the pipelines for division and multiplication, as well as address generation. The latter serves the fully out-of-order load/store unit, which can handle two 128-bit loads and one 128-bit store per cycle. The load and store queues are 40 and 24 entries long, respectively, and the data cache is 16KB in size. There is also some register renaming going on to avoid unnecessary data hazards, or dependencies. Take an instruction that stores the value of a given register to memory, followed by an instruction that wants to use the same register to hold a result from the ALU. The result will be put in another physical register instead of waiting for the memory store to complete.
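The renaming example above can be modeled in a few lines of Python. This is a hypothetical, heavily simplified renamer. Each write to an architectural register is given a fresh physical register, so the store can still read the old value while the ALU result lands elsewhere. Real hardware also frees physical registers at retirement, which is omitted here.

```python
class RenameTable:
    """Toy register renamer: every write to an architectural register gets a fresh physical register.

    Simplification: physical registers are never freed here; real hardware returns
    them to the free list when the instructions using them retire.
    """

    def __init__(self, arch_regs, num_physical):
        self.free = list(range(num_physical))
        self.mapping = {reg: self.free.pop(0) for reg in arch_regs}

    def read(self, reg):
        """Physical register currently holding the architectural register's value."""
        return self.mapping[reg]

    def write(self, reg):
        """Allocate a new physical register for an instruction that writes `reg`."""
        if not self.free:
            raise RuntimeError("out of physical registers: the pipeline would stall")
        self.mapping[reg] = self.free.pop(0)
        return self.mapping[reg]

rt = RenameTable(["rax", "rbx", "rcx"], num_physical=8)
store_src = rt.read("rax")  # 1) store [mem] <- rax: still needs the old rax value
add_dst = rt.write("rax")   # 2) rax <- rbx + rcx: gets a different physical register
print(store_src, add_dst)   # 0 3
```

Because the two instructions touch different physical registers, the add does not have to wait for the store to read its operand. The write-after-read hazard disappears.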
The FPU, as explained above, is shared between two cores. To allow such a configuration, AMD has adopted a coprocessor arrangement: the unified FP scheduler manages both threads, and when execution completes, the parent core is notified. Two of the pipelines consist of Fused Multiply-Accumulate (FMAC) units. In the four-operand form adopted by AMD, their operation can be written as follows, with the arrow denoting the destination register: A ← B + C × D. Intel's upcoming processors are also going to feature FMA pipes, but in the three-operand, or destructive, form: A ← A + B × C. Keeping A unmodified has obvious advantages. With FMA3, if the original value of A is still needed for other operations, it must first be copied to another register, which adds instructions. For compatibility, AMD will also support the three-operand form in its next core, dubbed Piledriver.
The other two pipelines in the coprocessor are actually integer pipelines, which can also work with 128-bit operands for the SSE instructions. They also handle the XOP instruction set. Together with FMA4, XOP forms what was originally supposed to be SSE5, first proposed by AMD back in 2007. Once again, the reason for this change is better compatibility with Intel's instruction set. XOP contains integer vector operations such as multiply-accumulate, compare, shift, rotate, permute, and more, so developers have a real opportunity to get significant speedups from these new SIMD instructions.
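Here is a small Python model of a register file that makes the difference between the two FMA forms concrete. The function names are hypothetical and mirror the notation above. Python computes the product and the sum with two separate roundings, while a fused hardware operation rounds only once. The sketch therefore only models which registers are read and overwritten, and it uses integers to sidestep rounding altogether.

```python
def fma4(regs, a, b, c, d):
    """Four-operand form: A <- B + C * D. B, C and D are left untouched."""
    regs[a] = regs[b] + regs[c] * regs[d]

def fma3(regs, a, b, c):
    """Three-operand (destructive) form: A <- A + B * C. The old A is overwritten."""
    regs[a] = regs[a] + regs[b] * regs[c]

def mov(regs, dst, src):
    regs[dst] = regs[src]

# Goal: out = acc + x * y while keeping acc for later use
regs4 = {"acc": 1, "x": 2, "y": 3, "out": 0}
regs3 = dict(regs4)  # identical starting state for the FMA3 version

program4 = [lambda r: fma4(r, "out", "acc", "x", "y")]
program3 = [
    lambda r: mov(r, "out", "acc"),     # extra copy, because FMA3 destroys its accumulator
    lambda r: fma3(r, "out", "x", "y"),
]

for regs, program in ((regs4, program4), (regs3, program3)):
    for instruction in program:
        instruction(regs)
    print(len(program), regs["out"], regs["acc"])
# 1 7 1
# 2 7 1
```

Both versions produce the same result and preserve `acc`, but the destructive form needs one extra instruction to do so.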
Then there is the 16-way unified L2 cache, 2MB in size. Each L2 is shared by only two of the eight cores, so the core on which a given thread is scheduled might affect performance. If two threads of a program are in the same module, they share their L2; otherwise they have to rely on the slower L3. The Windows 7 scheduler is not aware of this implementation detail, but the Windows 8 developer preview supposedly shows some benefits thanks to its better scheduler. Unlike the L1 cache, the L2 is exclusive with respect to the L3 above it. No data is duplicated between the two, so the whole 16MB holds distinct data. Additionally, the 8-way L2 translation lookaside buffer, used to translate virtual memory addresses into physical ones, has 1024 entries and services both instruction and data requests. Finally, there are data prefetchers which try to predict data use and bring it into cache before the processor executes the load.
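The placement trade-off above can be sketched in Python as two thread-placement policies. The core numbering (cores 2m and 2m+1 forming module m) is an assumption, since the actual mapping depends on the operating system and firmware. Packing threads onto as few modules as possible lets cooperating threads share an L2 and leaves whole modules idle. Spreading them gives each thread a module's L2 and FPU to itself.

```python
from typing import List

# Hypothetical core numbering: cores 2m and 2m+1 belong to module m.
# The real mapping depends on the OS and firmware and should be checked first.
CORES_PER_MODULE = 2
MODULES = 4

def module_of(core: int) -> int:
    return core // CORES_PER_MODULE

def _check(n_threads: int) -> None:
    if not 0 <= n_threads <= CORES_PER_MODULE * MODULES:
        raise ValueError("thread count exceeds core count")

def pack_threads(n_threads: int) -> List[int]:
    """Fill one module before moving to the next: paired threads share an L2,
    and whole modules stay idle."""
    _check(n_threads)
    return list(range(n_threads))

def spread_threads(n_threads: int) -> List[int]:
    """One thread per module first: each thread gets a module's L2 and FPU to itself,
    but every module stays busy."""
    _check(n_threads)
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE) for m in range(MODULES)]
    return order[:n_threads]

for policy in (pack_threads, spread_threads):
    cores = policy(4)
    print(policy.__name__, cores, sorted({module_of(c) for c in cores}))
# pack_threads [0, 1, 2, 3] [0, 1]
# spread_threads [0, 2, 4, 6] [0, 1, 2, 3]
```

On Linux, a chosen core set could be applied to the current process with `os.sched_setaffinity(0, set(cores))`. Do this only after the real core-to-module mapping has been verified.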
The integrated northbridge has also been redesigned. After the requests from the four modules are synchronized, they are sent to the L3 cache, to memory, or to the rest of the system via the HT link. There are also two Advanced Programmable Interrupt Controllers (APICs).
There is also an Application Power Management (APM) module somewhere in there, which measures the TDP headroom available to Turbo Core 2.0. If there is enough headroom, all cores can get a 300MHz increase, significantly boosting performance. This happens when an application uses more than four cores but doesn't load them up to 100%. The major difference from the previous version, seen in the Thuban die, is the addition of a second Turbo mode. If no more than half of the cores are active, the unused modules can go into the C6 state and allow the others to gain another 300MHz, for a total of 600MHz on the FX-8150. On the FX-8120, the boost reaches 900MHz above stock! Again, there is a small hiccup with the current Windows scheduler: if the threads are not running on the right modules, this Turbo mode won't kick in. Hopefully a patch to the Windows 7 scheduler will arrive soon.
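The two boost levels can be summarized with a simplified Python model. It assumes a hypothetical numbering where cores 2m and 2m+1 belong to module m. It also reduces the APM module's power estimate to a single yes/no headroom flag. The default boost values match the FX-8150 figures quoted above. The model also shows why scheduling matters: the same four threads earn the larger boost only when packed onto two modules.

```python
# Hypothetical model of the two Turbo Core 2.0 boost states described above.
# Assumptions: cores 2m and 2m+1 form module m, and "enough TDP headroom" is a
# single yes/no input; the real APM logic estimates power continuously.

def active_modules(busy_cores, cores_per_module=2):
    """Set of modules with at least one busy core."""
    return {core // cores_per_module for core in busy_cores}

def turbo_core_mhz(base_mhz, busy_cores, total_modules=4, tdp_headroom=True,
                   all_core_boost_mhz=300, max_boost_mhz=600):
    """Frequency the busy cores would run at under this simplified model."""
    if not tdp_headroom:
        return base_mhz
    if len(active_modules(busy_cores)) <= total_modules // 2:
        # The idle modules can be power gated (C6), freeing budget for the larger boost
        return base_mhz + max_boost_mhz
    return base_mhz + all_core_boost_mhz

packed = [0, 1, 2, 3]  # four threads on modules 0 and 1
spread = [0, 2, 4, 6]  # four threads, one per module
print(turbo_core_mhz(3600, packed))  # 4200
print(turbo_core_mhz(3600, spread))  # 3900
```

For the FX-8120, the larger boost would be modeled by passing `max_boost_mhz=900`.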
That C6 state implies power gating the whole unused module. First, after a predetermined period of inactivity, the L2 cache is flushed and the registers' contents are saved. Then, power FETs essentially isolate the module from ground. To resume, the FETs reconnect the module and the execution context is restored from the save area. There is also some clock gating going on in the modules, which essentially brings the frequency down to zero, but at a more granular level. Some parts of the northbridge can also be power gated when not in use.
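The C6 entry and exit sequence reads naturally as a small state machine. The Python sketch below is a toy model rather than AMD's implementation. The idle threshold, the register contents and the dirty-line count are all hypothetical, and the power gating itself is reduced to a log entry.

```python
from enum import Enum, auto

class ModuleState(Enum):
    ACTIVE = auto()
    C6 = auto()  # module power gated

class Module:
    """Toy model of one module entering and leaving C6 (hypothetical names and threshold)."""

    IDLE_THRESHOLD = 3  # assumed number of idle ticks before power gating

    def __init__(self):
        self.state = ModuleState.ACTIVE
        self.idle_ticks = 0
        self.context = {"rax": 42}  # stand-in for the cores' register state
        self.saved_context = None   # kept outside the gated module
        self.dirty_l2_lines = 5
        self.log = []

    def tick(self, busy):
        if self.state is ModuleState.C6:
            if busy:
                self._exit_c6()
            return
        self.idle_ticks = 0 if busy else self.idle_ticks + 1
        if self.idle_ticks >= self.IDLE_THRESHOLD:
            self._enter_c6()

    def _enter_c6(self):
        # 1) Flush the L2 so no modified data is lost when power is removed
        self.log.append(f"flush {self.dirty_l2_lines} dirty L2 lines")
        self.dirty_l2_lines = 0
        # 2) Save the execution context, then 3) cut the module off with the power FETs
        self.saved_context, self.context = dict(self.context), None
        self.log.append("power gated")
        self.state = ModuleState.C6

    def _exit_c6(self):
        # Reconnect the module, then restore the saved execution context
        self.log.append("power restored")
        self.context, self.saved_context = self.saved_context, None
        self.log.append("context restored")
        self.state = ModuleState.ACTIVE
        self.idle_ticks = 0

m = Module()
for busy in [True, False, False, False, True]:
    m.tick(busy)
print(m.log)
# ['flush 5 dirty L2 lines', 'power gated', 'power restored', 'context restored']
```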
The four modules are surrounded by four HyperTransport links, the memory controller and other miscellaneous I/O. Even though the L3 cache is split into four parts, it is treated as one global cache, and latency is the same throughout. The northbridge functionality sits in the middle. All of this yields a 315mm² die with approximately 2 billion transistors.
For the desktop variant, however, three of the HyperTransport links go unused. The chip is put into the socket AM3+ package. Compared to its predecessor, AM3+ adds support for CPU voltage loadline adjustments and for increased ILDT and DRAM current at higher frequencies. At stock, the HyperTransport link runs at a nice 2.6GHz in both directions, providing up to 5.2 GT/s. The officially supported memory speed is now up to 1866MHz, as with Llano. AM3 processors are also forward compatible, meaning they work in AM3+ motherboards.
For the 1-2 socket platform, up to three HT links are enabled, each running at a whopping 3.2GHz. The memory speed drops to 1600MHz, however. The Advanced Platform Management Link (APML), a processor slave interface based on SMBus, is added to the mix as well. It allows the TDP to be capped dynamically using the APML tools. This will surely prove quite useful for servers, since hot spots remain a concern in data centers. The TDP set for the processors could then be factored into the design of the cooling system. Last but not least, the Valencia processors are a drop-in replacement for the Opteron 4000 series, provided the motherboard manufacturer publishes the required BIOS update.
Now this is the cream of the crop: the world's first hexadeca-core processor. That's sixteen cores. Interlagos consists of two dies under the same hood, which the OS sees as a single processor. This time, all four HT links on each die are enabled. Some of them connect the two dies, which leaves four links for external communication. The supported memory speed is back up to 1866MHz, too. The G34 platform allows for up to four sockets, which adds up to a total of 64 processor cores on a single motherboard. Once again, Interlagos is a drop-in upgrade, provided the BIOS has been updated.
The processor reviewed here is obviously the desktop version, which looks pretty much the same as the AM3 processors. The retail FX-8150 can be bought in a neat little steel box with a window on the side, and comes with the standard air cooler.
There is another version which comes in a larger cardboard box bundled with a self-contained water cooler. The rumors were true; it will be possible to acquire the new chip with a better cooling solution right out of the box, although no official price has been announced for it yet.
The water loop is manufactured by Asetek. It features a double-thick radiator aided by a pair of 120mm fans. Unfortunately, it was received by Neoseeker only a few hours before the article launch, so water loop performance testing will come in a future update.
The FX processors look much like their predecessors. Besides the labels on the heatspreader, the only external difference is the two extra pins on the back, which are most likely ground pins.
The FX-8150 will be pitted against the Phenom II flagship in both stock and overclocked states. It supposedly has lots of overclocking headroom, so the comparison should be interesting. Both will run on the ASUS Crosshair V Formula reviewed a few weeks ago, cooled by a Cooler Master V8 heatsink. Meanwhile, the now three-year-old Intel Nehalem architecture and the newer Sandy Bridge represent the blue side of the fence. All test setups have the Turbo feature disabled to improve score consistency.
AMD FX "Zambezi" (Socket AM3+)
AMD Phenom II "Thuban" (Socket AM3)
AMD Phenom II "Deneb" (Socket AM3)
Intel Core i7 "Bloomfield" (Socket 1366)
Intel Core i7 "Sandy Bridge" (Socket 1155)
Oddly enough, Crysis Warhead would not run on the AM3+ setup, so it has been dropped from this review. Hopefully the problem will be found so that the game's benchmarking tool can be used again. Three of the following benchmarks will also be used to test the Turbo capability.
After what was witnessed at the Bulldozer conference, great numbers were expected from this FX-8150. With the self-contained water cooling kit, this baby ran a Unigine benchmark all day long at 4.8GHz on all eight cores. Had it been put through a stability test, though? Here at Neoseeker, every overclock has to survive such a test for one hour, which is still pretty generous. In previous AMD reviews, OCCT Perestroïka worked pretty well for catching instability in the cores and IMC. With the new core, it was less effective than Prime95 because it did not heat the processor up as much, and it turns out Bulldozer prefers cooler temperatures when overclocked. Furthermore, the self-contained water cooler that can be purchased along with the processor, replacing the standard heatsink, was not received in time for this overclocking session. As mentioned earlier, the Cooler Master V8 heatsink was used instead.
As will be shown later, the chip is pretty environmentally conscious at stock, but at some point the heat builds up quite rapidly. The power usage of the K10 seemed to increase less abruptly, but at the same time it was much less tolerant of higher temperatures; at around the 55 to 60°C mark, it really began to show decreased stability. This particular FX-8150 sample seemed able to get as hot as 75°C. Obviously, the poor Cooler Master V8 had quite a hard time keeping up, and definitely limited the overclock.
Nevertheless, Bulldozer is overclocked the same way as Phenom II, or more precisely, like the Black Edition Phenom IIs; in fact, all processors in the FX series are going to be unlocked. With Neoseeker's sample, the memory controller didn't seem as friendly, topping out at around 2650MHz with an extra 200mV. The memory made up for this, though, as it just kept climbing without any fuss. ASUS' own utility was used for monitoring voltages and temperatures.
All in all, these are the final settings used:
Seeing how the FX-8150 starts to heat up at 4.5GHz, the water cooler should really help maximize overclocks. It remains to be seen whether the premium commanded by the bundled water cooler is actually worth it compared to just buying a stand-alone cooling system. Neoseeker will look into this in another article.
The overclocks on air cooling can be summarized as follows:
Update 11/10/15: The Crosshair V Formula has a VRM temperature protection mechanism. With power-hungry processors such as an overclocked FX-8150, the mechanism throttles the processor under stressful conditions. Putting a fan over the VRM heatsink prevents this, and it also has a large impact on processor temperature. With the Asetek self-contained water cooler bundled with the FX-8150, at the overclock presented above, the processor topped out at 61°C instead of 75°C. In the initial testing, this protection mechanism was not triggered because the VRMs were kept cool enough. It can be disabled in the BIOS with the "VRM Temperature Protection" option, but doing so is obviously not advised.
SiSoftware Sandra includes benchmarks for most hardware. The CPU arithmetic and multi-core efficiency benchmarks will be run, as well as memory bandwidth and latency.
SiSoftware Sandra takes advantage of the newest instruction sets, so the FX-8150 and the i7-2600K are quite strong in the CPU arithmetic test. The i7-2600K is still a long way ahead, but that doesn't prevent Bulldozer from smoking the hexa-core and the i5-2500K. At 4.5GHz, it can barely take the lead. Multi-core efficiency has more than doubled compared to its predecessors, but it's still far from the high-end Intel chips. When it comes to memory bandwidth, the 2.2GHz memory controller is pretty much equivalent to the 3GHz one on Phenom II, but latency has not moved much. Overclocked, Bulldozer is on the heels of the triple-channel systems.
HandBrake is an application that converts audio and video files to other formats. It makes use of all available threads, so it can exploit the processor to its full potential. The input is a 1-minute-48-second 1080p MP4 clip from The Lord of the Rings. The file is 96MB in size and will be converted to the MKV format.
POV-Ray, short for Persistence of Vision Raytracer, is 3D rendering software with impressive photo-realistic capabilities.
It appears that Bulldozer shows its strength with encoding and rendering; everything including the i7-2600K is left behind, and by a nice margin. The performance difference between Phenom II and Bulldozer in their overclocked states is approximately proportional to the one at stock, so even though Bulldozer can get to a higher frequency, its overclocking potential does not seem to be a whole lot better.
7-Zip is a compression program, much like WinRAR. It features a built-in test which gives a score for compression and decompression.
Cinebench 11.5 is another rendering program, supporting a huge number of threads. The image is rendered in chunks, each processed by a particular thread.
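The chunked approach described above can be sketched in Python. This is a generic illustration of tile-based parallel rendering, not Cinebench's actual renderer. The image size, the tile size and the placeholder `shade` function are all hypothetical. Also, in CPython the global interpreter lock keeps these threads from running pure-Python pixel code in parallel, so a real implementation would use native threads or `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, TILE = 64, 48, 16  # hypothetical image and tile sizes

def shade(x, y):
    """Placeholder for the per-pixel work a real renderer would do."""
    return (x * 31 + y * 17) % 256

def render_tile(origin):
    """Render one tile; tiles are independent, so any thread can take any tile."""
    x0, y0 = origin
    return {
        (x, y): shade(x, y)
        for y in range(y0, min(y0 + TILE, HEIGHT))
        for x in range(x0, min(x0 + TILE, WIDTH))
    }

def render(workers=8):
    tiles = [(x, y) for y in range(0, HEIGHT, TILE) for x in range(0, WIDTH, TILE)]
    image = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each tile is picked up by whichever worker thread is free next
        for pixels in pool.map(render_tile, tiles):
            image.update(pixels)
    return image

image = render()
print(len(image))  # 3072
```

Since no tile depends on another, adding threads simply lets more tiles be in flight at once. This is why such benchmarks scale so well with core count.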
Another strong showing from Bulldozer: it's the top performer in 7-Zip, and in Cinebench only the hexa-core and the Sandy Bridge Core i7 can best it. This time, in their overclocked states, Bulldozer extends its lead over the Phenom II by an extra 2% in 7-Zip. Still, it's quite disappointing that it cannot overtake the Phenom II in Cinebench.
PCMark is similar to the 3DMark suite, except that it includes many other tests like hard drive speed, memory and processor power, so it is considered a system benchmark and not just a gaming benchmark.
Now that's quite the radical change: here the AM3+ system finishes last, just a few points below the nearly three-year-old Core i7-920, and overclocking does not yield any miracles. Bulldozer is said to have been made for modern workloads, so can it do better in the newer installment of the benchmark? PCMark 7 is exactly five months old at the time of writing, so Futuremark was aware of the latest developments in CPU architecture. Happy birthday, by the way!
Here Bulldozer fares a bit better, but it's nothing really amazing. At stock, it now surpasses the 2.66GHz Bloomfield, and overclocked, only the strongest from Intel stays ahead. The i5-2500K falls behind the overclocked AMD parts as well.
Far Cry 2 is a first-person shooter developed by Ubisoft. The story takes place in Africa, where the ultimate goal is to assassinate an arms dealer.
DiRT 2 is a popular driving game in the Colin McRae series. It features a built-in benchmark that plays back a race between computer-controlled drivers, using the same view the player would have.
It's been seen in past reviews that Far Cry 2 is really fond of higher frequencies and... the Intel architecture. Unfortunately, Bulldozer is no exception to that rule, finishing last. It can at least overtake the Phenoms once overclocked. Interestingly, however, its framerates are not held back at the higher resolution compared to the Core i7-920. In DiRT 2, it overtakes Nehalem and the Sandy Bridge Core i5. At 4.5GHz, it can now beat the K10.
3DMark 11 is the latest from Futuremark. It is built specifically for DirectX 11, which allows for a realistic amount of graphical detail. Like Vantage, it has three presets. They range from Entry, suitable for ultraportables, to Extreme, which is naturally aimed at top-end gaming rigs.
Lost Planet is a game developed by Capcom. It features a built-in benchmark which will be run at the lowest settings like the previous tests, including a resolution of 800x600. It has two different runs; one takes place in a cave while the other one is set in a snow landscape.
Bulldozer performs well in the new DX11 benchmark. Only the Extreme Editions from Intel are ahead of it, while the recent i5-2500K is left far behind. The hexa-core can't beat it even when overclocked. However, the tables are turned in Lost Planet: this four-year-old game places the FX-8150 second to last, right above the quad-core Phenom II. At least it can overtake everything but the i7-2600K after an overclock.
Demos of these two games can be downloaded for free. Call of Juarez was developed by Techland and published by Ubisoft, while World in Conflict was developed by Massive Entertainment. They will be run at the lowest settings possible so the scores are not GPU-bound. That means a resolution of 1024x768 pixels for Call of Juarez and 800x600 for World in Conflict. This way, the scores reflect the processor's performance rather than the graphics card's.
The octo-core finishes dead last in Call of Juarez; overclocked, it can barely overtake the Phenom II X4! World in Conflict also shows an odd ranking: processors from both sides of the fence are ordered by frequency, regardless of architecture. The blue team is at the top, though.
By now, both processor manufacturers have had the chance to refine their Turbo capability, which was disabled in all previous tests. Can it give a boost to the sub-par single-threaded performance of the FX-8150? Three benchmarks from the previous pages were chosen and run with Turbo enabled and disabled. Far Cry 2 and Call of Juarez were chosen for their poor multi-threading support. 7-Zip was chosen as a benchmark that makes use of all threads without hitting the TDP limit.
It looks like the FX-8150 sees bigger performance increases than the Core i7-2600K, and things will look even rosier for Bulldozer once operating systems get better schedulers. Yet the Core i7-920 still stands to profit the most, due to its low stock clock. Meanwhile, AMD's first version of Turbo Core hardly seems worth it.
To measure power usage, a Kill A Watt P4400 power meter was used. The following numbers represent the power drain of the entire benchmarking system, not just the processors themselves. For the 'idle' readings, the power drain was measured at the OS desktop with no applications running. For the 'load' readings, the average power consumption was taken while running the OCCT power supply test for a couple of minutes, which stresses both the video card and the processor.
AMD aimed to improve performance per watt compared to its previous generation, and it achieved that. Bulldozer is much more eco-friendly than the company's previous flagship. At stock, power consumption is now down to 81W for the whole system, 9% below the next contender, the Core i5-2500K. Clearly, the power gating capabilities made a difference. When overclocked with all the power saving features turned off, it does consume more than Phenom II, but who is going to overclock while trying to save energy?
Honestly, AMD's Bulldozer didn't turn out as impressive as it sounded. In many applications, it is barely better than the Phenom II X6 1100T, and in a few others it's actually worse. However, those applications either weren't highly multi-threaded or do not make use of the newer instruction sets. FX is said to have been designed for modern workloads such as DX11 games played at high resolutions and detail levels, and the numbers show that it performs better in the newer applications. In the older games, especially the three that were run at the lowest resolution, Bulldozer was ranked on its single-threaded performance alone, which really doesn't do it justice. Upcoming programs will hopefully make use of the new FMA4 and XOP instruction sets at some point; for now, these are only found in this architecture. AVX, at least, has slowly been gaining ground since the launch of Sandy Bridge. In the few applications that exploit the potential of Bulldozer, the chip posts some very nice scores, overtaking the 2600K on many occasions. Considering AMD was playing catch-up with Nehalem and Westmere, this is still satisfactory. In the server market, it should perform quite nicely, since many server workloads consist of independent threads that can exploit all cores to their full potential. Overall, it can cater to both markets, but on the client side it would be better to see stronger single-threaded performance. At the very least, AMD can now fill this gap with a very potent Turbo Core, which has proven to be as good as, if not better than, Intel's implementation.
Zambezi is, however, only the first generation of the new architecture. The K10 core in Llano was fully mature, whereas the new Bulldozer core, with all its architectural innovations, is in its infancy and thus has a lot of room for improvement. AMD is confident of a ten to fifteen percent performance improvement with each new revision. 2012 will see the arrival of Piledriver, which will also cover both the desktop and server markets. It should close the single-threaded performance gap with Phenom II. AMD said that Trinity, its next generation of APUs, is going to feature more or less that same core revision. That will also be the occasion to fix the L1 cache invalidation bug in hardware, should it not be addressed sooner. Beyond its small performance hit, which is already reflected in the numbers on the previous pages, this bug isn't terribly harmful. The transition to the 22nm manufacturing process is not going to happen until 2013. A safe bet is therefore that the top-end Piledriver will launch with the same number of cores as the unit reviewed today.
Compared to Phenom II, it is already quite a big improvement in performance per watt; at stock, power consumption is much better, both under load and at idle. AMD was already showing very strong numbers at idle thanks to its Cool'n'Quiet technology, but Intel caught up when it moved to 32nm with Clarkdale. The new power gating technology once again creates a nice gap between the opposing camps, with the FX-8150 consuming a whole 9W less power than the i7-2600K. It's funny how their power consumption is similar, yet the i7-2600K is rated with a 30W lower TDP. It's a different story at 4.5GHz, though: Zambezi gets close to the 600W mark and heats up quite a bit. With better cooling, it should be able to reach an even higher frequency, hopefully like those seen at the Bulldozer conference in Austin. It clearly has the potential to go higher, but with an air cooler, the numbers have shown that the gains from overclocking are approximately proportional to those of Thuban. It will also soon be possible to buy the chip with a self-contained water cooler instead of the standard cooling solution, which should allow for even higher overclocks. Neoseeker will be looking at this bundle in the coming days.
The FX-8150 is priced at $245, only slightly higher than the i5-2500K, which performed worse in the majority of the newer applications. That makes the new chip a viable option. In most benchmarks it didn't overtake the i7-2600K, however, so it makes sense that it is priced $70 lower than that chip. Honestly, that price difference is required for it to be competitive. As applications become more multi-threaded and get compiled to use the newer instruction sets, Bulldozer should remain strong, whereas the other offerings might not be positioned to keep up.