Author: Chris Ledenican
Editor: Howard Ha
Publish Date: Thursday, May 10th, 2012
Originally Published on Neoseeker (http://www.neoseeker.com)
Article Link: http://www.neoseeker.com/Articles/Hardware/s/Nvidia_GTX_670/
Copyright Neo Era Media, Inc. - please do not redistribute or use for commercial purposes.
NVIDIA first introduced the Kepler architecture just over a month ago with the release of the GTX 680, and since that time they have managed to turn the graphics market on its head. For the last three to four graphics card generations, the hierarchy saw AMD holding the fastest dual-GPU graphics card on the market, with cards such as the HD 4870 X2, HD 5970 and most recently the HD 6990, while NVIDIA laid claim to the fastest single-GPU solutions, such as the GTX 480 and GTX 580. With Kepler this is no longer the case, as NVIDIA is poised to have this generation's fastest single and dual GPU graphics solutions.
In today’s review we are going to be examining the next iteration in the Kepler family, the GeForce GTX 670. As the name suggests, this model sits just below the GTX 680 in the family. There isn't actually much difference between the two cards: both use the same Kepler architecture and support features such as GPU Boost, but the GTX 670 has one fewer SMX cluster in the GPU. This means that instead of having 1536 CUDA cores, it has a total of 1344. This will slightly reduce the performance of the GTX 670 in comparison to the flagship GTX 680, but it also allows NVIDIA to price the card at a more affordable $399 MSRP.
In the review we will go into detail about the GTX 670's architecture and specifications, but first let’s do a quick breakdown. As mentioned earlier, the GTX 670 has 7 SMX units giving it a total of 1344 CUDA cores, but it also includes 112 texture units and 32 raster operation units (ROPs), along with base and Boost clock speeds of 915MHz and 980MHz, respectively. The card also has a 2GB GDDR5 frame buffer that runs on a 256-bit interface clocked at 6008MHz effective.
Like Fermi, Kepler GPUs are comprised of different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The GeForce GTX 670 uses the same GK-104 GPU as the GTX 680, but again the difference is it has one fewer SMX cluster. This means the GTX 670 still has 4 GPCs, but uses 7 next-generation Streaming Multiprocessor (SMX) units instead of eight. It still has the same number of memory controllers, however: four in total.
Starting at the top of the GK-104 block, Kepler has a single GigaThread Engine which fetches the specified data from system memory and copies it to the frame buffer. The engine then creates and dispatches threads to the GPCs, where they are delivered to the execution units. Following the GigaThread Engine are a total of four Graphics Processing Clusters (GPCs), which is where the majority of operations are performed. This is due to each GPC having a dedicated raster engine, as well as resources for shading, texturing and computation.
The memory sub-system of the Kepler architecture has also been redesigned to support higher clock speeds. This overhaul of the memory interface allowed NVIDIA to push the operating frequency of the memory up to 6008MHz effective (1502MHz actual). The memory sub-system of the GTX 670 is the same as the GTX 680, down to the frequencies, so it has a 2GB frame buffer that runs on a 256-bit wide GDDR5 interface, which equates to a total bandwidth rating of 192.2GB/s. Additionally, the GK-104 GPU has 4 memory controllers along with 512KB of L2 cache, and with eight ROPs paired to each memory controller there are a total of 32 raster operation units.
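The bandwidth figure above follows directly from the bus width and effective clock: a 256-bit interface moves 32 bytes per transfer, and at 6008MHz effective that works out to the quoted 192.2GB/s. A quick sketch of the arithmetic (plain Python, not anything from NVIDIA):

```python
# Memory bandwidth = bus width (bytes) * effective transfer rate.
# GTX 670: 256-bit GDDR5 interface at 6008 MHz effective (1502 MHz actual).
bus_width_bits = 256
effective_mhz = 6008

bytes_per_transfer = bus_width_bits // 8                   # 32 bytes per transfer
bandwidth_gbs = bytes_per_transfer * effective_mhz / 1000  # MB/s -> GB/s

print(bandwidth_gbs)  # -> 192.256, i.e. the quoted 192.2 GB/s figure
```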
Each GPC normally contains two SMX units (on the GTX 670 one of these is disabled), which have been optimized to offer the best performance-per-watt by running the shaders at the same frequency as the GPU clock rather than double it. This approach gives Kepler twice the performance-per-watt of the Fermi architecture while allowing more CUDA cores to be packed into a single SMX unit. Inside each SMX are 192 CUDA cores, which equates to a total of 1344 CUDA cores, nearly triple the 480 found in the GTX 570. Of course, since the CUDA core clock is equal to the GPU clock, the performance per CUDA core is reduced from the previous generation, but the 1:1 clock design allows the GTX 680 and GTX 670 to achieve comparable throughput while staying within a lower power envelope.
Looking at the functions of the execution units, the CUDA cores are designed to perform the pixel, vertex and geometry shading, as well as the physics compute calculations. The texture units perform texture fetching and filtering, while the load/store units fetch and save data to memory. Meanwhile, Special Function Units (SFUs) handle transcendental and graphics interpolation instructions. Finally, the PolyMorph Engine handles vertex fetch, tessellation, viewport transform, attribute setup, and stream output.
The new Boost Clock feature is one of the biggest changes to the Kepler family. In essence, the Boost Clock works along the same lines as Intel's Turbo Boost, which dynamically adjusts the clock speeds in real-time, thus increasing the performance. However, Boost Clock is different in the sense that the maximum Boost Clock frequencies are not necessarily where the GPU clock will cap during gaming. Instead, Boost Clock works at both a hardware and software level to dynamically boost the GPU clock speed and under most circumstances, will increase the GPU clock speed well above the actual Boost Rating. Of course not all silicon is the same, so each Kepler board will have its own unique Boost Clock speed.
The typical board power defined for the GTX 670 is only 170W. This means GPU Boost will increase the clock speeds to fit into this power envelope under load. Additionally, GPU Boost operates completely autonomously, so there are no game profiles and no intervention required by the end user, providing an instant performance boost to gamers. The technology also works on a microsecond level, constantly checking the GPU voltage and conditions to see if the clocks can go higher or if they need to be throttled down to the base 3D clock. In addition, the GTX 670 has a maximum thermal threshold of up to 98°C.
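The behavior described above can be illustrated with a toy control loop. This is purely conceptual (the real controller runs in firmware and driver code whose internals are not public, and the step size here is an invented figure): it nudges the clock up while power and thermal headroom remain, and throttles back toward the base clock otherwise.

```python
BASE_CLOCK_MHZ = 915   # GTX 670 base 3D clock
BOOST_STEP_MHZ = 13    # illustrative step size, not an NVIDIA figure
POWER_LIMIT_W = 170    # typical board power
TEMP_LIMIT_C = 98      # maximum thermal threshold

def adjust_clock(clock_mhz: float, power_w: float, temp_c: float) -> float:
    """One iteration of a simplified, hypothetical boost controller."""
    if power_w < POWER_LIMIT_W and temp_c < TEMP_LIMIT_C:
        return clock_mhz + BOOST_STEP_MHZ  # headroom available: boost higher
    # Over budget: step back down, but never below the base 3D clock.
    return max(BASE_CLOCK_MHZ, clock_mhz - BOOST_STEP_MHZ)
```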
The Kepler series comes with a host of new technologies, some of which are exclusive to the new architecture while others come to current generation NVIDIA hardware via a quick driver update.
The first of the new technologies added to NVIDIA based hardware is Adaptive Vsync. Before going into how the new technology works, let’s first examine the issue it addresses. To create a smooth gaming experience, many gamers rely on Vsync to cap their frame rates at 60FPS. This prevents issues such as screen tearing, which happens when frames are delivered out of step with the display's refresh rate, and it should also keep the frame rate from dipping below what would be considered smooth. The issue, however, is that Vsync locks to the screen's refresh rate, so when the frames dip below 60FPS, Vsync actually drops to the next lowest synchronized rate of 30FPS. This dip causes what is known as frame stutter.
To address this, NVIDIA has added a new feature to their latest R300 drivers dubbed Adaptive Vsync. Essentially this feature dynamically turns Vsync off if the frame rate falls below 60FPS. With Vsync disabled, the frame rate more smoothly transitions to lower frames-per-second instead of dropping to a lower rate altogether. This helps prevent the in-game stutter and tearing mentioned earlier, thus creating a smoother gaming experience.
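The per-frame decision Adaptive Vsync makes boils down to a single comparison: sync when the GPU can keep pace with the display, and let frames through unsynchronized when it cannot. A minimal illustration of that logic (the actual driver implementation is of course more involved):

```python
def vsync_enabled(fps: float, refresh_hz: float = 60.0) -> bool:
    """Adaptive Vsync: sync only while the frame rate meets the refresh rate."""
    return fps >= refresh_hz

# Above 60FPS, Vsync caps output and prevents tearing; below it, Vsync is
# dropped so the frame rate degrades gradually instead of halving to 30FPS.
```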
As you can see from the graph below, Adaptive Vsync allows the frame rate to rise and fall more smoothly than traditional Vsync. The frames in the graph did at times nearly reach 30FPS, but the difference is that the drop is gradual, preventing the stutter that occurs when the frame rate abruptly dips to around 30FPS.
The R300 drivers also add FXAA to the NVIDIA Control Panel. This opens up the technology to hundreds of games, because it is no longer up to developers to implement it in their titles.
FXAA is a technology developed by NVIDIA to reduce visible aliasing in games. It is applied in a single pass alongside other post-processing steps such as motion blur and bloom. Additionally, since FXAA is a post-processing shader technique and not a hardware multisampling method like MSAA, it improves performance while reducing the strain on memory.
On top of this, NVIDIA has also added an entirely new anti-aliasing technology called TXAA. TXAA is a CG film-style anti-aliasing technique designed to utilize the high texture performance of the Kepler architecture, combining hardware multisampling with a custom film-style resolve filter to achieve smooth edges. In the case of 2x TXAA there is an optional temporal component for even better image quality. In total, TXAA can be used in 1x and 2x configurations: TXAA 1 offers image quality similar to 8x MSAA but with a much lower impact on performance, while TXAA 2 offers even higher image quality with a performance cost equivalent to running 4x MSAA.
Lastly we have another Kepler exclusive, an update to the NVIDIA Surround technology. Like Fermi based hardware, Kepler supports both 3D Vision and Surround functions, but Kepler can run both technologies on a single graphics card. With Fermi, two graphics cards were required to run more than two displays, but this is no longer the case with Kepler, which can simultaneously drive up to four displays out-of-the-box, without need for adapters.
NVIDIA has also optimized the usability of the Surround technology by treating the middle screen as the main display, putting the taskbar in an easy-to-access location. Currently the main display is fixed to the center screen, but a future update could add manual controls to allow the user to adjust the setup to best fit their needs.
The GTX 670 is designed to offer best-in-class performance while fitting into a more affordable price segment than the GTX 680 or HD 7970. According to NVIDIA, the GTX 670 runs up to 41% faster than its predecessor (the GeForce GTX 570) on average, and over 50% faster in some cases with the most demanding DX11 applications. In upcoming titles like Max Payne 3, the GTX 670 is 30% faster than the GTX 570. This means the GTX 670 should offer a ton of bang for the buck, and it could possibly be the best graphics card available in its price range.
Taking a look at the exterior, we can see the GTX 670 only mildly shares a family resemblance with the rest of the cards in the stack. First off, it has a simple plastic enclosure with a small GeForce logo near the rear mounted fan. However, since most manufacturers will be using a custom design, any aesthetic nitpicking over this reference model is moot, as it will have only limited availability outside of the review samples.
The GTX 670 uses the same GK-104 graphics processing unit as the GTX 680, albeit slightly slimmed down. This means the GPU is built on the same 28nm fabrication process, has a die size of 295mm² and packs in a total of 3.54 billion transistors. The GTX 670 additionally features base and Boost clock speeds of 915MHz and 980MHz, respectively, making this the first Kepler graphics card with a rated Boost clock under 1GHz. Additionally, the GTX 670 includes 4 Graphics Processing Clusters with 7 SMX units, giving the GTX 670 a total of 1344 CUDA cores, 32 ROPs, and 112 texture units. The memory specs are exactly the same as the GTX 680, meaning it has a 2GB frame buffer clocked at 1502MHz (6008MHz effective).
The back of the PCB shows off the length of the board, one of the more interesting aspects of the GTX 670. The PCB of the GTX 670 is only 7 inches long, while the remaining 2.5 inches is an enclosure for the exhaust fan. That is a lot of power for such a small package. In addition, the GTX 670 has a memory configuration where the chips are mounted on both sides of the PCB. As you can see, there are four chips on the back while the other four are on the front. However, every other pad is left empty, meaning it will be extremely easy for AIB partners to increase the memory from 2GB to 4GB.
The PCB also demonstrates how the layout of the on-board circuitry has been optimized to utilize all the available space on the board. Just looking around the PCB we can see the front mounted power supply along with dual SLI connectors and of course the memory and GPU. The GTX 670 also runs on a Gen 3.0 PCIe interface, which has double the maximum data rate of Gen 2.0, giving the card roughly 32 GB/s of bi-directional bandwidth on an x16 connector.
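That bi-directional figure can be checked from the PCIe 3.0 link parameters: 8 GT/s per lane with 128b/130b encoding across 16 lanes. A quick sanity check (plain Python, not from the article):

```python
# PCIe 3.0 x16 link bandwidth: 8 GT/s per lane, 128b/130b encoding, 16 lanes.
GT_PER_S = 8
ENCODING = 128 / 130   # payload bits per transferred bit
LANES = 16

per_direction_gbs = GT_PER_S * ENCODING * LANES / 8  # /8 bits -> bytes
bidirectional_gbs = 2 * per_direction_gbs

print(round(per_direction_gbs, 2))  # -> 15.75 GB/s each way
print(round(bidirectional_gbs, 1))  # -> 31.5 GB/s, marketed as "up to 32 GB/s"
```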
On paper the TDP of the GTX 670 is close to that of the GTX 680. According to NVIDIA, the maximum TDP for the GTX 670 is 170W, which is the same as the typical power consumption rating of the GTX 680. However, the typical gaming power rating is only 141W, which is extremely low for a high-end graphics card. NVIDIA's new GPU design scales its clock speed with voltage, and that voltage limit is defined by the TDP, meaning the Boost clocks can scale higher than the target rating, but only up to the 170W TDP. With the dual 6-pin PCIe connectors plus the PCIe slot, the GTX 670 can consume up to 225W under the PCIe specification.
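The 225W ceiling follows from the PCI Express specification's per-connector limits: the x16 slot supplies up to 75W and each 6-pin auxiliary connector another 75W. Worked out:

```python
# PCIe power budget for the GTX 670's connector configuration.
PCIE_SLOT_W = 75  # x16 slot limit per the PCIe spec
SIX_PIN_W = 75    # per 6-pin auxiliary connector

max_board_power = PCIE_SLOT_W + 2 * SIX_PIN_W
print(max_board_power)  # -> 225 watts, comfortably above the 170W TDP
```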
Another interesting aspect of the power design is that because the PCB is so short, the power connectors are located at the middle of the card rather than at the rear. It is really going to be interesting to see exactly what NVIDIA's partners plan to do with the board design. Just from the few I have seen, most will be using larger heatsinks, but there are single slot solutions in the works, and of course there should be shorter models down the pipeline as well.
Like the GTX 680, the video outputs on the GTX 670 have been completely retooled to support the expanded 3D Vision and Surround technologies. In total there are two DVI ports, a single HDMI port and a full-sized DisplayPort. Of these, both the DisplayPort and DVI connections can support resolutions of up to 2560x1600, while the HDMI port is capable of supporting resolutions of up to 1080p and comes with native support for all the latest HDMI 1.4a features.
With an average gaming power rating of only 141W, the GTX 670 doesn't require a massive heatsink. As you can see from the image below, the thermal solution used on the GPU consists only of a medium sized fin stack, and there are no heatpipes attached to the cooler. The PCB does, however, have a larger aluminum heatsink on the front mounted power supply to improve the thermal performance of the on-board power circuitry.
The heatsink is just a square fin stack with a large copper base. When installed onto the GPU, the fin stack is positioned with the opening facing toward the back of the PCB. This allows the air from the fan to be pushed through the fin stack and exhausted out the rear of the card. This technique allows the air to be blown outside the PC rather than being trapped within it, a feature that is particularly beneficial for small form factor PCs.
The fan used on the GTX 670 is the same as the one found on the GTX 680. This means it has the same custom design which allows it to maximize airflow yet still run at low acoustic levels. NVIDIA achieved this by designing the fan with special acoustic damping material that reduces noise output by up to 5dBA without affecting the CFM. The result of all these custom features is a quieter graphics card that is still more thermally efficient than any of NVIDIA's prior flagship graphics cards.
As we have already mentioned, the board size of the GTX 670 is tiny: nearly 3 inches shorter than most high-end models. The smaller PCB did reduce the room NVIDIA had to work with, so everything on the board is very compact. To accomplish this feat, NVIDIA moved the power supply to the side of the GPU and rotated the core to improve power integrity and increase efficiency. The power circuitry has also been moved to the front of the board to maximize the available real estate.
The PCB of course also includes the 295mm² GK104 GPU, 8 Hynix memory chips (with pads for 8 more) and the dual 6-pin power connectors. In addition, there are a total of 4 power phases, and the VRM has additional cooling for improved efficiency.
The rated GPU Boost clock of the GTX 670 is set at 980MHz, but since this is just a target, the Boost clock traditionally runs higher. The GTX 670 we were sent was no exception, as our Boost speed was 1084MHz, which is 10.6% faster than NVIDIA's GPU Boost target. These clocks were hit with no alterations to the power target or GPU offset, meaning the Boost feature dynamically adjusted the clock speeds over 10% higher without any work on the user's part.
When it came to overclocking, we again used the EVGA Precision X software utility. In our labs, the GTX 670 was able to reach a stable clock speed of 1244MHz. To reach this frequency we increased the Power Target to 122% and increased the voltage as well. This gave us a total voltage rating of 1175mV, which was the main factor in achieving such a high overclock. In comparison to the target Boost clock and the actual Boost clock, the final frequency we reached was 27% and 15% higher, respectively.
The memory also overclocked substantially, which was surprising considering it is already clocked at 6008MHz effective. Our end result was 1612MHz (6450MHz effective), up from 1502MHz. At this speed, the memory bandwidth increased to over 200GB/s.
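The percentages quoted above check out arithmetically: against the 980MHz Boost target, 1244MHz is roughly 27% higher, and against the 1084MHz observed Boost clock roughly 15%; the 6450MHz effective memory clock pushes bandwidth past 200GB/s on the 256-bit bus. Worked through in Python:

```python
# Core overclock relative to the two Boost reference points.
oc_clock = 1244
boost_target = 980
observed_boost = 1084
print(round(100 * (oc_clock / boost_target - 1)))    # -> 27 (%)
print(round(100 * (oc_clock / observed_boost - 1)))  # -> 15 (%)

# Overclocked memory bandwidth on the 256-bit (32-byte) bus.
effective_mhz = 6450
bandwidth_gbs = (256 // 8) * effective_mhz / 1000
print(bandwidth_gbs)  # -> 206.4 GB/s, up from 192.3 at stock
```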
| | Nvidia GTX 670 | Nvidia GTX 680 | Nvidia GTX 480 | Nvidia GTX 570 | Nvidia GTX 580 |
| Core Clock / Boost Clock | 915MHz / 980MHz | 1006MHz / 1058MHz | 700MHz | 742MHz | 782MHz |
| Memory | 2GB GDDR5 | 2GB GDDR5 | 1.5GB GDDR5 | 1.25GB GDDR5 | 1.5GB GDDR5 |

| Comparison cards | AMD Radeon HD 7970 | AMD Radeon HD 7950 | AMD Radeon HD 7870 GHz Edition | AMD Radeon HD 6950 | AMD Radeon HD 6970 |
| Memory | 3GB GDDR5 | 3GB GDDR5 | 2GB GDDR5 | 2GB GDDR5 | 2GB GDDR5 |
Futuremark's latest 3DMark 11 is designed for testing DirectX 11 hardware running on Windows 7 and Windows Vista. The benchmark includes six all-new tests that make extensive use of the new DirectX 11 features, including tessellation, compute shaders and multi-threading.
The GTX 670 was slightly slower than the GTX 680 in 3DMark 11, but it was still substantially faster than any of the Southern Islands graphics cards.
Unigine Heaven became very popular very fast, because it was one of the first major DirectX 11 benchmarks. It makes great use of tessellation to create a visually stunning scene.
Unlike the previous benchmark, Unigine narrows the gap between AMD and NVIDIA. Here the GTX 670 was only 1FPS faster than the HD 7950, and 2FPS slower than the HD 7970. In comparison to the previous generation GTX 570, though, we can see a substantial improvement of nearly 45%.
Batman: Arkham City is the sequel to the smash hit Batman: Arkham Asylum. The game was created with the Unreal 3 Engine, and includes areas with extreme tessellation, high-res textures and dynamic lighting. Batman also includes native support for PhysX and is optimized for NVIDIA 3D Vision technology.
The top graph reflects our results at 1920x1080, while the lower graph reflects our Eyefinity and Surround results at 5760x1080.
The GTX 670 was nipping at the heels of the GTX 680 in Batman: Arkham City, not to mention it was 50% faster than the GTX 570, 27.5% faster than the HD 7950, and even 13.3% faster than the HD 7970.
The GTX 670 also did well when it came to Surround, as it was able to maintain an average frame rate of over 30FPS.
Battlefield 3 is designed to deliver unmatched visual quality by including large scale environments, massive destruction and dynamic shadows. Additionally, BF3 includes character animation via ANT technology, which is also being utilized in the EA Sports franchise. All of this is definitely going to push any system to its threshold, and is the reason so many gamers around the world are currently asking if their current system is up to the task.
The top graph reflects our results at 1920x1080, while the lower graph reflects our Eyefinity and Surround results at 5760x1080.
Battlefield performs extremely well with the 600-series cards, and the GTX 670 is no exception. Looking at the difference between the GTX 670 and its competition, the card was 47% faster than the GTX 570, 30% faster than the HD 7950, and 14% faster than the HD 7970. It fell behind the GTX 680 by only 10.7%.
At 5760x1080, the GTX 670 was again able to perform well, but it couldn't quite average 30FPS, and there were some times when the game felt a bit sluggish while playing.
Crysis 2 is a first-person shooter developed by Crytek and is built on the CryEngine 3 engine. While the game was lacking in graphical fidelity upon its release, Crytek has since added features such as DX11 support and high quality textures. This improved the in-game visuals substantially, which in turn pushes even high-end hardware to the max.
The top graph reflects our results at 1920x1080, while the lower graph reflects our Eyefinity and Surround results at 5760x1080.
In Crysis 2 the GTX 670 did show some slowdown in comparison to the AMD graphics cards. Here the GTX 670 was only 10% faster than the HD 7950, and 8% slower than the HD 7970. Don't get used to these numbers though, as they are the exception and not the rule.
Once again the GTX 670 was slightly under the 30FPS mark we look for as a minimum, but it came pretty close. If we scaled back the settings just a bit, or tweaked the AA we were using, the GTX 670 could easily get above the 30FPS mark at 5760x1080.
DiRT 3 is the third installment in the DiRT series and, like its predecessor, incorporates DX11 features such as tessellation, accelerated high definition ambient occlusion and full floating point high dynamic range lighting. This makes it a perfect game to test the latest DX11 hardware.
DiRT 3 puts us back on track with the results we were seeing prior to Crysis 2. In the benchmark, the GTX 670 was 45% faster than the GTX 570, 27% faster than the HD 7950 and 14% faster than the HD 7970.
At 5760x1080 the GTX 670 was able to scale above 30FPS. This allowed the game to run smoothly the whole time we were benchmarking, as we never experienced any stuttering.
Metro 2033 puts you right in the middle of post-apocalyptic Moscow, battling mutants, rivals and radioactive fallout. The game is very graphics intensive and utilizes DX11 technology, making it a good measure of how the latest generation of graphics cards perform under the latest standard.
Metro 2033 traditionally performs better on AMD graphics cards, so the results here should not be surprising to anyone. In this benchmark, both the HD 7950 and HD 7970 were slightly faster than the GTX 670. This game craves memory bandwidth, and it is really one of the few games on the market where the larger memory interface on the Southern Islands cards makes a difference.
The Surround results were the same as those at 1080p, as the AMD cards made a strong showing in this benchmark.
Total War: Shogun 2 is a game that creates a unique gameplay experience by combining both real-time and turn-based strategy. The game is set in 16th-century feudal Japan and gives the player control of a warlord battling various rival factions. Total War: Shogun 2 is the first in the series to feature DX11 technologies to enhance the look of the game, but with massive on-screen battles it can stress even the highest-end graphics cards.
We round things out with the GTX 670 ahead of the HD 7950 in Shogun 2 by over 10%, and essentially tied with the HD 7970. Meanwhile, the GTX 670 nearly doubles the performance of its predecessor.
With an average frame rate of 29FPS at 5760x1080, we were able to run Total War with only minimal stutter which hardly affected the gaming experience at all.
To measure core GPU temperatures, we run three in-game benchmarks and record the idle and load temperature according to the min and max temperature readings recorded by MSI Afterburner. The games we test are Crysis 2, Lost Planet 2 and Metro 2033. We run these benchmarks for 15 minutes each. This way we can give the included thermal solution and GPU time to reach equilibrium.
Since the GTX 670 has a power consumption rating that ranges from 141W to 170W, there is little need for a huge heatsink. Still, even with such a small thermal solution we were surprised the core never exceeded 76°C, especially considering the fan never ramped up loud enough to be heard over the case fans.
To measure power usage, a Kill A Watt P4400 power meter was used. Note that the numbers represent the power drain for the entire benchmarking system, not just the video cards themselves. For the 'idle' readings we measured the power drain from the desktop, with no applications running; for the 'load' situation, we took the sustained peak power drain readings after running the system through the same in-game benchmarks we used for the temperature testing. This way we are recording real-world power usage, as opposed to pushing a product to its thermal threshold.
The power consumption of the GTX 670 was nearly identical to the HD 7950, but since this card is 25% faster in most games we certainly aren't complaining.
The Kepler architecture has already exceeded all our expectations here at Neoseeker, so it didn’t come as a surprise to see the GTX 670 absolutely dominating in its price range. Looking at the performance numbers, we can see the GTX 670 was around 25% faster than AMD's Radeon HD 7950 across the board, and nearly on par with the HD 7970. This isn’t going to sit well with AMD, but the truth of the matter is the GTX 670 performs at the same level as or better than the HD 7970 with only a few exceptions, all while being nearly $100 cheaper. To make matters worse for AMD, the GTX 670 is slightly more power efficient than the HD 7950 and HD 7970, and once overclocked it is able to push as many pixels as a stock GTX 680.
AMD’s response to the Kepler family currently rests on the fact that the Southern Islands graphics cards have 3GB of GDDR5 memory running on a 384-bit memory bus, which in some instances does yield better performance. However, the fact of the matter is the performance is just not up to par with what NVIDIA has been delivering, and since gamers tend to buy the best card within their budget (fanboys aside), it is hard to currently recommend either the HD 7950 or HD 7970. This makes the GTX 670 the best choice available, provided your budget doesn't exceed $399.
Another aspect AMD touched on was the availability of their Southern Islands cards, which is sort of a red herring. It is true AMD's 7000 series graphics cards are more readily available in the market, but the fact of the matter is this has nothing to do with production. Instead, the issue is that Kepler based graphics cards are in high demand. We have talked to a few retailers, and the word from them is Kepler is selling at nearly a 4 to 1 ratio over Southern Islands. This leaves plenty of volume available for AMD, but makes it appear as if NVIDIA lacks the same volume, when this is not necessarily the case.
On top of all this, the GTX 670 is also an extremely quiet graphics card. In our labs there was hardly a time when the fan ramped up high enough for us to hear it over the case fans. The only time we did hear the fan was when we manually adjusted the RPM to accommodate for the higher voltage levels we were using when overclocking. Even then, the fan was still extremely quiet for being set at 75% rotation. This is of course due to the excellent power efficiency of the card, which in most instances will not consume more than 141 watts of power.
The GTX 670 also comes with all the latest Kepler technologies, and NVIDIA really has gone all-out this generation. The short list of these new technologies includes GPU Boost, 3D Vision Surround via a single graphics card (new to NVIDIA), Adaptive Vsync and improved anti-aliasing technologies. These are all standard in Kepler based graphics cards and will continue to be implemented in future cards in the product line. All these technologies are designed to improve the gamer's experience by allowing them to customize their setup to fit their gaming needs, whether that be using multiple monitors, running games in 3D, or using different AA technologies to optimize the in-game visuals and performance.
At this point the party is over for the Southern Islands architecture, and AMD had better come up with something fast if they don't want Nvidia to completely turn out the lights.
Please do not redistribute or use this article in whole, or in part, for commercial purposes.