Like Fermi, Kepler GPUs are composed of different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The GeForce GTX 680 GPU consists of four GPCs, eight next-generation Streaming Multiprocessors (SMX), and four memory controllers.
Starting at the top of the GK104 block, Kepler has a single GigaThread Engine which fetches the specified data from system memory and copies it to the frame buffer. The Engine then creates threads and dispatches them to the GPCs, where they are delivered to the execution units. Following the GigaThread Engine are a total of four Graphics Processing Clusters (GPCs), which is where the majority of operations are performed. This is because each GPC has a dedicated raster engine, as well as resources for shading, texturing and computation.
The memory sub-system of the Kepler architecture has also been redesigned to support higher clock speeds. This overhaul of the memory interface allowed Nvidia to push the effective memory data rate up to 6008MHz. The memory operates on a 256-bit wide GDDR5 interface, and at an effective rate of roughly 6000MHz the bandwidth is rated at 192GB/s. Additionally, the GK104 GPU has four memory controllers, along with 512KB of L2 cache and a total of 32 Raster Operation Units.
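
As a rough check on the 192GB/s rating, the peak theoretical bandwidth can be worked out from the bus width and the data rate. The sketch below assumes the 6008MHz figure is the effective (per-pin) data rate and that 1GB means 10^9 bytes; the function name is made up for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits: int, effective_rate_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # 256 bits -> 32 bytes
    transfers_per_second = effective_rate_mhz * 1e6  # MHz -> transfers per second
    return bytes_per_transfer * transfers_per_second / 1e9


# GTX 680 figures quoted in the article: 256-bit bus, 6008MHz effective rate.
gtx680_bandwidth = memory_bandwidth_gbs(256, 6008)
print(f"{gtx680_bandwidth:.1f} GB/s")
```

This is a theoretical ceiling only; real-world throughput depends on access patterns and driver behaviour.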

Inside each GPC are two SMX units, which have been optimized for performance-per-watt by running the shaders at the same frequency as the GPU clock rather than at double the GPU clock. This approach gives Kepler twice the performance-per-watt of the Fermi architecture, while allowing more CUDA cores to be packed into a single SMX unit. Each SMX contains 192 CUDA cores, which equates to a total of 1536 CUDA cores, triple the amount in the GTX 580. Of course, since the CUDA core clock is equal to the GPU clock, the performance per CUDA core is reduced from the previous generation, but the 1:1 clock design allows the GTX 680 to achieve the same throughput while staying within a lower power envelope.
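
The core count follows directly from the layout described above. A minimal sketch of the arithmetic, where the GTX 580 figure of 512 CUDA cores is an assumption brought in from outside the article (it is what "triple" implies):

```python
# Layout figures taken from the article text.
GPCS = 4
SMX_PER_GPC = 2
CUDA_CORES_PER_SMX = 192

# Assumption: GTX 580 core count, not stated in the article itself.
GTX_580_CUDA_CORES = 512

total_smx = GPCS * SMX_PER_GPC
total_cores = total_smx * CUDA_CORES_PER_SMX
ratio_vs_gtx580 = total_cores / GTX_580_CUDA_CORES

print(f"{total_smx} SMX units, {total_cores} CUDA cores, "
      f"{ratio_vs_gtx580:.1f}x the GTX 580")
```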
Looking at the functions of the execution units, the CUDA cores are designed to perform the pixel, vertex and geometry shading, as well as the physics and compute calculations. The texture units, on the other hand, perform texture filtering, while the load/store units fetch and save data to memory. Meanwhile, Special Function Units (SFUs) handle transcendental and graphics interpolation instructions. Finally, the PolyMorph Engine handles vertex fetch, tessellation, viewport transform, attribute setup, and stream output.

The new Boost Clock feature is one of the biggest changes in the Kepler family. In essence, Boost Clock works along the same lines as Intel's Turbo Boost, which dynamically adjusts clock speeds in real time to increase performance. However, Boost Clock is different in the sense that the rated Boost Clock frequency is not necessarily where the GPU clock will cap during gaming. Instead, Boost Clock works at both a hardware and software level to dynamically raise the GPU clock speed, and under most circumstances it will push the GPU clock well above the rated Boost Clock. Of course, not all silicon is the same, so each Kepler board will reach its own unique Boost Clock speed.
The typical board power defined for the GTX 680 is 170W. This means that Boost Clock will raise the clock speeds as far as this power envelope allows under load. Additionally, GPU Boost operates completely autonomously, so there are no game profiles and no intervention is required from the end user, providing an instant performance boost to gamers. The technology also works on a microsecond level, constantly checking the GPU voltage and operating conditions to see if the clocks can go higher or if they need to be throttled back down to the base 3D clock.
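
The behaviour described above can be pictured as a simple feedback loop: raise the clock while measured board power is under the target, and back off toward the base clock otherwise. The sketch below is purely illustrative and is not Nvidia's actual algorithm. The 170W target comes from the article; the 1006MHz base clock, the 13MHz step and the linear power model are assumptions chosen for the example, and every function name is hypothetical.

```python
POWER_TARGET_W = 170.0  # typical board power quoted in the article
BASE_MHZ = 1006.0       # assumption: base 3D clock, not stated in the article
STEP_MHZ = 13.0         # assumption: size of one clock step


def next_clock_mhz(current_mhz: float, board_power_w: float) -> float:
    """Step the clock up while under the power target, otherwise step down.

    The clock never drops below the base 3D clock.
    """
    if board_power_w < POWER_TARGET_W:
        return current_mhz + STEP_MHZ
    return max(BASE_MHZ, current_mhz - STEP_MHZ)


def simulate(steps: int, watts_per_mhz: float) -> list[float]:
    """Run the loop against a toy load where power scales linearly with clock.

    Real board power also depends on voltage and temperature; the linear
    model is only here to make the loop observable.
    """
    clock = BASE_MHZ
    history = []
    for _ in range(steps):
        power = watts_per_mhz * clock  # hypothetical power reading
        clock = next_clock_mhz(clock, power)
        history.append(clock)
    return history


light_load = simulate(10, watts_per_mhz=0.16)  # headroom available: clock climbs
heavy_load = simulate(5, watts_per_mhz=0.18)   # already over target: stays at base
print(light_load)
print(heavy_load)
```

In this toy model a lighter workload lets the clock climb until the power target is reached and then hover around it, while a heavier workload keeps the card pinned at its base clock, which matches the article's point that the final clock depends on the load rather than on a fixed rating.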


~51fps in bf3 @ 2560 x 1600 w/ 4xAA & 16xAF? thats a big meatball!
and theres enough performance left to force all those other fancy features, like transparency AA.
when you got it, nvidia, you got it.
the next generations of GPUs will be very interesting. i dont think the gtx 680 will be anywhere near quite as powerful to max out next-gen console games as i would bf3 (2560x1600 & beyond, bunches of AA, big ol' AF, etc) assuming directx doesnt have any major efficiency reworks. its a fun ride.
either i wait and see what the 690 is like (and its cost!), or ill probably have to settle for eyefinity on the 7970, and sacrifice the CUDA that Folding@Home greatly benefits from.
Any news on how much bandwidth the 7970 and 680 need? As far as I'm aware PCIe 2.1 x16 hasn't been saturated yet.
AVP matched the 7970 with both at stock levels.
Only 8.6% faster at the highest settings in Arkham City with both at stock clocks.
Smokes the 7970 by 18.6% at best in Battlefield.
Loses to the 7970 in Crysis by 21.6% at the highest settings.
Up to 24% faster than the 7970 in Dirt 3 in the middle tier settings. Around 18% in the others.
Loses by 15-25% in metro compared to the 7970. And Metro is not an AMD biased or optimized game. Until the 6xxx cards came out AMD always lost by a wide margin. The 7970 either is a better architecture for the game or currently has better drivers. I have no doubt the 680's performance could have been better with the game as it's a new architecture and could probably use some driver work.
The 680 also loses in Total War between 10 and 17%.
On average that makes the 680 about 5% slower at the highest settings.
Like I said, hardly a 7970 killer. It's a very efficient architecture and it certainly trades blows with AMD, but it's not nearly 10-15% faster across the board as stated in the article. I really have to question whether you did the math or just eyeballed it.
The overclocking of the 680 was 15.5% core and about 19% vram.
The 7970 on the other hand had a gpu OC of 21.6% (925mhz to 1125 mhz) and about 14.5% vram OC. So the overclocking is kind of 'meh' compared to it. It's a nice increase no doubt but it's nothing to write home about. The 7970 is able to go past the limits set in CCC in the 7970 review but the gtx 680 actually capped out before its software limits, whether that's due to power or physical limitations of the architecture at that voltage I don't know though.
The pricing and efficiency is what makes this a great card, not the raw performance.
Perspective is important.
to play next-gen console ports, your compy will need to be considerably more powerful to run 'em. you want to notch up the resolution, add mods and force spiffy gfx options? be ready to throw a ton more power at it.
and emulation is a whole 'nother story.
just sayin' by the time next-gen console games are ported to PCs, a gtx 680 probably wont be too relevant.
Weird how it can sometimes have negative scaling (if that is the correct usage of this somewhat technical term).
Thanks for the response on our GTX 680 review. I noticed this is the second time you have mentioned not being sure of the reliability of our scores. Can you tell us what you would like to see done differently in the way we approach and analyze our benchmarking results?
Thanks