Neoseeker : Articles : Video : Game Accelerators : BFG GeForce 8800 GTS
BFG GeForce 8800 GTS - PAGE 2
Geordan Hankinson, Tom Karpik
- Wednesday, November 8th, 2006


Architecture

Although we received an 8800 GTS for today's review, we will use the 8800 GTX's architecture as our primary example. We will discuss the differences between the two GPUs later in the article.

We'll start with the most notable aspect of the new architecture: the unified shader pipeline. While news of a unified design from NVIDIA came as a surprise when it hit the rumor channels a number of weeks ago, the decision follows from the entire ethos behind DirectX 10. What's important to note about DirectX 10 is that, aside from the new geometry shader stage, which we will likely see implemented in future DX10-based games, Microsoft has not added anything drastically different from what DirectX 9 already offers. The central theme of DirectX 10 is optimization, and this extends from reduced CPU usage to the new push for unified architectures.

Instead of dividing the pipeline into separate vertex and pixel shaders as in the past, NVIDIA has unified the entire shader pipeline. The result is what NVIDIA calls its GigaThread technology. It represents a completely different approach to GPU design, and the NVIDIA-supplied diagrams that follow do a good job of contrasting the two concepts.

The Old Way: Vertex and Pixel Shaders

The diagram below shows the classic GPU architecture we have all grown accustomed to. This design does not maximize efficiency: at any given moment, the pixel shaders may be under maximum load while some of the vertex shaders sit unused, or vice versa. The result is idle pipes that wait for the other units to catch up before receiving more instructions.
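To put a rough number on that idle time, here is a toy model, not anything NVIDIA published. It assumes one cycle of work, a hypothetical split design with 8 vertex and 24 pixel units, and workloads counted in "units of work ready this cycle". In the split design each stage can only use its own units; in a unified pool any unit can take either kind of work.

```python
# Toy utilization model. Unit counts and workloads are hypothetical,
# chosen only to illustrate the idle-unit problem of a split pipeline.

def split_utilization(vertex_units, pixel_units, vertex_work, pixel_work):
    """Fraction of units busy when each stage may only use its own units."""
    busy = min(vertex_units, vertex_work) + min(pixel_units, pixel_work)
    return busy / (vertex_units + pixel_units)

def unified_utilization(units, vertex_work, pixel_work):
    """Fraction of units busy when any unit can run either kind of work."""
    return min(units, vertex_work + pixel_work) / units

# Pixel-heavy cycle: little vertex work, plenty of pixel work.
print(split_utilization(8, 24, vertex_work=2, pixel_work=40))   # 0.8125
print(unified_utilization(32, vertex_work=2, pixel_work=40))    # 1.0

# Vertex-heavy cycle: the split design idles most of its pixel units.
print(split_utilization(8, 24, vertex_work=30, pixel_work=4))   # 0.375
```

The same 32 units stay fully busy in the unified case because the pool is not partitioned by shader type.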



The New Way: Unified Architecture!

NVIDIA's approach to a unified architecture, as detailed in the diagram below, was to get rid of the vertex and pixel shader pipelines as we know them and replace them with completely decoupled "stream processors," as they are being dubbed. The 8800 GTX's core clock is 575 MHz (500 MHz on the GTS). In a conventional GPU, this would imply that the vertex and pixel shading units also run at that speed. In the 8800 series, however, these units (now bundled together as stream processors) run on a completely separate clock, which in the 8800 GTX's case is 1350 MHz (1200 MHz on the GTS).
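A quick way to see what the separate shader clock means is to work out how many stream processor cycles elapse per core clock cycle. This small sketch uses only the clock figures quoted above.

```python
# Core and shader clocks quoted in the article, in MHz.
CLOCKS = {
    "8800 GTX": {"core": 575, "shader": 1350},
    "8800 GTS": {"core": 500, "shader": 1200},
}

def shader_cycles_per_core_cycle(card):
    """How many stream processor clock ticks occur per core clock tick."""
    c = CLOCKS[card]
    return c["shader"] / c["core"]

for card in CLOCKS:
    print(f"{card}: {shader_cycles_per_core_cycle(card):.2f}")
# 8800 GTX: 2.35
# 8800 GTS: 2.40
```

In other words, the stream processors tick over roughly 2.35 to 2.4 times for every tick of the rest of the core.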



If the diagram above does not make sense, keep reading! Completely decoupled pipelines within a GPU are an odd concept to grasp, but they are made possible by a central dispatch processor (the "arbiter" in ATI/Microsoft terminology) that keeps the stream processors consistently utilized. The dispatch processor sends the data it receives through the stream processors, looping it back through them as many times as necessary until all required operations are complete, and then outputs the result to the Raster Operations Pipeline (ROP) and on to memory.
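The looping behaviour described above can be sketched as a simple queue. This is a hypothetical model, not NVIDIA's actual scheduler: each work item needs some number of shader passes, the dispatcher hands waiting items to free stream processors, each processor performs one pass, and unfinished items return to the dispatcher until they are done and move on to the ROPs. The names `dispatch`, `work_items` and `num_sps` are illustrative only.

```python
from collections import deque

def dispatch(work_items, num_sps):
    """Hypothetical dispatcher model.

    work_items: dict of item name -> number of shader passes it needs.
    num_sps: number of stream processors available each step.
    Returns (steps taken, order in which items reached the ROPs).
    """
    if num_sps < 1 or any(p < 1 for p in work_items.values()):
        raise ValueError("need at least one SP and one pass per item")
    pending = deque(work_items.items())  # items waiting at the dispatcher
    rop_output = []                      # finished items sent to the ROPs
    steps = 0
    while pending:
        steps += 1
        # Hand up to num_sps waiting items to free stream processors.
        batch = [pending.popleft() for _ in range(min(num_sps, len(pending)))]
        for name, passes_left in batch:
            passes_left -= 1             # each SP performs one pass
            if passes_left == 0:
                rop_output.append(name)  # all operations done: on to the ROPs
            else:
                pending.append((name, passes_left))  # loop back to dispatch
    return steps, rop_output

print(dispatch({"a": 1, "b": 3, "c": 2}, num_sps=2))  # (3, ['a', 'c', 'b'])
```

Items leave in the order they finish, not the order they arrived, which is the essence of decoupling work from fixed pipeline stages.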

The decoupling motif extends even further, to separating the shader pipelines (stream processors) from the texture units. In the past, shader pipes were often held up while the texture units fetched and filtered data, creating a bottleneck. Because the two have been separated on 8800 series cards, the stream processors can perform other calculations while the texture units (which run at only the 575 MHz core clock on the GTX) work through longer operations. The figure below shows what this might look like in some instances.
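A rough sketch of why this helps: suppose each thread needs some shader math time and some texture time, expressed in the same arbitrary time units (the numbers below are made up). When the shader must wait for every fetch, the costs add up. When fetches can overlap with other threads' math, the ideal best case is bounded by whichever resource is busier. This is an idealized model that ignores the different clock domains and assumes plenty of independent threads.

```python
# Each tuple is (shader math time, texture fetch time) for one thread,
# in arbitrary common time units. The values are hypothetical.
ops = [(4, 6), (5, 0), (3, 6), (6, 2)]

def coupled_time(ops):
    """Shader stalls during every texture fetch, so the costs simply add."""
    return sum(math + tex for math, tex in ops)

def decoupled_time_bound(ops):
    """Idealized lower bound when fetches overlap other threads' math."""
    total_math = sum(math for math, _ in ops)
    total_tex = sum(tex for _, tex in ops)
    return max(total_math, total_tex)

print(coupled_time(ops))          # 32
print(decoupled_time_bound(ops))  # 18
```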



All these design decisions come together in the following diagram, which shows all 128 stream processors (96 on the GTS) and how they are arranged.
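Combining the stream processor counts with the shader clocks gives a crude measure of raw shader capacity: total stream processor clock cycles per second. This is only a sketch of the arithmetic; it says nothing about how much work each processor does per cycle or about other bottlenecks such as memory or ROPs.

```python
CARDS = {
    "8800 GTX": {"stream_processors": 128, "shader_mhz": 1350},
    "8800 GTS": {"stream_processors": 96, "shader_mhz": 1200},
}

def sp_cycles_per_second(card):
    """Stream processors x shader clock = total SP clock cycles per second."""
    spec = CARDS[card]
    return spec["stream_processors"] * spec["shader_mhz"] * 1_000_000

gtx = sp_cycles_per_second("8800 GTX")  # 172.8 billion
gts = sp_cycles_per_second("8800 GTS")  # 115.2 billion
print(f"GTX/GTS ratio: {gtx / gts:.2f}")  # GTX/GTS ratio: 1.50
```

By this simple measure the GTX has about 1.5 times the raw shader capacity of the GTS.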



Here you can see the path render data takes as it enters the GPU and is processed by the new shader structure. A vertex effectively goes through multiple wash cycles, moving through the dispatcher, through a stream processor, back through the dispatcher (depending on the nature of the data) and so on, before being output to the ROPs.

Some have wondered how effective these seemingly general-purpose stream processors are compared with their dedicated vertex and pixel shader predecessors. In a pure shader-versus-shader comparison, the stream processors in 8800 series cards should theoretically be able to perform either operation just as fast as a dedicated unit. The real potential performance holdup is the scheduling overhead introduced by having to dispatch many threads to different sub-processors. Fortunately, any inefficiencies in NVIDIA's Thread Processor design should be offset by the fact that the 8800 has 128 pipelines available to perform either operation at any time, removing a major performance bottleneck. One final note about the unified design: its performance benefits extend to current DirectX 9 games as well as future DirectX 10 games, which should mean tangible performance gains while we wait for DirectX 10 titles to arrive with Vista next year.
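The trade-off between scheduling overhead and idle units can be framed as a simple break-even test. Assume, purely for illustration, that a fully fed unified pool loses some fraction of its cycles to dispatch overhead, while a split design of the same size and speed runs at some utilization below 100% because of idle units. The unified design comes out ahead whenever its overhead is smaller than the split design's idle fraction. Neither figure is published; both are hypothetical inputs.

```python
def unified_beats_split(split_utilization, scheduling_overhead):
    """True if a fully fed unified pool, after losing `scheduling_overhead`
    (fraction of cycles) to dispatch, does more useful work per unit than
    a split design running at `split_utilization`. Assumes equal unit
    counts and equal per-unit speed."""
    return (1.0 - scheduling_overhead) > split_utilization

def break_even_overhead(split_utilization):
    """Largest overhead the unified pool can tolerate and still tie."""
    return 1.0 - split_utilization

print(break_even_overhead(0.75))        # 0.25
print(unified_beats_split(0.75, 0.10))  # True
print(unified_beats_split(0.75, 0.30))  # False
```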

The final point worth mentioning here is NVIDIA's new marketing around GPU physics processing. This will be implemented in some DirectX 10 games and will allow physics calculations to be done directly on the GPU using the stream processors. The obvious intent is to encourage the purchase of two boards in SLI (or three, as may be the case with the GTX) to maximize both graphics and physics performance.

Memory Arrangement

When details on the G80 first emerged last month, there was much surprised discussion over the memory configuration and the wider bus. NVIDIA has spent virtually no time discussing this seemingly major enhancement (this is the first GPU with an external memory bus wider than 256 bits), and really, there isn't a whole lot to discuss. We would presume that the memory subsystem functions similarly to the memory bus on 7 series cards and has simply been expanded to accommodate more memory. NVIDIA has, however, mentioned future support for GDDR4, though current 8800 cards use GDDR3. As seen on the previous page, the 8800 GTX has a 384-bit memory bus and 768 MB of GDDR3 memory, while the GTS has a 320-bit memory bus and 640 MB. The GTX's memory runs at 900 MHz, while the GTS's is clocked lower at 800 MHz.
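The bus widths and clocks above translate directly into peak theoretical bandwidth. This sketch assumes the quoted 900 MHz and 800 MHz figures are the actual memory clocks, and uses the fact that GDDR3 is double data rate (two transfers per clock). The per-channel calculation additionally assumes the bus is built from 64-bit channels, which the article itself does not state.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak theoretical bandwidth in GB/s (10^9 bytes per second).
    GDDR3 is double data rate, hence two transfers per memory clock."""
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_second = memory_clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9

def memory_per_channel_mb(total_mb, bus_width_bits, channel_width_bits=64):
    """Memory behind each channel, ASSUMING 64-bit channels."""
    return total_mb / (bus_width_bits // channel_width_bits)

print(f"GTX: {peak_bandwidth_gb_s(384, 900):.1f} GB/s")  # GTX: 86.4 GB/s
print(f"GTS: {peak_bandwidth_gb_s(320, 800):.1f} GB/s")  # GTS: 64.0 GB/s
print(memory_per_channel_mb(768, 384), memory_per_channel_mb(640, 320))  # 128.0 128.0
```

Under that channel assumption, both cards carry the same 128 MB per channel, which fits the idea of a 7 series-style bus simply widened to hold more memory.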

Keep reading for a look at the card and an overview of the new image quality enhancements made!


Article Index

1.Introduction
2.Architecture
3.The Board
4.Image Quality
5.Test Setup and 3DMark 06
6.3DMark 06 and Call of Duty 2
7.Company of Heroes and Far Cry
8.F.E.A.R. and Prey
9.Quake 4, Splinter Cell 3 and X3
10.Power Consumption and Conclusion
