Architecture
Back in late September of last year, NVIDIA disclosed some information regarding its next generation GPU architecture, codenamed "Fermi". At the time, actual product names and detailed specifications were not disclosed, nor was performance in 3D games, but high-level information about the architecture, its strong focus on compute performance, and broader compatibility with computational applications were discussed.
We covered much of the early information regarding Fermi in this article. To recap some of the more pertinent details found there, the GPU codenamed Fermi will feature over 3 billion transistors and be produced using TSMC's 40nm process. If you remember, AMD's RV870, which is used in the ATI Radeon HD 5870, is comprised of roughly 2.15 billion transistors and is also manufactured at 40nm.
Fermi will be outfitted with more than double the number of cores as the current GT200, 512 in total. It will also offer 8x the peak double-precision compute performance of its predecessor, and Fermi will be the first GPU architecture to support ECC. ECC support will allow Fermi to compensate for soft error rate (SER) issues, which in turn should allow the architecture to scale to higher densities and larger designs. The GPU will also be able to execute C++ code.
During the GPU Technology Conference that took place in San Jose, NVIDIA's CEO Jen-Hsun Huang showed off the first Fermi-based, Tesla-branded prototype boards and spoke at length about the compute performance of the architecture. Game performance wasn't a focus of Huang's keynote, however, which led some to speculate that NVIDIA was forgetting about gamers with this generation of GPUs. That is obviously not the case; Fermi is going to be a powerful gaming GPU as well. The simple fact of the matter is that NVIDIA is late with its next-gen GPU architecture, and the company chose a different venue--the Consumer Electronics Show--to discuss Fermi's gaming-oriented features.
For desktop-oriented parts, Fermi-based GPUs will from here on be referred to as GF100. As we've mentioned in previous articles, GF100 is a significant architectural departure from previous GPU generations. Initial information focused mostly on the compute side, but today we can finally discuss some of the more consumer-centric details that gamers will be most interested in.
At the Consumer Electronics Show, NVIDIA showed off a number of GF100 configurations, including single-card, 2-way, and 3-way SLI setups in demo systems. Those demos, however, used pre-production boards that were not indicative of retail product. Due to this fact, and also because the company is obviously still working feverishly on the product, NVIDIA chose NOT to disclose many specific features or the speeds and feeds of GF100. Instead, we have more architectural details and information regarding some new IQ modes and geometry-related enhancements.
Each GF100 GPU features 512 CUDA cores, 16 geometry units, 4 raster units, 64 texture units, 48 ROPs, and a 384-bit GDDR5 memory interface. If you're keeping count, the GT200 features 240 CUDA cores, 32 ROPs, and 80 texture units. The geometry and raster units, as they are implemented in GF100, are not present in the GT200. The GT200 does feature a wider 512-bit memory interface, but the need for such a wide interface is somewhat negated in GF100 because the GPU uses GDDR5 memory, which effectively offers double the bandwidth of GDDR3, clock for clock.
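To put that memory-interface trade-off in perspective, here is a minimal sketch of the standard theoretical-bandwidth arithmetic. The memory clock used below is a purely hypothetical placeholder, since NVIDIA has not disclosed GF100 clocks; the point is only that GDDR5's doubled data rate per clock lets a 384-bit bus match or exceed a 512-bit GDDR3 bus.

```cpp
#include <cstdio>

// Theoretical peak bandwidth in GB/s:
//   (bus width in bits / 8) * memory clock (GHz) * transfers per clock
// GDDR3 moves 2 transfers per clock; GDDR5 effectively moves 4, i.e.
// double the bandwidth of GDDR3, clock for clock.
static double peak_bandwidth_gbps(int bus_width_bits, double mem_clock_ghz,
                                  int transfers_per_clock) {
    return (bus_width_bits / 8.0) * mem_clock_ghz * transfers_per_clock;
}

int main() {
    // Hypothetical 1.2 GHz memory clock for both parts -- illustrative only.
    double gddr3_512bit = peak_bandwidth_gbps(512, 1.2, 2); // ~153.6 GB/s
    double gddr5_384bit = peak_bandwidth_gbps(384, 1.2, 4); // ~230.4 GB/s
    printf("512-bit GDDR3: %.1f GB/s\n384-bit GDDR5: %.1f GB/s\n",
           gddr3_512bit, gddr5_384bit);
    return 0;
}
```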
Features
Many of the new features of GF100 are designed to increase geometric realism, while also offering improved image quality and, of course, high performance. One of the new capabilities that will be a part of GF100, like other DirectX 11-class GPUs, is hardware-accelerated tessellation.
The GF100 has built-in hardware support for tessellation. As we've mentioned in the past, tessellation works by taking a basic polygon mesh and recursively applying a subdivision rule to create a more complex mesh on the fly. It's best used for amplification of animation data, morph targets, or deformation models, and it gives developers the ability to provide data to the GPU at a coarser resolution. This saves artists the time it would normally take to create more complex polygonal meshes and reduces the data's memory footprint. Unlike previous tessellator implementations, the one in GF100 adheres to the DX11 spec and will not require proprietary code.
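To illustrate the basic idea of a subdivision rule, here is a simplified sketch in plain C++ (not the actual DX11 hull/domain shader pipeline GF100 implements): one refinement step splits every triangle into four by inserting edge midpoints. Applied recursively, it turns a coarse artist-authored mesh into a much denser one; a real tessellation pass would also displace the new vertices, for example with a displacement map.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

static Vec3 midpoint(const Vec3& p, const Vec3& q) {
    return { (p.x + q.x) * 0.5f, (p.y + q.y) * 0.5f, (p.z + q.z) * 0.5f };
}

// One uniform subdivision step: each input triangle becomes four smaller
// triangles. Recursing on the output amplifies the coarse mesh on the fly.
static std::vector<Triangle> subdivide(const std::vector<Triangle>& mesh) {
    std::vector<Triangle> out;
    out.reserve(mesh.size() * 4);
    for (const Triangle& t : mesh) {
        Vec3 ab = midpoint(t.a, t.b);
        Vec3 bc = midpoint(t.b, t.c);
        Vec3 ca = midpoint(t.c, t.a);
        out.push_back({ t.a, ab, ca });
        out.push_back({ ab, t.b, bc });
        out.push_back({ ca, bc, t.c });
        out.push_back({ ab, bc, ca });
    }
    return out;
}
```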
In addition to offering much more compute and geometry processing performance than previous generations, the GF100 also brings new anti-aliasing modes along with GPU-accelerated PhysX and ray tracing.
The Kitchen Sink
Perhaps the most complex demo NVIDIA used to showcase GF100 was the Supersonic Sled. A system equipped with three GF100 cards was used to run the demo, which exploits virtually all of the features of the GPU. The Supersonic Sled demo uses GPU particle systems for smoke, dust, and fireballs; PhysX physical models for rigid bodies and joints, which are partially processed on the CPU; tessellation for the terrain; and image processing for the motion blur effect. NVIDIA called the demo the "kitchen sink" because physical simulation, DX11 tessellation, environmental effects, and image processing are all employed simultaneously.
In the demo, a pilot is launched down a track on a rocket-propelled sled and general mayhem ensues. Particles are strewn about, and objects like a shack, a bridge, and a rock ledge crumble as the sled jets by. Hundreds of thousands to a million particles can be on screen at any given time, all managed by the GPU. The demo requires an immense amount of compute performance to run smoothly with the detail and particle counts cranked up, hence the 3-way GF100 SLI configuration.
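For a sense of what managing particles on the GPU involves, here is a minimal, hypothetical per-particle update written as plain C++. In the actual demo this sort of work would run as a massively parallel compute kernel across GF100's 512 cores rather than in a serial loop, and the real simulation also handles emission, collisions, and rendering.

```cpp
#include <vector>

struct Particle {
    float px, py, pz;   // position
    float vx, vy, vz;   // velocity
    float life;         // remaining lifetime in seconds
};

// Simple Euler integration step applied to every particle. Each particle is
// independent, which is exactly why this maps so well onto a GPU: the loop
// body becomes one thread per particle in a compute kernel.
static void update_particles(std::vector<Particle>& particles, float dt) {
    const float gravity = -9.8f; // assumed constant downward acceleration
    for (Particle& p : particles) {
        p.vy += gravity * dt;
        p.px += p.vx * dt;
        p.py += p.vy * dt;
        p.pz += p.vz * dt;
        p.life -= dt;
    }
    // Expired particles would normally be recycled by an emitter; omitted here.
}
```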
There were other GF100 demos at CES as well, including 3D Surround--which we showed you here--and a side-by-side Far Cry 2 benchmark run that showed GF100 running roughly 67% faster than a GTX 285 at 1920x1200 (84 FPS vs. 50.4 FPS). All told, we wish we had more specific details regarding GF100 to share with you today, and we know NVIDIA feels the same.
For now, we'll all just have to wait a little longer and hope that NVIDIA hits its current Q1 2010 release target.