Maxwell, in its GM204 form, is NVIDIA's 10th-generation GPU architecture. GM204 is made up of 5.2 billion transistors and measures 398 mm². NVIDIA's goals for Maxwell were increased gaming performance, incredible energy efficiency, and added support for VXGI lighting. The comparison for Maxwell should be made back to the original Kepler GK104 GPU, which powered the GeForce GTX 680. Compared to Kepler, NVIDIA claims Maxwell delivers 2x the performance per watt and 40% more performance per CUDA core.
The original Kepler diagram can be compared here. In the GeForce GTX 980, each of the four GPCs ships with a dedicated raster engine and four SMMs. Each SMM has 128 CUDA cores, a PolyMorph Engine, and eight texture units. With 16 SMMs, the GeForce GTX 980 ships with a total of 2048 CUDA cores and 128 texture units.
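The unit counts above follow from simple multiplication. Here is a minimal sketch of that arithmetic, using only the per-SMM figures quoted in the text; the constant and function names are illustrative and not NVIDIA terminology, and the GPC count is derived from 16 SMMs at four per GPC.

```python
# Per-unit figures quoted in the text for GeForce GTX 980 (GM204).
SMMS_PER_GPC = 4
CUDA_CORES_PER_SMM = 128
TEXTURE_UNITS_PER_SMM = 8
TOTAL_SMMS = 16


def gm204_totals():
    """Return (gpcs, cuda_cores, texture_units) derived from the per-SMM figures."""
    gpcs = TOTAL_SMMS // SMMS_PER_GPC  # derived: 16 SMMs at 4 per GPC
    cuda_cores = TOTAL_SMMS * CUDA_CORES_PER_SMM
    texture_units = TOTAL_SMMS * TEXTURE_UNITS_PER_SMM
    return gpcs, cuda_cores, texture_units


if __name__ == "__main__":
    gpcs, cores, tmus = gm204_totals()
    print(f"GPCs: {gpcs}, CUDA cores: {cores}, texture units: {tmus}")
```

Running the script should reproduce the 2048 CUDA cores and 128 texture units given above.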
The GeForce GTX 980 features four 64-bit memory controllers, for a 256-bit memory interface in total. Tied to each memory controller are 16 ROP units and 512 KB of L2 cache. The full chip ships with a total of 64 ROPs and 2048 KB of L2 cache (compared to 32 ROPs and 512 KB of L2 on GK104). NVIDIA was able to integrate twice as many SMs without doubling the die size.
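The memory-side totals can be checked the same way. This is a small sketch that assumes nothing beyond the per-controller figures and the GK104 numbers quoted above; the names are hypothetical and chosen only for illustration.

```python
# Per-controller figures quoted in the text for GM204.
MEMORY_CONTROLLERS = 4
BUS_BITS_PER_CONTROLLER = 64
ROPS_PER_CONTROLLER = 16
L2_KB_PER_CONTROLLER = 512

# GK104 (GeForce GTX 680) totals quoted in the text.
GK104_ROPS = 32
GK104_L2_KB = 512


def gm204_memory_totals():
    """Return the chip-wide bus width, ROP count, and L2 size for GM204."""
    return {
        "bus_bits": MEMORY_CONTROLLERS * BUS_BITS_PER_CONTROLLER,
        "rops": MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER,
        "l2_kb": MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER,
    }


def ratios_vs_gk104():
    """Return how many times more ROPs and L2 cache GM204 has than GK104."""
    totals = gm204_memory_totals()
    return totals["rops"] / GK104_ROPS, totals["l2_kb"] / GK104_L2_KB


if __name__ == "__main__":
    print(gm204_memory_totals())
    rop_ratio, l2_ratio = ratios_vs_gk104()
    print(f"ROPs: {rop_ratio:.0f}x GK104, L2: {l2_ratio:.0f}x GK104")
```

By these figures, GM204 has twice the ROPs of GK104 but four times the L2 cache.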
We know the GTX 680 is not the fastest Kepler GPU out there right now; that is the GeForce GTX 780 Ti. We will make comparisons to the GTX 780 Ti when we look at the card itself on the next page. There is a reason these comparisons are being made to the GTX 680, and we will talk about that later in the "Who is this card meant for?" section.
Based on efficiency and workload analysis, and on the math versus texture processing requirements of modern games, NVIDIA engineers determined that eight texture units per SMM is the best architectural balance for Maxwell; therefore, the total number of texture units is the same as on Kepler, 128. However, thanks to GeForce GTX 980’s higher clocks, texture fill rate improves by 12% from one generation to the next. To improve performance in high-AA, high-resolution gaming scenarios, NVIDIA doubled the number of ROPs from 32 to 64. Again, thanks to the added benefit of higher clocks, pixel fill rate is actually more than double that of GTX 680: 72 Gpixels/sec for GTX 980 versus 32.2 Gpixels/sec for GTX 680.
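The fill-rate figures can be cross-checked against each other. The sketch below assumes the common simplification that each ROP outputs one pixel per clock and each texture unit samples one texel per clock; under that assumption, the quoted pixel fill rates imply a core clock for each card, and the texture fill gain follows from the clock ratio because both cards have 128 texture units. The quoted fill rates are rounded, so the implied clocks are approximate, and all names here are illustrative.

```python
# Figures quoted in the text.
GTX_980 = {"rops": 64, "texture_units": 128, "pixel_fill_gpixels": 72.0}
GTX_680 = {"rops": 32, "texture_units": 128, "pixel_fill_gpixels": 32.2}


def implied_clock_mhz(card):
    """Core clock implied by the quoted pixel fill rate (assumes 1 pixel/ROP/clock)."""
    return card["pixel_fill_gpixels"] * 1000.0 / card["rops"]


def texture_fill_gtexels(card):
    """Texture fill rate in Gtexels/sec (assumes 1 texel/texture unit/clock)."""
    return card["texture_units"] * implied_clock_mhz(card) / 1000.0


def percent_gain(new, old):
    """Percentage increase of new over old."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    for name, card in (("GTX 980", GTX_980), ("GTX 680", GTX_680)):
        print(f"{name}: ~{implied_clock_mhz(card):.0f} MHz implied, "
              f"~{texture_fill_gtexels(card):.1f} Gtexels/sec")
    gain = percent_gain(texture_fill_gtexels(GTX_980), texture_fill_gtexels(GTX_680))
    ratio = GTX_980["pixel_fill_gpixels"] / GTX_680["pixel_fill_gpixels"]
    print(f"Texture fill gain: ~{gain:.0f}%, pixel fill ratio: ~{ratio:.2f}x")
```

Under these assumptions the texture fill gain comes out at roughly 12%, consistent with the figure quoted above, and the pixel fill ratio exceeds 2x because the doubled ROP count is multiplied by the higher clock.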
The memory subsystem has also been significantly revamped. GTX 980’s memory clock is over 15% higher than GTX 680’s, and GM204’s cache is larger and more efficient than Kepler’s design, reducing the number of memory requests that have to be made to DRAM. Improvements in NVIDIA’s implementation of memory compression provide a further benefit by reducing DRAM traffic, effectively amplifying the raw DRAM bandwidth in the system.