AMD has been very busy in the last year with a load of updates about what they’re working on, the hardware inside the PS4 and some of their new products and technologies that you can buy today (one of which is the stunningly cheap HD7790). But today we’re talking about something a little more in-depth and it has far-reaching implications for the company and gamers.
Today’s memory controllers are marvels of technology. They’re integrated straight onto the CPU die and deliver higher and higher bandwidth numbers every year as manufacturing processes mature and higher-frequency memory is delivered to market. In the past, though, it wasn’t so rosy. With older processors, system memory was partitioned for each separate core and cores couldn’t share anything with each other.
Information held both in the RAM and the processor’s cache was kept separate and this presented a problem to programmers in the early days when working with parallel processes. If you wanted a very crude version of multi-threading, you had to make your program run multiple instances of itself on different processors in its own memory space and stitch the results together to form your answer. It was tedious and time-consuming to debug because programmers were effectively working with separate systems.
That changed with dual-core processors and more modern memory controllers. Not only was cache shared between the processors, we also got a new feature that shared the contents of system memory between processors and their individual cores: Uniform Memory Access (UMA). This was a breakthrough for multi-threaded workloads because you were no longer limited to running code in separate instances. You could now run it on separate cores, and all cores would work on the same data in real-time.
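To make the shared-data idea concrete, here is a minimal sketch of several threads working on one list in place, with no copying or stitching of results afterwards. The function names (`scale_in_place`, `parallel_scale`) are made up for illustration. Note that CPython threads take turns rather than running truly in parallel, so this shows the shared address space and not a speed-up.

```python
import threading

def scale_in_place(data, start, stop, factor):
    """Worker: scale one slice of the shared list in place."""
    for i in range(start, stop):
        data[i] *= factor

def parallel_scale(data, factor, workers=4):
    """Split the index range across threads that all see the same list."""
    chunk = (len(data) + workers - 1) // workers  # ceiling division
    threads = []
    for w in range(workers):
        start = w * chunk
        stop = min(start + chunk, len(data))
        t = threading.Thread(target=scale_in_place,
                             args=(data, start, stop, factor))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait until every slice has been processed
    return data

values = list(range(10))
parallel_scale(values, 3)  # every thread wrote into the same list
```

Each worker only touches its own slice, so no locking is needed here. Once two threads can write to the same element, you would need to add synchronisation.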
Later on, we had the big four (ATi, AMD, Nvidia and Intel) put integrated graphics cores directly onto the motherboard to service buyers who wanted a new computer but didn’t need fast, power-consuming discrete graphics cards. This presented a problem because those GPUs required their own memory, and embedding that into the board as well would have been costly. The workaround was a Non-Uniform Memory Access (NUMA) arrangement, which partitioned off a section of the system RAM for the graphics core to use.
NUMA had its own quirks. You couldn’t re-partition the amount of memory allocated to the GPU without restarting the system and changing some BIOS settings. The CPU also couldn’t access anything held in the GPU’s partition, because GPUs don’t organise information the same way processors do.
This is evident in the way that current game consoles work. The Xbox 360 has 512MB of system memory shared between the GPU and CPU. The partition split can be adjusted to accommodate games that need more or less memory to run effectively, but the other limitations still apply because memory held in the GPU’s partition isn’t accessible by the CPU and vice versa. Partitioning the memory is a delicate balancing act, but this ability has allowed many Xbox 360 titles to look and run better than the same game on the PS3, which can’t alter its memory split.
To address this, AMD is integrating new memory technology into its APUs, starting with Kaveri, which will be launched at an as-yet-undisclosed date this year. hUMA, or Heterogeneous Uniform Memory Access, allows the GPU to share memory space indiscriminately with the processor cores and orders the information it’s working with in a form that the processors can understand.
hUMA now allows the CPU and GPU to work on the same thing without having to split available memory or pass results from computations back and forth. With hUMA, the CPU can use a pointer to show the GPU what it’s working on. The GPU can then get cracking on the calculations and the CPU can read and verify the results in real-time without being prompted! This reduces latency, which is the time it would normally take for the CPU to copy a set of data over to the GPU, for the GPU to run the code and then for the results to be copied back into the CPU’s memory to be read.
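The difference between copying and passing a pointer can be modelled in a few lines. This is only a simulation in ordinary Python: there is no real GPU involved, `gpu_kernel` is a hypothetical stand-in for GPU work, and a `memoryview` plays the role of the shared pointer. It is a sketch of the idea and not of AMD’s actual programming interface.

```python
def gpu_kernel(buf):
    """Stand-in for GPU work: double every byte value in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256

def run_with_copies(host):
    """Partitioned model: copy to a separate 'device' buffer and back."""
    device = bytearray(host)   # copy 1: host -> device
    gpu_kernel(device)
    host[:] = device           # copy 2: device -> host
    return 2 * len(host)       # bytes moved across the partition

def run_shared(host):
    """hUMA-style model: hand over a view of the very same memory."""
    view = memoryview(host)    # no data copied, same underlying bytes
    gpu_kernel(view)           # results land directly in host memory
    view.release()
    return 0                   # nothing was copied

a = bytearray([1, 2, 3, 4])
b = bytearray([1, 2, 3, 4])
copied_a = run_with_copies(a)  # 8 bytes shuffled back and forth
copied_b = run_shared(b)       # 0 bytes shuffled
```

Both paths end with the same result in the buffer. The only difference is how many bytes had to be moved to get there, which is exactly the overhead the shared-pointer approach avoids.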
hUMA has even more tricks up its sleeve. With Kaveri, the built-in GPU and CPU can access each other’s cache for the first time as well as the system memory contents. This means that the GPU can monitor the CPU’s cache for code that it needs and even have a copy of the same data, updated in real-time. Without any prompting, the GPU can fetch and work on the code in the cache and copy the results to RAM, which the CPU can later work with.
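For a rough feel of what keeping two caches in step involves, here is a toy model. It uses a simple write-through, write-invalidate scheme, which is an assumption chosen for brevity and not a description of the protocol in AMD’s hardware. The `Memory` and `Cache` classes are hypothetical.

```python
class Memory:
    """Shared system memory plus a list of the caches attached to it."""
    def __init__(self):
        self.cells = {}
        self.caches = []

class Cache:
    """Toy cache: write-through, and writes invalidate other caches' copies."""
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.lines = {}                    # address -> cached value
        memory.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: fetch from shared memory
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory.cells[addr] = value    # write through to memory
        for other in self.memory.caches:
            if other is not self:
                other.lines.pop(addr, None)  # drop any stale copy

ram = Memory()
cpu = Cache("cpu", ram)
gpu = Cache("gpu", ram)

cpu.write(0x10, 5)
seen_by_gpu = gpu.read(0x10)   # GPU picks up the CPU's value
gpu.write(0x10, 12)            # invalidates the CPU's cached copy
seen_by_cpu = cpu.read(0x10)   # CPU re-fetches and sees the update
```

Without the invalidation step in `write`, the CPU would keep returning its old cached value after the GPU’s update, which is the kind of stale data coherent caches exist to prevent.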
In addition, both the CPU and GPU can now access each other’s virtual memory contents stored on the hard drive. The implications of this are far-reaching: in theory, the GPU could fetch a compressed texture that’s needed for a future scene and copy it into virtual memory. The CPU can address those textures and work out what’s going to be added to them, like cloth physics or a particular particle effect. The GPU can then take those additions, meld them together with the texture it’s just uncompressed and wrap the result onto the 3D model for displaying on your screen. All this takes place in under a millisecond and it wouldn’t be possible without hUMA.
For bonus points, this works with discrete AMD graphics cards from the HD7000 family as well, although there will be some latency issues when accessing GDDR5 memory. It’s possible to have the CPU, integrated GPU and the discrete GPU all working on the same thing.
All this is done in machine code without an OS overlay. It runs on its own without special software and developers can take advantage of it without doing complex math or requiring extensive knowledge of how the hardware functions. It’s easier to debug and it makes developing games for different software environments easier because the hardware still does the grunt work and the game engine can take advantage of this.
It’s great for developers, but even better for gamers. hUMA is the secret sauce in the PS4 and will eventually reach the desktop in 2014 with AMD’s Steamroller and Excavator products. It has the power to completely change how games are optimised for different systems and it especially benefits AMD’s hardware. Your hardware will be used more efficiently and CPU and GPU cycles won’t be wasted copying data over or working with duplicate sets of information. Your system RAM usage may go down at first, but as developers take advantage of this, 16 and even 32GB kits may become popular in the future.
hUMA also means we can pre-load entire games and their individual scenes into system memory and not have to wait for slow hard drive access times. Load times between levels could be non-existent because the GPU and CPU can use idle moments to instead pre-render things before they’re needed.
With the PS4 and HSA (Heterogeneous System Architecture), AMD may be well on its way to dominating the gaming scene in future. Not only will their platform be easier to code for, but Intel and Nvidia are currently not releasing any products with the same benefits. Intel’s Haswell graphics still partition off the GPU memory and Nvidia’s graphics cards won’t be compatible with hUMA. It’s a great time to be an AMD fan.