The first three versions of CUDA were about getting the features developers need into their hands. Now, NVIDIA is refining its GPU application development tools with a single unified memory base in the late February release of CUDA 4.0. NVIDIA hopes the new version will make it easier for developers to build applications that tap video cards for extra processing power.

Until now, CUDA development has required some tough compromises. Because CUDA runs code on a computer's NVIDIA graphics processor, developers had to keep track of two kinds of memory: GPU VRAM and traditional system memory. And because several GPUs can be installed in one computer, developers were potentially tracking half a dozen separate pools of memory.

CUDA 4.0 removes this burden and gives developers a unified memory base to work with. Sanford Russell, director of CUDA marketing, said that this new capability allows for “peer-to-peer memory access within the node.

“If you have two GPUs in a node, the way this worked prior to 4.0, you literally had to copy the objects to main memory through the CPU, then copy it back out and put it on GPU No. 2. There were a whole bunch of extra steps required. We now have a peer-to-peer capability, where it literally is a copy from memory to memory. It goes across the [PCI Express] bus, and it’s no longer going to system memory.”
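For readers who want to see what such a copy looks like in code, here is a minimal sketch using the CUDA runtime's peer-to-peer calls (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess and cudaMemcpyPeer). The device numbers 0 and 1, the 1 MiB buffer size and the CHECK helper are assumptions made for illustration; they do not come from NVIDIA or from Russell.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        std::printf("Need two GPUs for a peer-to-peer copy; found %d.\n",
                    deviceCount);
        return 0;
    }

    const size_t bytes = 1 << 20;  // 1 MiB, an arbitrary size for the sketch
    float *src = NULL;             // buffer on GPU 0
    float *dst = NULL;             // buffer on GPU 1

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&src, bytes));
    CHECK(cudaMemset(src, 0, bytes));

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&dst, bytes));

    // Ask the runtime whether each GPU can reach the other's memory directly.
    int access01 = 0, access10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&access01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&access10, 1, 0));

    if (access01 && access10) {
        // Peer access is enabled from the current device to the named peer.
        CHECK(cudaSetDevice(0));
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
    } else {
        std::printf("Direct peer access unavailable; the copy below still "
                    "works but is staged by the runtime.\n");
    }

    // Arguments: destination, destination device, source, source device, size.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    CHECK(cudaDeviceSynchronize());
    std::printf("Copied %zu bytes from GPU 0 to GPU 1.\n", bytes);

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

Whether the hardware can actually take the direct path depends on the GPUs and on how they are attached to the system, which is why the sketch asks the runtime first rather than assuming peer access is available.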

That capability is extended thanks to virtual addressing. Russell said CUDA 4.0 now has “unified virtual addressing. With two GPUs and a CPU, each one has its own memory and address space. As a developer, you’ve got multiple pools of memory in that system. With unified virtual addressing, it’s now one single address space.”
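In practice, a single address space means the runtime can work out where a pointer lives from its value alone. The sketch below illustrates this with two runtime features that depend on unified virtual addressing: the cudaMemcpyDefault copy kind and cudaPointerGetAttributes. The buffer size, variable names and CHECK helper are illustrative assumptions, and the program exits early on systems where the device reports no unified addressing support.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.unifiedAddressing) {
        std::printf("Device 0 does not report unified addressing.\n");
        return 0;
    }
    CHECK(cudaSetDevice(0));

    const size_t n = 256;  // arbitrary element count for the sketch
    const size_t bytes = n * sizeof(int);
    int *hostBuf = NULL;
    int *devBuf = NULL;

    // Pinned host memory and device memory share the unified address space.
    CHECK(cudaMallocHost((void **)&hostBuf, bytes));
    CHECK(cudaMalloc((void **)&devBuf, bytes));
    for (size_t i = 0; i < n; ++i) hostBuf[i] = (int)i;

    // With cudaMemcpyDefault the runtime infers the copy direction from
    // the pointer values instead of being told host-to-device explicitly.
    CHECK(cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyDefault));
    std::memset(hostBuf, 0, bytes);
    CHECK(cudaMemcpy(hostBuf, devBuf, bytes, cudaMemcpyDefault));

    // The runtime can also report which device a pointer belongs to.
    cudaPointerAttributes attr;
    CHECK(cudaPointerGetAttributes(&attr, devBuf));
    std::printf("devBuf belongs to device %d\n", attr.device);
    std::printf("Round trip %s\n",
                hostBuf[n - 1] == (int)(n - 1) ? "succeeded" : "failed");

    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    return 0;
}
```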

In addition, version 4.0 of the CUDA tool set includes more robust optimization options, thanks to a new tool that can suggest ways around bottlenecks.

Finally, Russell said that CUDA 4.0 shows an increased consideration for C++ programmers. While previous versions have addressed most of the concerns of C developers, he said, the new C++-focused libraries offer developers some prepared algorithms for use on the GPU.
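Russell did not name the libraries in question. One C++ template library that ships with the CUDA 4.0 toolkit is Thrust, and the sketch below assumes that is the kind of prepared algorithm he had in mind. It sorts and sums a small array on the GPU without any hand-written kernel; the sample values are arbitrary.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

int main() {
    // Fill a small array on the host with arbitrary sample values.
    thrust::host_vector<int> hostData(4);
    hostData[0] = 3;
    hostData[1] = 1;
    hostData[2] = 4;
    hostData[3] = 2;

    // Assigning to a device_vector copies the data into GPU memory.
    thrust::device_vector<int> deviceData = hostData;

    // Both algorithms run on the GPU because their iterators point
    // into device memory.
    thrust::sort(deviceData.begin(), deviceData.end());
    int sum = thrust::reduce(deviceData.begin(), deviceData.end(), 0);

    // Copy the sorted values back to the host for printing.
    thrust::host_vector<int> sorted = deviceData;
    for (size_t i = 0; i < sorted.size(); ++i) {
        std::printf("%d ", sorted[i]);
    }
    std::printf("\nsum = %d\n", sum);
    return 0;
}
```

The interface deliberately mirrors the C++ standard library's containers and algorithms, which is what makes it approachable for C++ programmers who have not written GPU kernels before.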