Disruptive technology makes a lot of people uncomfortable. It changes the rules. It takes you out of your comfort zone. Not me: I love it.
Traditionally, the way we think about software development is that you have a CPU (central processing unit) and a GPU (graphics processing unit), and each is good at handling certain sets of tasks. The x86 CPU is good at executing a sequence of stored instructions, and even has the ability to handle multiple instructions simultaneously (instruction-level parallelism). The GPU is a powerful vector engine that can operate simultaneously on multiple data items (data parallelism). Developers have used the CPU for handling the majority of computational tasks, and the GPU for visualization and graphics-related functions. But it’s time to forget conventional wisdom and to rethink everything.
The GPU isn’t just for visualization anymore. The world is waking up to the fact that GPUs are high-performance, many-core processors that can be used to accelerate a wide range of applications. This knowledge is driving a shift in how applications are developed and accelerated.
This shift began with hardware optimizations in the GPU that enable software to harness its processing power for more general data structures. Microsoft’s Windows 7 and Apple’s Snow Leopard are prime examples of software designed to leverage next-generation graphics technology. And now the software development community has to figure out how to harness the GPU for computational tasks that can benefit from its very high computation and data throughput capabilities.
Let’s take a quick look at tools for programming the GPU. We have moved beyond the era where programming the GPU was a “bare metal” experience. Graphics libraries like Microsoft DirectX and OpenGL are widely used today for “traditional” GPU functions, such as creating 2D and 3D graphics and performing multimedia operations like image rendering, video transcoding, and even HD video conferencing.
DirectX is a set of low-level application programming interfaces (APIs) that includes support for high-performance 2D and 3D graphics, sound, and input. Components of DirectX include Direct2D, Direct3D, and DirectCompute (more about that later). Several of these APIs use the GPU for hardware acceleration. For example, Direct2D is an immediate-mode 2D graphics API, introduced with DirectX 11, that uses hardware acceleration for high-performance, high-quality rendering of menus, user-interface elements, and heads-up displays. These APIs are used with Microsoft’s developer tools, mainly Visual Studio.
OpenGL is a cross-platform, low-level procedural API maintained by The Khronos Group. It is used to develop portable, interactive 2D and 3D computer graphics applications, and it offers a broad base of rendering, texture mapping, special effects and other powerful visualization functions.
OpenGL is a specification based on the C programming language, and it can be accessed from other programming languages with the proper bindings (which exist for Fortran, BASIC, Visual Basic, Python, Perl, Java, Ruby, and C#, to name a few). Like DirectX, OpenGL offloads certain tasks to the GPU.
But what if you want to use the GPU to perform computational tasks traditionally handled by the CPU? GPUs have a parallel throughput architecture that executes many concurrent threads, and the addition of programmable stages allows software to use the GPU for non-graphics data. The benefits can be astounding: for certain algorithms, speedups of 100X or more over a CPU are not unheard of (depending on the hardware). Some applications that are well-suited to GPU computing include image, video, and audio processing; sophisticated user interfaces; and simulation and modeling used by the financial, academic, and technical communities.
As the powerful computational capabilities of the GPU have been exposed, we have seen developer tools evolving that leverage these capabilities, including Microsoft’s DirectCompute APIs, NVIDIA’s CUDA Toolkit, and OpenCL.
DirectCompute is a Microsoft DirectX-specific API that provides a more flexible way for developers to access the computational capability of GPUs that support DirectX 10 and DirectX 11 with Windows Vista and Windows 7. Key features of DirectCompute’s compute shader include explicit thread dispatch, communication of data between threads, and a rich set of primitives for random access and streaming I/O operations. Programmers will need to use Visual Studio when writing code that uses DirectCompute APIs.
CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in NVIDIA’s CUDA-capable GPUs. It offers a set of tools, libraries, and C language extensions that give developers more generalized, lower-level access to the hardware than typical graphics libraries. Programmers use “C for CUDA” (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on NVIDIA GPUs.
OpenCL is a framework for writing parallel programs that execute across heterogeneous platforms consisting of CPUs, GPUs and other processors. It uses task-based and data-based parallelism to give an application access to GPUs for non-graphical computing, and enables a programmer to write a base of code that can use both CPU and GPU capabilities. OpenCL includes a language (based on C99) for writing kernels, which are functions that execute on OpenCL devices, plus APIs that are used to define and then control the platforms.
However, simply introducing software tools doesn’t mean the shift to offloading code to the GPU for general computing tasks happens overnight. Platform and chip makers must be incredibly engaged with the software community to ensure the right drivers are available, code libraries are robust, and education is in place so that there is an understanding about how to work with new hardware and programming models.
What all of this adds up to is that the software applications we all know and love—and the ones that have yet to even be created—are about to get a lot better. When the CPU and GPU are utilized to their full capabilities, and then utilized in harmony, the pace of innovation starts to accelerate in an amazing way. But beware: It disrupts everything you thought you knew. Don’t worry, this is the fun part. Hope you agree.
Margaret Lewis is director of software product marketing for AMD.