When it comes to high-performance computing applications, OpenMP has long been the standard open API for the job. But with new processors and new efforts from the companies behind those processors, OpenCL (Open Computing Language) has emerged as a new challenger in the HPC space. According to Evans Data Corp., OpenCL is now the second most popular HPC tool, behind Intel’s Threading Building Blocks, and its surveys show that OpenCL adoption has increased since 2009.

AMD is preparing a suite of new tools for OpenCL developers, tools it hopes will help spread adoption of this open framework for writing heterogeneous compute applications. The aim is to compete with nVidia’s CUDA tools and framework.

The most significant recent change to both platforms has been the unification of memory space across RAM and VRAM, which lets developers track their application’s data in a single address space rather than maintaining two separate memory maps.
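
In OpenCL terms, that unification shows up in how buffers can be allocated so the host and the device share one allocation. The sketch below is illustrative only: it assumes a context and command queue already exist, the helper name is made up, and error checking is omitted.

    /* Illustrative sketch: a host-visible OpenCL buffer that can back both the
       CPU's and the GPU's view of the data on hardware with unified memory.
       Assumes ctx and queue were created elsewhere; error checks omitted. */
    #include <CL/cl.h>

    cl_mem make_shared_buffer(cl_context ctx, cl_command_queue queue, size_t bytes)
    {
        /* CL_MEM_ALLOC_HOST_PTR asks the runtime for host-accessible memory, so
           it can avoid keeping a second copy in VRAM where the hardware allows. */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                    bytes, NULL, NULL);

        /* Mapping gives the CPU a pointer into the buffer; unmapping hands
           ownership back to the device without an explicit copy call. */
        void *host_view = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                             0, bytes, 0, NULL, NULL, NULL);
        /* ... fill host_view with input data ... */
        clEnqueueUnmapMemObject(queue, buf, host_view, 0, NULL, NULL);
        return buf;
    }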

Now that the two systems are nearing par with each other, AMD has decided to step up its game by releasing optimization and development tools that fill in the gaps it sees in the OpenCL ecosystem.

Further proof of AMD’s newfound commitment to tools and HPC came in May 2010, when Manju Hegde left nVidia to join AMD as corporate vice president of its products group.

“The promise of OpenCL is that you can optimize,” he said. “To give meaning to that, we’ve developed tools.” He went on to say that much of the work in HPC is not writing the functionality, but streamlining the code to be as fast as possible.

“OpenCL’s promise is that it works across platforms,” said Hegde. “It works across CPUs, GPUs and in the low-power space. It works across vendors. To give meaning to that message, the first thing we’ve done is invested in tools that allow development across CPU and GPU.”
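
That cross-vendor promise is visible even in the most basic host code: the same few calls enumerate whatever platforms and devices are installed, whether they come from AMD, nVidia or Intel. The following is a minimal sketch with error checking omitted.

    /* Minimal sketch: list every OpenCL platform and the CPU/GPU devices it
       exposes. The same source works across vendors; error checks omitted. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);
        if (num_platforms > 8) num_platforms = 8;

        for (cl_uint p = 0; p < num_platforms; ++p) {
            char platform_name[256];
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                              sizeof(platform_name), platform_name, NULL);

            cl_device_id devices[16];
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);
            if (num_devices > 16) num_devices = 16;

            for (cl_uint d = 0; d < num_devices; ++d) {
                char device_name[256];
                cl_device_type type;
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                                sizeof(device_name), device_name, NULL);
                clGetDeviceInfo(devices[d], CL_DEVICE_TYPE,
                                sizeof(type), &type, NULL);
                printf("%s: %s [%s]\n", platform_name, device_name,
                       (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "CPU/other");
            }
        }
        return 0;
    }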

Right tools for the job
Those tools run the gamut from debuggers to performance profilers. gDEBugger is designed to help developers find trouble spots in their applications, while AMD CodeAnalyst, a performance profiler, is designed to explicitly point out bottlenecks to developers.

Elsewhere, AMD is also releasing an LLVM extension, a kernel analyzer and an application profiler. All of these tools will be released later this year, and some will be made open source, said Hegde.

But AMD’s efforts with OpenCL don’t end with the developer. Hegde said the company is also producing its own college-level course in OpenCL and offering the materials to professors. AMD has even partnered with an education startup to solve one of the biggest problems in computer-science classes: grading projects. The startup automates CS project grading, provided the projects are written in C or C++; professors simply upload submitted projects to a website, and grading occurs online.

But the real proof of OpenCL is in its use in the real world. Sean Varah, CEO of MotionDSP, recently transitioned his team from nVidia’s CUDA tools to OpenCL. “To be honest, I was really pessimistic about OpenCL,” he said.

“It basically took nVidia two years to get a stable SDK and driver out. Eighty percent of my business is with the military, so my software can’t break. It’s a problem we were having with CUDA in the early days, and if my software blows up, I can’t blame anyone else.”

Sanford Russell, director of marketing for nVidia’s CUDA platform, admitted that it took time for the CUDA tool chain to evolve. “With CUDA 1, 2 and 3.0, we were filling in major pieces of the wall. With 4.0, a lot of the technology from last year has been geared toward making it easier for people to get their applications ported to the GPU, and easier to program in parallel on the GPU,” he said. Thanks to those years of development, he added, the CUDA tool chain is now mature and offers the capabilities demanded by the community.

Varah said that it takes time for any tool chain to evolve, and that when he looked at AMD’s OpenCL offerings, he expected the same timeframe. “I saw a three-year trajectory there, with at least two years for AMD to get the tools stable,” he said.

“It’s one thing to port the GPU code; it’s another to optimize for the hardware. We weren’t planning on porting to OpenCL for another year. But AMD came to us and said, ‘Try it out.’ So we ported our product to OpenCL.

“The initial port wasn’t that hard, but it was buggy as hell and performance sucked. We gave frank feedback to AMD, and to their credit they listened. In November of 2010, things were looking kind of grim. But AMD jumped in on three different levels; they listened to where the bottlenecks were on performance. That three-year trajectory was turned into 1.5 years.”

Varah said that OpenCL wasn’t too difficult for his team to get acquainted with. “From our side, the actual coding in OpenCL isn’t that complicated. Certainly, changing the architecture of your code to the manycore paradigm is a big architectural change,” he said.

“That is a bit of a shift, but on the other hand we kind of had to go back and start from scratch. The initial port didn’t take a lot of time. It’s really the optimization that takes time. It’s really getting up to speed on how you can optimize toward the hardware.”
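
What that restructuring looks like in practice can be shown with a toy example (not MotionDSP’s code): a serial per-pixel loop becomes an OpenCL kernel in which each work-item handles a single element, and the loop itself disappears into the global work size the host specifies at launch. The kernel name and the brighten operation here are made up purely for illustration.

    /* Toy illustration of the serial-to-manycore shift; the function and
       kernel names are hypothetical. */

    /* Serial C: one thread walks every pixel. */
    void brighten_serial(float *pixels, int n, float gain)
    {
        for (int i = 0; i < n; ++i)
            pixels[i] *= gain;
    }

    /* OpenCL C: the host enqueues n work-items, each handling one pixel. */
    const char *brighten_kernel_src =
        "__kernel void brighten(__global float *pixels, float gain) {\n"
        "    size_t i = get_global_id(0);\n"
        "    pixels[i] *= gain;\n"
        "}\n";

The host side compiles that string at runtime with clBuildProgram and launches it with clEnqueueNDRangeKernel, setting the global work size to the pixel count; tuning that launch and the memory access pattern for a given GPU is where the optimization effort Varah describes tends to go.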

All of this is indicative of AMD’s new approach to tools and developers, said Neal Robinson, senior director of global content and application support for AMD. “We focus more on broad developer outreach,” he said. AMD won’t be offering paid-for consulting services to help optimize applications, as Intel does. Rather, AMD is working with its ecosystem of partners so that third parties can offer this type of support. Additionally, the forthcoming OpenCL tools will be made available to developers for free.

AMD’s Hegde said that much of the work that remains to be done is at the compiler level. “There will be lots of compiler work because we want to always stay true to the OpenCL promises of being cross-vendor and cross-platform,” he said. “We have to go from an intermediate layer to target all the ISAs. There is tool work that needs to be done there. Our first reference compiler will be for C++.”

Additionally, he said, LLVM will be the first compiler to receive tooling from AMD. He said that the architecture of the LLVM extensions for OpenCL will be made available, which should allow other compiler teams to integrate support on their own time.