Intel today announced that it will begin offering a new coprocessor to developers in late January. Known as the Intel Xeon Phi Coprocessor, this 60-core PCIe card lets developers add extra processing power to their high-performance applications without having to learn a new development environment and language, such as nVidia’s CUDA or the up-and-coming OpenCL.
James Reinders, chief evangelist and director of marketing for Intel’s software development products, said that the Phi offers many advantages over typical GPU compute.
“We’ve seen a phenomenon where people have taken GPUs and found they can take a sliver of the HPC market, and can also see performance advantages per watt,” he said. “That gave us inspiration that the interest is there at taking on certain workloads, but we think that there’s a great opportunity here to design a device that’s more programmable, more flexible.”
To wit, Reinders said that GPU computing is limited by the capabilities of the GPU itself. Those cards and chips were designed to do geometric computation and little else, which leaves them far less flexible than a standard x86 processor.
“I never like to say ‘hard’ or ‘easy’ to describe programming,” said Reinders. “Writing in CUDA is pretty restrictive because of the restrictions the hardware has. You see some of the queues they’re putting in to give a little more flexibility. I think those baby steps aren’t going to solve the problem well enough. The hardware needs to be very flexible.”
Reinders said that the Phi will work with parallel applications, as well as with those built using Intel development tools such as Threading Building Blocks and Intel Cluster Studio XE. “If you look at Cluster Studio XE, it’s aimed at people using OpenMP and MPI. We’re continuing to try to support those sorts of standards. We’ll continue investing heavily in that. That’s where most of the action is.”
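That claim is easy to illustrate. The sketch below is ordinary OpenMP in C++, not Intel sample code: a generic saxpy kernel with a standard worksharing pragma, the kind of loop Reinders says can be carried to the Phi without adopting a new language. The kernel and names here are our own illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Generic saxpy kernel: y = a*x + y. Plain OpenMP worksharing, with no
// accelerator-specific language or tooling required.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    saxpy(3.0f, x, y);
    std::printf("y[0] = %f\n", y[0]); // prints 5.000000
    return 0;
}
```

Built with any OpenMP-capable compiler (for example, g++ -fopenmp), the same source runs on a multicore Xeon today; Intel’s pitch is that this, not a rewrite, is the porting path to the Phi.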
All of this brings up a major question, however, about the standards and tools being used in HPC. While MPI and OpenMP continue to evolve, AMD and nVidia have sought new standards to support their GPU computing capabilities. AMD has embraced OpenCL, and is now pushing to include support for GPU compute in Java itself.
nVidia, meanwhile, is pushing its own standard, known as OpenACC, which started out as an effort inside OpenMP to add offload directives, sometimes known as accelerator directives, to the API.
Duncan Poole, senior manager of HPC at nVidia, said that “OpenACC is simple compiler hints inserted into the code to tell the compiler how to move data from CPU instructions to an accelerator, and run those segments of code on the accelerator and return that code to the CPU. It’s launching a kernel and returning a result. OpenACC compilers should target AMD, nVidia and soon Intel in both Fortran and C.”
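Poole’s description boils down to a handful of directives. The following is a hedged sketch, not nVidia sample code: an OpenACC-annotated C++ loop in which copyin and copyout clauses move the data, and the pragma asks the compiler to launch the loop as a kernel on the accelerator.

```cpp
#include <cstddef>

// OpenACC sketch matching Poole's description: the pragma is a compiler hint.
// copyin moves x to the accelerator before the region, copyout moves y back
// after it, and the loop itself is launched as a kernel on the device.
void scale(const float* x, float* y, std::size_t n, float a) {
    #pragma acc parallel loop copyin(x[0:n]) copyout(y[0:n])
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i];
}
```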
But Reinders dismissed OpenACC as redundant. He said OpenMP is on its way toward implementing offload directives of its own, on almost the same timeline as OpenACC. Poole, for his part, said the plan is to fold OpenACC back into OpenMP.
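For comparison, here is what those OpenMP-style offload directives look like. The target and map syntax shown follows the draft that later became OpenMP 4.0; treat it as an assumption about where the standard was heading, not as shipping syntax.

```cpp
#include <cstddef>

// OpenMP-style offload, per the accelerator proposal Reinders references
// (syntax assumed from the later OpenMP 4.0 "target" construct). The map
// clauses play the role of OpenACC's copyin/copyout; the loop body itself
// is unchanged host code.
void scale_omp(const float* x, float* y, std::size_t n, float a) {
    #pragma omp target map(to: x[0:n]) map(from: y[0:n])
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i];
}
```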
But while Poole insisted that OpenACC will support other vendors’ hardware, Reinders said that “OpenACC is a subset of what OpenMP was taking a look at. It only looks at nVidia GPUs.”
And that, he said, is emblematic of the problems in GPU compute. The hidden cost of CUDA and OpenCL is that developers have to start from scratch, learning new languages and environments to take advantage of GPU compute. With Intel’s Phi coprocessor, he argued, developers can keep working with the tools and languages they already know.
Reinders said that the Xeon Phi Coprocessor will ship by the end of January.