Divergent views over supplemental compute

Published: January 23rd, 2013

- Alex Handy

After five years of work on CUDA for supplemental compute, nVIDIA is squaring off with the biggest chip-maker in the world: Intel. These two titans of processing cores are both pushing PCI Express-based methods of expanding the compute power of desktops, but their respective approaches and solutions couldn’t be more divergent.

The new Intel Phi co-processor is quite different from an nVIDIA GPU. The Phi is a MIMD (Multiple Instruction, Multiple Data) machine, while the GPU is a SIMD (Single Instruction, Multiple Data) machine. The Phi is useful only for HPC, while the GPU can run games and graphical simulations as well. The Phi runs its own Linux and presents as a cluster, while GPUs are treated as on-device equipment and managed alongside the CPU.

But beyond the technical differences, the two companies are already showing that they have completely different approaches to the growing HPC market. Intel has focused on bringing its existing compilers and tools to this new processing platform, while nVIDIA has spent the past five years building a community around its CUDA platform, and around GPU compute in general.

Ian Buck, general manager of GPU at nVIDIA, said, “Fundamentally, for HPC, it’s about expressing the parallelism. These accelerators are designed to process this stuff in parallel.” To that end, developers using an nVIDIA GPU can write the code for a single thread and have it replicated across the GPU cores into tens of thousands of threads. He said this parallelism is easy to get with CUDA.
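The write-one-thread model Buck describes can be sketched in plain C. This is a CPU emulation for illustration only, not CUDA code: the names scale_thread and launch are hypothetical, and the loop in launch stands in for the replication a GPU runtime would perform in parallel.

```c
#include <stddef.h>

/* Work for ONE logical thread: it handles element i and nothing else.
   On a GPU the runtime would supply the index; here it is a parameter. */
void scale_thread(size_t i, float a, const float *in, float *out)
{
    out[i] = a * in[i];
}

/* Hypothetical stand-in for a kernel launch: runs the per-thread body
   once per index. This emulation runs the instances one after another;
   a GPU would run many of them at the same time. */
void launch(size_t n_threads, float a, const float *in, float *out)
{
    for (size_t i = 0; i < n_threads; ++i)
        scale_thread(i, a, in, out);
}
```

Calling launch(n, 2.0f, in, out) doubles each of the n input elements. The developer only ever writes the body for a single index.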

“If you try that on 60 cores on a chip, it’s a little less clear how you program it,” said Buck, referring to Intel’s Phi. “You can treat it like an MPI (message-passing interface) cluster on a chip. The challenge of that programming model is each core is wimpy on its own: It’s a Pentium Pro-type processor.”

But James Reinders, director of software products and multicore evangelist at Intel, said this is a benefit of the Phi, not a hindrance. Using the MPI programming model to treat the Phi as if it were a Linux cluster makes this desktop HPC environment familiar to existing HPC developers, who have been using MPI for some time.

“I think of it as SMP [symmetric multiprocessing] on a chip,” said Reinders. “You’ve got a collection of processors, a cache-based architecture with vector units, all brought together with extremely high-performance interconnect. What we’ve done is put it on a single chip. The benefit you get from that is that applications that have been written in MPI or OpenMP should see a very similar environment with Xeon Phi.”
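For readers unfamiliar with the style of code Reinders is referring to, here is a minimal OpenMP sketch in C. The function name parallel_sum is an assumption made for illustration. Whether such code runs unchanged on a Xeon Phi depends on the compiler and toolchain, which this sketch does not cover.

```c
#include <stddef.h>

/* Sum an array using an OpenMP work-sharing loop.
   The reduction clause gives each thread a private partial sum
   and combines the partial sums when the loop ends.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
double parallel_sum(const double *v, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        total += v[i];
    return total;
}
```

The same source compiles with or without OpenMP support, which is part of the appeal of a directive-based model for existing HPC code.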

And as for the belief that Phi is a collection of Pentium Pros, Reinders put that to rest. “First of all, they’re not Pentium Pros. They are in-order execution cores,” he said. “The Pentium Pro was our first out-of-order execution core. We’ve said in the past that it’s essentially a Pentium core, but the problem with talking about it like that is that Pentium cores didn’t have vector units. We have very wide vector units: 16 floats wide. It also has four threads per core. It has 64-bit support, machine exception handling, and power states. The Pentium had none of those.

“The only reason we ever mentioned it was sort of like a Pentium was to get people thinking about the in-order execution. It’s not as high-performance per thread. But it’s a better solution to overall power consumption.”

These two companies are also elbowing each other over nVIDIA’s decision to branch out and build OpenACC, rather than build accelerator directives within the OpenMP standards process. These directives tell the compiler where a region of code should run: on the SIMD GPU or the MIMD CPU. For nVIDIA, that’s what OpenACC attempts to provide. But Reinders said the OpenMP team was already working on a CPU- and GPU-compatible solution to the problem when nVIDIA went off to create its own directives standard.

Said nVIDIA’s Buck, “I think Intel and nVIDIA agree that directives are a good way to get operations off a CPU and onto a GPU by expressing regions that can be loops and use parallelism. But OpenMP takes time. They’re a standard, they have many years of legacy work to do. Instead of waiting for that to happen, we’ve been pushing OpenACC as an alternative. It supports GPU, CPU, PGI, Cray.”
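The directive approach Buck describes, marking a loop as a parallel region that can be moved to an accelerator, might look like the following OpenACC sketch in C. The function saxpy and its parameters are illustrative assumptions. A compiler without OpenACC support ignores the pragma and runs the loop on the CPU.

```c
/* y = a * x + y over n elements.
   The directive asks an OpenACC compiler to run the loop in parallel
   on an accelerator: copyin sends x to the device, and copy sends y
   to the device and brings the updated values back afterwards. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The loop body is ordinary C. Only the pragma expresses where the work may run, which is the property both companies say they want from a directives standard.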

But Reinders expressed frustration at nVIDIA’s decision to create OpenACC. “I do not think OpenMP is moving slowly. I think I would question anyone who would assert that,” he said.

“Standards bodies should be very careful to standardize things that we can all survive with. OpenACC is a subset of what the OpenMP committee was working on, but it was a subset designed only to service GPUs. There was very deliberate slicing of the standard so it was only to service GPUs. The OpenMP team was working on standardizing in a general way. I don’t find OpenACC to be a standard. It’s a proprietary solution for nVIDIA.

“OpenMP’s solution was the product of many companies, including nVIDIA. It was a very thoughtful solution for directives that would work both for Xeon Phi, nVIDIA GPUs, AMD GPUs, and even Intel GPUs. I think by the time it makes it into 4.0, hopefully all the technology problems will be solved, and it will be a real solution, and we can stay as far away from politicking as possible. I think OpenMP has done a stellar job of making sure the technical concerns of users are addressed, and that we get a standard that users can use.”

But Buck claimed nVIDIA has always planned on contributing the OpenACC standard and code back into OpenMP.

The tale of the tape:
Xeon Phi Co-processor 5100
8GB RAM
60-core CPU
Maximum 16-channel GDDR memory interface with an option to enable ECC.
PCI Express x16 Gen2 interface with optional SMBus management interface.
Node Power and Thermal Management, including power capping support.
On-board flash device that loads the coprocessor OS on boot.
Card level RAS features and recovery capabilities.
MIMD Machine
Cost: Around $1,200

nVIDIA GeForce GTX 690
CUDA Cores: 3072
Memory Speed: 6.0 Gbps
Standard Memory Config: 4096MB (2048MB per GPU) GDDR5
Memory Interface Width: 512-bit (256-bit per GPU)
Memory Bandwidth: 384 GB/sec
SIMD Machine
Cost: Around $1,000

Article Tags

Intel, NVIDIA, supplemental compute

About Alex Handy

Alex Handy is the Senior Editor of Software Development Times.



