CUDA vs. FPGAs for high-performance computing

A column by Kevin Morris, editor of the FPGA Journal, discusses the new Nvidia GPU offerings. Here’s my response about why GPUs will kill off the use of field-programmable gate arrays (FPGAs) as accelerators in high-performance computing systems.

New Nvidia GTX 280 and 260 GPUs are announced!

Nvidia has announced their new GTX 280 and 260 GPU chips. The 280 and 260 increase the number of streaming processors (SPs) to 240 and 192, respectively, and widen the interface to device memory to 512 and 448 bits. (The older 8800 GTX has 128 SPs and a 384-bit memory interface.)
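If you want to sanity-check numbers like these on your own card, the CUDA runtime will report them. Here’s a minimal sketch that queries cudaGetDeviceProperties for device 0 and prints the multiprocessor count; the 8-SPs-per-multiprocessor figure for this GPU generation is my assumption, and the file name in the compile line is just an example:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);   /* device 0 */
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        /* Each multiprocessor in this GPU generation contains 8 SPs, so a
           GTX 280 reports 30 multiprocessors (30 * 8 = 240 SPs). */
        printf("Device:           %s\n", prop.name);
        printf("Multiprocessors:  %d\n", prop.multiProcessorCount);
        printf("Global memory:    %lu MB\n",
               (unsigned long)(prop.totalGlobalMem >> 20));
        return 0;
    }

Compile it with nvcc (e.g. nvcc -o devicequery devicequery.cu) and the SP count falls out of the multiprocessor count.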

Here is a blog posting with a picture of the GTX 280 chip.

Apple’s competition to TBB and CUDA

Apple recently announced Grand Central and OpenCL, which seem to be competitors to TBB and CUDA, respectively. Grand Central tries to make it easier to write multi-threaded apps for today’s multicore CPUs, and OpenCL (Open Computing Language) aims to make the processing power of GPUs available to general-purpose computing applications. OpenCL sounds like CUDA to me, but Steve Jobs says it’s “way beyond what Nvidia or anyone else has, and it’s really simple.” We’ll see. Here are some blog posts about Grand Central and OpenCL.

Addendum (June 18, 2008): Looks like Apple has submitted OpenCL to the Khronos Group, a consortium “that aims to define a programming environment for applications running across both x86 and graphics chips”. And here is a Wikipedia entry about OpenCL.

Threads and blocks and grids, oh my!

As an engineer, I like C because it is relatively low-level compared to other languages. This lets me infer how the C code is handled by the processor so I can make on-the-fly judgments about the efficiency of a program. For the same reason, I need a mental model of how a CUDA device is organized and how its parts operate.
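To make that model concrete, here is a minimal sketch of how a one-dimensional grid of one-dimensional blocks maps threads onto data, using a hypothetical element-wise vector addition. The kernel name, array size, and block size of 256 are illustrative choices, not anything prescribed by CUDA:

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Each thread computes one element; its global index is built from the
       built-in blockIdx, blockDim, and threadIdx variables. */
    __global__ void add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                      /* guard the partial last block */
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1024;
        const size_t bytes = n * sizeof(float);

        float h_a[1024], h_b[1024], h_c[1024];
        for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        /* Launch a grid with enough 256-thread blocks to cover n elements. */
        int threadsPerBlock = 256;
        int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        add<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, n);

        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f (expect 3.0)\n", h_c[0]);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }

The launch configuration in the <<<blocksPerGrid, threadsPerBlock>>> syntax is exactly where the grid-of-blocks-of-threads hierarchy shows up in the source code.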

Which is more popular? TBB or CUDA?

I’ve been monitoring the traffic on this blog since it started. (That isn’t hard – a big day has 100 page views.) Here are the accumulated hits for the blog posts I’ve made on TBB and CUDA:

Accumulated Hits
Framework   Hits    Duration
TBB         1,045   120 days
CUDA        1,112    40 days

It appears that CUDA has garnered as much attention as TBB, but in a much shorter time and with far fewer posts. I’ll give three possible explanations for this:

  1. CUDA has been more visible in the tech press over the past few months, while TBB coverage has been almost non-existent.
  2. People perceive a bigger payoff from learning about CUDA (which offers much more potential parallelism with hundreds of parallel processors) than TBB (which uses the handful of cores available in today’s CPUs).
  3. My most popular posts concern setting up CUDA or TBB on Windows and getting a small example to compile. This is easy to do with TBB (after all, it’s being developed by Intel), but it’s hard to get the CUDA nvcc compiler integrated into Microsoft’s Visual C++, so people are looking for help with that.

What do you think? Is there some other reason I’ve missed? What’s your parallel programming framework of choice and why?

Update (6/24/2008):

Here are the updated statistics. CUDA is pulling away!

Accumulated Hits
Framework   Hits    Duration
TBB         1,267   140 days
CUDA        2,722    60 days

Further Update (6/17/2010):

Let’s see where we are after two years:

Accumulated Hits
Framework   Hits
TBB         10,220
CUDA        235,010