/// Parallel Panorama ///

December 19, 2008

Nvidia GTX 295 GPU with 480 Cores!

Filed under: GPU — llpanorama @ 10:55 pm

The title says it all.  Read a bit more about it here.

June 19, 2008

CUDA vs. FPGAs for high-performance computing

Filed under: GPU — Tags: — llpanorama @ 5:50 am

A column by Kevin Morris, editor of the FPGA Journal, discusses the new Nvidia GPU offerings.  Here’s my response about why GPUs will kill off the use of field-programmable gate arrays (FPGAs) as accelerators in high-performance computing systems.

June 16, 2008

New Nvidia GTX280 and 260 GPUs are announced!

Filed under: GPU — Tags: — llpanorama @ 5:09 pm

Nvidia has announced its new GTX 280 and 260 GPU chips. The 280 and 260 increase the number of stream processors (SPs) to 240 and 192 and widen the interface to device memory to 512 and 448 bits, respectively. (The older 8800 GTX has 128 SPs and a 384-bit wide memory interface.)

Here is a blog posting with a picture of the GTX 280 chip.

June 13, 2008

Apple’s competition to TBB and CUDA

Filed under: GPU, multicore — Tags: , — llpanorama @ 12:11 pm

Apple recently announced Grand Central and OpenCL, which seem to be competitors to TBB and CUDA, respectively. Grand Central tries to make it easier to write multi-threaded apps for today’s multicore CPUs, and OpenCL (Open Computing Language) aims to make the processing power of GPUs available in general-purpose computing applications. OpenCL sounds like CUDA to me, but Steve Jobs says it’s “way beyond what Nvidia or anyone else has, and it’s really simple.” We’ll see. Here are some blog posts about Grand Central and OpenCL.

Addendum (June 18, 2008): Looks like Apple has submitted OpenCL to the Khronos Group “that aims to define a programming environment for applications running across both x86 and graphics chips”.  And here is a Wikipedia entry about OpenCL.

June 11, 2008

Threads and blocks and grids, oh my!

Filed under: GPU — Tags: — llpanorama @ 2:49 pm

As an engineer, I like C because it is relatively low-level compared to other languages. This lets me infer how the C code is handled by the processor so I can make on-the-fly judgments about the efficiency of a program. For the same reason, I need a mental model of how a CUDA device is organized and how its parts operate. (more…)
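One piece of that mental model can be sketched in plain C++: a grid is a set of blocks, a block is a set of threads, and each thread derives the element it owns from its block index and thread index. This is a serial emulation only. `LaunchShape`, `add_one`, `launch`, and `shape_for` are hypothetical names made up for the illustration, and a real CUDA device would run these "threads" in parallel rather than in nested loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launch shape. The field names mirror CUDA's gridDim and
// blockDim, but this is ordinary C++, not CUDA.
struct LaunchShape {
    std::size_t grid_dim;   // number of blocks in the grid
    std::size_t block_dim;  // number of threads in each block
};

// Work done by one emulated thread: add 1 to the element it owns.
void add_one(std::vector<int>& data, std::size_t block_idx,
             std::size_t thread_idx, const LaunchShape& shape) {
    // Global index, computed the same way as
    // blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel.
    const std::size_t i = block_idx * shape.block_dim + thread_idx;
    if (i < data.size()) {  // threads past the end of the data do nothing
        data[i] += 1;
    }
}

// Serial emulation of a 1-D launch: visit every thread of every block.
void launch(std::vector<int>& data, const LaunchShape& shape) {
    for (std::size_t b = 0; b < shape.grid_dim; ++b) {
        for (std::size_t t = 0; t < shape.block_dim; ++t) {
            add_one(data, b, t, shape);
        }
    }
}

// Smallest grid whose blocks cover n elements (rounds up).
LaunchShape shape_for(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    return LaunchShape{(n + block_dim - 1) / block_dim, block_dim};
}
```

Rounding the grid size up means the last block may contain threads with no element to work on, which is why the bounds check in `add_one` is there.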

June 4, 2008

Which is more popular? TBB or CUDA?

Filed under: GPU, multicore — Tags: — llpanorama @ 9:28 am

I’ve been monitoring the traffic on this blog since it started. (That isn’t hard – a big day has 100 page views.) Here are the accumulated hits for the blog posts I’ve made on TBB and CUDA:

Accumulated Hits

Framework   Hits   Duration
TBB         1045   120 days
CUDA        1112    40 days

It appears that CUDA has garnered as much attention as TBB, but in a much shorter time and with far fewer posts: roughly 28 hits per day for CUDA versus about 9 per day for TBB. I’ll give three possible explanations for this:

  1. CUDA has been more visible in the tech press over the past few months, while TBB coverage has been almost non-existent.
  2. People perceive a bigger payoff from learning about CUDA (which offers much more potential parallelism with hundreds of parallel processors) than TBB (which uses the handful of cores available in today’s CPUs).
  3. My most popular posts concern setting up CUDA or TBB on Windows and getting a small example to compile. This is easy to do with TBB (after all, it’s being developed by Intel), but it’s hard to get the CUDA nvcc compiler integrated into Microsoft’s Visual C++, so people are looking for help with that.

What do you think? Is there some other reason I’ve missed? What’s your parallel programming framework of choice and why?

Update (6/24/2008):

Here are the updated statistics.  CUDA is pulling away!

Accumulated Hits

Framework   Hits   Duration
TBB         1267   140 days
CUDA        2722    60 days

May 22, 2008

parallel_scan finally explained!

Filed under: multicore — Tags: — llpanorama @ 11:59 am

I beat my head against parallel_scan for a week and never really understood why I was having the problems I did. Now the developers at Intel have provided a better explanation of how parallel_scan works. It turns out that the pre_scan method may never be run at all, so the final_scan method always has to redo what would have been done in pre_scan just to be safe. That explains why I had to make my pre_scan and final_scan methods identical in my example program. It would have been nice if one of the developers had mentioned that within a few days of when I submitted my problem to the Intel TBB forum. Or perhaps they should have called the method pre_scan_sometimes_if_we_feel_like_it just to warn TBB users of the actual behavior.

Anyway, problem solved.
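To make the pre_scan/final_scan point concrete, here is a minimal sketch in plain C++ that does not use TBB at all. The tag types, the `RunningSum` body, and the `toy_scan` driver are all hypothetical stand-ins invented for the illustration, and the driver is a serial toy, not TBB's scheduler. It only shows why a final pass that repeats the accumulation gives the same running sums whether or not a pre-scan pass ever ran.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types standing in for the two kinds of scan pass.
// They are not TBB's own types.
struct PreScanTag {};
struct FinalScanTag {};

// Running-sum body with one method per pass.
class RunningSum {
public:
    RunningSum(const std::vector<int>& in, std::vector<int>& out)
        : in_(in), out_(out), sum_(0) {}

    // Pre-scan pass: accumulates a subrange total without writing output.
    // A scheduler is free to skip this pass entirely.
    void operator()(std::size_t begin, std::size_t end, PreScanTag) {
        assert(begin <= end && end <= in_.size());
        for (std::size_t i = begin; i < end; ++i) sum_ += in_[i];
    }

    // Final-scan pass: repeats the accumulation and records each running total.
    void operator()(std::size_t begin, std::size_t end, FinalScanTag) {
        assert(begin <= end && end <= in_.size());
        for (std::size_t i = begin; i < end; ++i) {
            sum_ += in_[i];
            out_[i] = sum_;
        }
    }

private:
    const std::vector<int>& in_;
    std::vector<int>& out_;
    int sum_;
};

// Toy serial driver (an assumption, not TBB's scheduler). It may pre-scan the
// second half with a throwaway body, then final-scans both halves in order.
std::vector<int> toy_scan(const std::vector<int>& in, bool run_pre_scan) {
    std::vector<int> out(in.size());
    const std::size_t mid = in.size() / 2;

    if (run_pre_scan) {
        RunningSum summary(in, out);
        summary(mid, in.size(), PreScanTag());  // summary only, no output
    }

    RunningSum body(in, out);
    body(0, mid, FinalScanTag());
    body(mid, in.size(), FinalScanTag());
    return out;
}
```

If the final pass only wrote output and relied on a pre-scan pass to do the adding, the result would depend on whether the scheduler happened to run that pass.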

May 21, 2008

My first CUDA program!

Filed under: GPU — Tags: — llpanorama @ 9:04 am

I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE. (more…)

April 24, 2008

Getting started with CUDA

Filed under: GPU — Tags: — llpanorama @ 9:34 am

I’m starting the CUDA portion of my parallel programming investigations.

I started reading about CUDA in early 2007. Sixteen months later, I finally have a CUDA programming environment set up under Windows XP. (A glacial pace, I admit.) I’ll describe the steps that got me there. (more…)

March 9, 2008

parallel_do? Parallel done!

Filed under: multicore — Tags: — llpanorama @ 10:13 am

parallel_do is a new TBB construct. It isn’t even in the Commercial Aligned or Stable releases; I had to install a Development release (tbb20_20080226oss) in order to get access to it.

The parallel_do construct is used when you don’t know how much data you have to process. parallel_do starts up tasks from a list, but these tasks can add further work to the list. parallel_do only shuts down when the list is empty and all the tasks are done. (more…)
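The work-list behavior described above can be sketched with a serial analog in plain C++. This is not TBB code: `Feeder`, `do_all`, and `count_down_demo` are hypothetical names for the illustration, and everything runs on one thread. The point is only the shape of the loop, where a task may add more work and processing stops once the list is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a feeder: lets a running task add more work.
class Feeder {
public:
    explicit Feeder(std::deque<int>& pending) : pending_(pending) {}
    void add(int item) { pending_.push_back(item); }

private:
    std::deque<int>& pending_;
};

// Serial analog of the construct: keep taking items until the list is empty.
// Returns how many items were processed, including ones added along the way.
template <typename Body>
std::size_t do_all(std::deque<int> pending, Body body) {
    Feeder feeder(pending);
    std::size_t processed = 0;
    while (!pending.empty()) {
        const int item = pending.front();
        pending.pop_front();
        body(item, feeder);  // the body may call feeder.add(...)
        ++processed;
    }
    return processed;
}

// Example: each item n > 0 adds n - 1 to the list, so the amount of work
// is not known from the initial list alone.
std::size_t count_down_demo(std::vector<int>& seen) {
    const std::deque<int> start = {3, 1};
    return do_all(start, [&seen](int n, Feeder& feeder) {
        assert(n >= 0);  // counts in this demo never go negative
        seen.push_back(n);
        if (n > 0) feeder.add(n - 1);
    });
}
```

Starting from two items, the demo ends up processing six, because 3 expands into 3, 2, 1, 0 and 1 expands into 1, 0.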
