Which is more popular? TBB or CUDA?

I’ve been monitoring the traffic on this blog since it started. (That isn’t hard – a big day has 100 page views.) Here are the accumulated hits for the blog posts I’ve made on TBB and CUDA:

Accumulated Hits
Framework   Hits    Duration
TBB         1,045   120 days
CUDA        1,112   40 days
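
To put the two rows on the same footing, it helps to turn them into a daily rate. Here is a minimal sketch of that arithmetic; the function name `hits_per_day` is my own invention for illustration, not something from either framework.

```cpp
#include <cassert>

// Average hits per day over the period a topic has been covered.
// (Hypothetical helper, used only to compare the two table rows.)
double hits_per_day(int hits, int days) {
    assert(days > 0);  // a rate only makes sense over a positive duration
    return static_cast<double>(hits) / days;
}
```

With the figures above, `hits_per_day(1045, 120)` comes to about 8.7 for TBB and `hits_per_day(1112, 40)` comes to 27.8 for CUDA, so CUDA is drawing a bit more than three times the daily traffic.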

It appears that CUDA has garnered slightly more attention than TBB in a third of the time and with far fewer posts. I’ll give three possible explanations for this:

  1. CUDA has been more visible in the tech press over the past few months, while TBB coverage has been almost non-existent.
  2. People perceive a bigger payoff from learning about CUDA (which offers much more potential parallelism with hundreds of parallel processors) than TBB (which uses the handful of cores available in today’s CPUs).
  3. My most popular posts concern setting up CUDA or TBB on Windows and getting a small example to compile. This is easy to do with TBB (after all, it’s being developed by Intel), but it’s hard to get the CUDA nvcc compiler integrated into Microsoft’s Visual C++, so people are looking for help with that.

What do you think? Is there some other reason I’ve missed? What’s your parallel programming framework of choice and why?

Update (6/24/2008):

Here are the updated statistics. CUDA is pulling away!

Accumulated Hits
Framework   Hits    Duration
TBB         1,267   140 days
CUDA        2,722   60 days

Further Update (6/17/2010):

Let’s see where we are after two years:

Accumulated Hits
Framework   Hits
TBB         10,220
CUDA        235,010

parallel_scan finally explained!

I beat my head against parallel_scan for a week and never really understood why I was having the problems I did. Now the developers at Intel have provided a better explanation of how parallel_scan works. It turns out that the pre_scan method may never be run at all, so the final_scan method always has to redo what was done in pre_scan just to be safe. That explains why I had to make my pre_scan and final_scan methods identical in my example program. It would have been nice if one of the developers had mentioned that within a few days of when I submitted my problem to the Intel TBB forum. Or perhaps they should have called the method pre_scan_sometimes_if_we_feel_like_it just to warn TBB users of the actual behavior.
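
To make the "pre_scan may never run" point concrete, here is a small standalone simulation of a running sum. It does not use TBB at all: the `RunningSum` struct, the `running_sum` driver, and the two-chunk split are all assumptions of mine for illustration, and the real scheduler orders its work differently. The only thing it is meant to show is that one shared routine, which always accumulates and writes output only on the final pass, gives the same answer whether or not a pre-scan happened.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scan body; this is NOT the TBB class itself.
struct RunningSum {
    int sum = 0;  // total of everything this body has scanned so far

    // Shared by both passes: always accumulate, write output only on the final pass.
    void scan(const std::vector<int>& in, std::vector<int>& out,
              std::size_t begin, std::size_t end, bool is_final) {
        assert(end <= in.size() && out.size() == in.size());
        for (std::size_t i = begin; i < end; ++i) {
            sum += in[i];
            if (is_final) out[i] = sum;
        }
    }
};

// Hypothetical scheduler. With use_pre_scan it mimics a parallel run over the
// chunks [0, mid) and [mid, n); without it, it mimics a purely serial run.
std::vector<int> running_sum(const std::vector<int>& in, bool use_pre_scan) {
    std::vector<int> out(in.size());
    const std::size_t mid = in.size() / 2;
    RunningSum body;
    if (use_pre_scan) {
        RunningSum summary;                        // a second body, as after a split
        summary.scan(in, out, 0, mid, false);      // pre-scan: total only, no output
        RunningSum tail = summary;                 // second chunk starts from that total
        tail.scan(in, out, mid, in.size(), true);  // final scan of the second chunk
        body.scan(in, out, 0, mid, true);          // final scan of the first chunk redoes the sums
    } else {
        body.scan(in, out, 0, in.size(), true);    // the pre-scan never ran at all
    }
    return out;
}
```

Calling `running_sum` on the same input with `use_pre_scan` set to `true` and then `false` should produce identical output vectors, which is exactly the property that forced my two methods to be the same.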

Anyway, problem solved.

My first CUDA program!

Note: Check out “CUDA Gets Easier” for a simpler way to create CUDA projects in Visual Studio.

I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE.

Getting started with CUDA

I’m starting the CUDA portion of my parallel programming investigations.

I started reading about CUDA in early 2007. Sixteen months later, I finally have a CUDA programming environment set up under Windows XP. (A glacial pace, I admit.) I’ll describe the steps that got me there.

parallel_do? Parallel done!

parallel_do is a new TBB construct. It isn’t even in the Commercial Aligned or Stable releases; I had to install a Development release (tbb20_20080226oss) in order to get access to it.

The parallel_do construct is used when you don’t know how much data you have to process. parallel_do starts up tasks from a list, but these tasks can add further work to the list. parallel_do only shuts down when the list is empty and all the tasks are done.
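
The "tasks can add further work" behavior is easier to see in code. What follows is a serial stand-in I wrote for illustration, not TBB code: `Feeder`, `process`, and `run_worklist` are hypothetical names, and the rule that each item queues its own half is just a toy workload. The real construct runs the tasks in parallel, so it would not guarantee any particular order.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical feeder: lets a running task push more work onto the list.
struct Feeder {
    std::deque<int>& pending;
    void add(int item) { pending.push_back(item); }
};

// Hypothetical task body: record the item, then queue its half if there is one.
void process(int item, Feeder& feeder, std::vector<int>& visited) {
    assert(item > 0);  // this toy workload assumes positive items
    visited.push_back(item);
    if (item > 1) feeder.add(item / 2);
}

// Serial stand-in for the loop: keep going until the list is empty.
std::vector<int> run_worklist(const std::vector<int>& initial) {
    std::deque<int> pending(initial.begin(), initial.end());
    std::vector<int> visited;
    Feeder feeder{pending};
    while (!pending.empty()) {
        int item = pending.front();
        pending.pop_front();
        process(item, feeder, visited);  // may grow the list while we drain it
    }
    return visited;
}
```

Starting from the list {8, 3}, this serial version visits 8, 3, 4, 1, 2, 1: six items processed even though only two were known up front.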

Parallel sorting

After my problems with parallel_scan, I approached parallel_sort with some trepidation. I was pleasantly surprised when parallel_sort worked as advertised. (I did have a few problems, but these were related to my C++ skills and not to TBB directly.)

parallel_scan works … kinda, sorta

In a previous post, I showed a program that uses the parallel_scan construct but gets the wrong result. Since then, I have received a working running-sum example for parallel_scan (from Mike Deskevich of ITT) that I could poke at while observing the results. Doing that, I found a mistake I was making, and I found a mistake that TBB is making.