parallel_scan finally explained!

I beat my head against parallel_scan for a week and never really understood why I was having the problems I did. Now the developers at Intel have provided a better explanation of how parallel_scan works. It turns out that the pre_scan pass may never be run at all, so the final_scan pass always has to redo the work that pre_scan would have done, just to be safe. That explains why I had to make my pre_scan and final_scan methods identical in my example program. It would have been nice if one of the developers had mentioned that within a few days of my submitting the problem to the Intel TBB forum. Or perhaps they should have called the method pre_scan_sometimes_if_we_feel_like_it, just to warn TBB users of the actual behavior.
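The behavior is easier to see in a small sketch. The code below is plain C++ and does not use TBB at all: SumBody, scan, and run_scan are hypothetical names made up for illustration, the bool flag stands in for the pre-scan/final-scan distinction, and the driver is a serial imitation of the two strategies, not how the library actually schedules work. It only shows why the final pass has to do the full accumulation itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scan body (not a real TBB type).
struct SumBody {
    int sum = 0;                  // running total carried into the next chunk
    const std::vector<int>& in;
    std::vector<int>& out;

    SumBody(const std::vector<int>& i, std::vector<int>& o) : in(i), out(o) {}

    // The same loop serves both passes. Only the final pass writes output,
    // but it still recomputes the running sum from scratch for its chunk.
    void scan(std::size_t begin, std::size_t end, bool is_final) {
        int temp = sum;
        for (std::size_t i = begin; i < end; ++i) {
            temp += in[i];
            if (is_final) out[i] = temp;
        }
        sum = temp;
    }
};

// Serial imitation of the two ways a scan might be carried out.
std::vector<int> run_scan(const std::vector<int>& in, std::size_t chunk,
                          bool use_pre_scan) {
    assert(chunk > 0);
    const std::size_t n = in.size();
    std::vector<int> out(n, 0);

    if (!use_pre_scan) {
        // No pre-scan at all: one body does a final scan over every chunk.
        SumBody body(in, out);
        for (std::size_t b = 0; b < n; b += chunk)
            body.scan(b, std::min(b + chunk, n), true);
        return out;
    }

    // Pass 1: pre-scan each chunk with a fresh body to get its total only.
    std::vector<int> totals;
    for (std::size_t b = 0; b < n; b += chunk) {
        SumBody pre(in, out);
        pre.scan(b, std::min(b + chunk, n), false);
        totals.push_back(pre.sum);
    }

    // Pass 2: final-scan each chunk, starting from the sum of the chunks
    // to its left. The per-element work is done again here.
    int offset = 0;
    std::size_t k = 0;
    for (std::size_t b = 0; b < n; b += chunk, ++k) {
        SumBody fin(in, out);
        fin.sum = offset;
        fin.scan(b, std::min(b + chunk, n), true);
        offset += totals[k];
    }
    return out;
}
```

Calling run_scan with use_pre_scan set to true or false gives the same prefix sums, which is only possible because the final pass never relies on the pre-scan having touched its chunk.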

Anyway, problem solved.

My first CUDA program!

Note: Check out “CUDA Gets Easier” for a simpler way to create CUDA projects in Visual Studio.

I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE.