Scanners? Aren’t those the guys that make your head explode?

Back in the 80’s, there was a movie called “Scanners”. Scanners were mutants that could think intensely about you with a very constipated look and veins standing out on their faces. Then your head would explode. That’s the way the parallel_scan construct makes me feel.

There is no example for parallel_scan in the TBB tutorial, but there is a terse explanation in the reference manual. Basically, parallel_scan breaks a range into subranges and computes a partial result in each subrange in parallel. Then, the partial result for subrange k is used to update the information in subrange k+1, starting from k=0 and proceeding sequentially up to the last subrange. Then each subrange uses its updated information to compute its final result in parallel with all the other subranges.

That’s suitably abstract, and it took me over an hour to get that much out of it. In order to get a better feel for parallel_scan, I tried to build a program that would compute the running minimum for a floating-point vector. Each location i in the running minimum stores the minimum value of the original vector found between locations 0 and i . The algorithm has the following phases:

  1. The vector is divided into subranges.
  2. A pre-scan version of the () operator finds the minimum value in each subrange.
  3. A reverse_join method communicates the minimum found in subrange k over to subrange k +1, starting from k=0. This tells subrange k +1 what the minimum value is in the entire vector that precedes it.
  4. A final scan version of the () operator computes the running minimum over each subrange using the minimum passed in by the reverse_join as the starting minimum.

The algorithm seems straight-forward. Here’s the resulting program:

// example4.cpp : Defines the entry point for the console application.
//

#include “stdafx.h”
#include
#include
#include
#include
#include “tbb/task_scheduler_init.h”
#include “tbb/parallel_scan.h”
#include “tbb/blocked_range.h”
using namespace tbb;
using namespace std;

class vector_run_min {
const double *const v; // vector of data
public:
double *const run_min; // vector of running minimums of data
double min_in; // minimum at beginning of subrange
double min_out; // minimum at end of subrange
// constructor copies the arguments into local storage
vector_run_min (double *v_, double *run_min_)
: v(v_), run_min(run_min_), min_in(FLT_MAX), min_out(FLT_MAX) { }
// splitting constructor
vector_run_min (vector_run_min &vrm, split)
: v(vrm.v), run_min(vrm.run_min), min_in(FLT_MAX), min_out(FLT_MAX) { }
// overload () so it finds the minimum within each subrange
void operator() (const blocked_range &r, pre_scan_tag) {
for(size_t i=r.begin(); i!=r.end(); i++)
min_out = (v[i] &r, final_scan_tag) {
size_t i = r.begin();
run_min[i] = (v[i](0,vec_size,1000), vrm );

// print some of the result to make sure it is correct
cout << "Vector and its running minimum (parallel): " << endl; for(size_t i=0; i<10; i++) printf("%6.0f ",v[i]); cout << endl; for(size_t i=0; i<10; i++) printf("%6.0f ",run_min_par[i]); cout << endl; // compare serial and parallel versions of the running minimum for(size_t i=0; i_tmain (lines 57-103), the running minimum for a vector of random values is computed in serial and parallel. The parallel running minimum calculation is initiated on lines 81-83. A task scheduler object is initialized. Then a vector_run_min object is constructed from the pointers to the vectors that store the data and the running minimum. Then that object is passed to the parallel_scan subroutine along with a blocked_range object that will partition the range into smaller parcels.

The vector_run_min object (lines 15-53) has both standard and splitting constructors just like those found when using parallel_reduce. A vector_run_min object for a given subrange stores the minimum value found across all the subranges that precede it (min_in), and it stores the updated minimum value after looking at the data within its subrange (min_out). The min_in and min_out values are initialized to the maximum floating-point value whenever a subrange is constructed. (Conversely, if I was doing a running maximum, then I would initialize these to the minimum floating-point value.)

The () operator is used to find the minimum within a subrange (lines 28-32). A dummy argument, pre_scan_tag, is used to identify this as the operator to apply during the pre-scan portion of the algorithm. The minimum value in the subrange is stored in the min_out member.

After the pre-scan () operator is applied in parallel to all the subranges, the reverse_join method is applied sequentially to the subranges (lines 34-39). The minimum value coming into a subrange from the preceding subranges is stored in the min_in member. Then the minimum value including the values in the current subrange is stored in the min_out member. This value is passed on to the next subrange, thus updating and propagating the correct minimum value.

The () operator is used again to calculate the running minimum within a subrange (lines 41-47). The final_scan_tag dummy argument identifies this as the appropriate operator for the final scan. The minimum value found up to the start of the current subrange is compared to the first element of the current subrange to get the first element of the running minimum. Then the running minimum is calculated at each point in the subrange in the same way as for the serial version (lines 69-71).

The logic of this program looks correct, but it is not working. I ran the serial and parallel versions and their answers are different. For a ten-thousand element vector, the blocked_range object partitions are [0,625), [625,1250), [1250,2500), [2500, 5000), and [5000, 10000), and the first difference always occurs at location 625, right where the first two subranges meet. It appears the reverse_join is not passing the minimum from the previous subrange on to the next subrange.

I’ve tried this program using both release and development versions of TBB, and the result is always the same. After several days, I stopped trying to fix it. Here’s the source code for this example if you want to try it. Let me know if you find the error in my code.

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

3 Responses to Scanners? Aren’t those the guys that make your head explode?

  1. Pingback: parallel_scan finally explained! « /// Parallel Panorama ///

  2. Pingback: /// Parallelism Panorama ///

  3. Pingback: parallel_scan works … kinda, sorta « /// Parallelism Panorama ///

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: