parallel_scan works … kinda, sorta

In a previous post, I showed a program that uses the parallel_scan construct but gets the wrong result. Since then, I received a working, running sum example for parallel_scan (from Mike Deskevich of ITT) that I could poke at and observe the results. Doing that I found a mistake I was making, and I found a mistake that TBB is making.

My mistake was creating an incorrect mental model of how reverse_join works. In my original version for the running minimum program, reverse_join stored the incoming minimum to a subrange and compared it to the minimum found within the subrange so that the updated minimum could be passed on to the next subrange:


void reverse_join (vector_run_min &vrm) {
        min_in = vrm.min_out; // min at start of subrange k = min at end of subrange k-1
        // min at end of subrange k is lesser of min at end of subranges k, k-1
        min_out = (min_out < vrm.min_out) ? min_out : vrm.min_out;
    }

It actually doesn’t have to be that complicated. The vector_run_min object needs only a single member, mmin, that stores the minimum at the end of the subrange. Then the correct version of reverse_join can be written this way:


void reverse_join (vector_run_min &vrm) {
        // min at end of subrange k is lesser of min at end of subranges k, k-1
        mmin = (mmin < vrm.mmin) ? mmin : vrm.mmin;
    }

Obviously, TBB is doing something in the background to make this work. All I can figure is that, during the final scan, each vector_min_object is passed the subrange that followed it during the pre-scan. Then mmin would hold the minimum at the start of that subrange.

The second problem I had was with the () operator that handles the pre-scan and final scan. During my debugging of my original running minimum program, it seemed that the pre-scan version of the () operator wasn’t being called at the right times. So I re-wrote the original operators:


// overload () so it finds the minimum within each subrange
    void operator() (const blocked_range<size_t> &r, pre_scan_tag) {
        for(size_t i=r.begin(); i!=r.end(); i++)
            mmin = (v[i]<mmin) ? v[i] : mmin;
    }
    // overload () so it computes the running minimum over a subrange
    void operator() (const blocked_range<size_t> &r, final_scan_tag) {
        size_t i = r.begin();
        run_min[i] = (v[i]<mmin) ? v[i] : mmin; // set running min for first entry in subrange
        for(++i; i!=r.end(); i++)
            run_min[i] = (v[i]<run_min[i-1]) ? v[i] : run_min[i-1];
    }

and made them operate identically so the actual scan phase didn’t matter:


// overload () so it finds the minimum within each subrange
    void operator() (const blocked_range<size_t> &r, pre_scan_tag) {
        // find the running minimum over the subrange
        size_t i = r.begin();
        run_min[i] = (v[i]<mmin) ? v[i] : mmin;
        for(++i; i!=r.end(); i++)
            run_min[i] = (v[i]<run_min[i-1]) ? v[i] : run_min[i-1];
        mmin = run_min[i-1]; // subrange minimum is the last entry in the running minimum
    }
    // overload () so it computes the running minimum over a subrange
    void operator() (const blocked_range<size_t> &r, final_scan_tag) {
        // find the running minimum over the subrange
        size_t i = r.begin();
        run_min[i] = (v[i]<mmin) ? v[i] : mmin;
        for(++i; i!=r.end(); i++)
            run_min[i] = (v[i]<run_min[i-1]) ? v[i] : run_min[i-1];
        mmin = run_min[i-1]; // subrange minimum is the last entry in the running minimum
    }

Now the pre-scan version of the () operator needlessly computes the running minimum while it is finding the minimum for a subrange. And the final scan version needlessly updates the minimum after it calculates the running minimum. So there is some wasted effort, but at least it works.

Now, this is not the way I wanted to do it. And I’m sure it’s not the way the TBB developers want me to do it. But it’s the only way I could do it.

So here’s the current, working program for finding the running minimum. You can also download it here. Let me know if you find a way to fix my problem with the pre-scan and final scan.

// example4.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <FLOAT.H>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_scan.h"

using namespace std;
using namespace tbb;

class vector_run_min {
    const double *const v; // vector of data
public:
    double *const run_min; // vector of running minimums of data
    double mmin;  // minimum at beginning of subrange
    // constructor copies the arguments into local storage
    vector_run_min (double *v_, double *run_min_)
        : v(v_), run_min(run_min_), mmin(FLT_MAX) { }
    // splitting constructor
    vector_run_min (vector_run_min &vrm, split)
        : v(vrm.v), run_min(vrm.run_min), mmin(FLT_MAX) { }
    // overload () so it finds the minimum within each subrange
    void operator() (const blocked_range<size_t> &r, pre_scan_tag) {
        // find the running minimum over the subrange
        size_t i = r.begin();
        run_min[i] = (v[i]<mmin) ? v[i] : mmin;
        for(++i; i!=r.end(); i++)
            run_min[i] = (v[i]<run_min[i-1]) ? v[i] : run_min[i-1];
        mmin = run_min[i-1]; // subrange minimum is the last entry in the running minimum
    }
    // join partial results starting at subrange 0, 1, 2 ... and proceeding upward
    void reverse_join (vector_run_min &vrm) {
        // min at end of subrange k is lesser of min at end of subranges k, k-1
        mmin = (mmin < vrm.mmin) ? mmin : vrm.mmin;
    }
    // overload () so it computes the running minimum over a subrange
    void operator() (const blocked_range<size_t> &r, final_scan_tag) {
        // find the running minimum over the subrange
        size_t i = r.begin();
        run_min[i] = (v[i]<mmin) ? v[i] : mmin;
        for(++i; i!=r.end(); i++)
            run_min[i] = (v[i]<run_min[i-1]) ? v[i] : run_min[i-1];
        mmin = run_min[i-1]; // subrange minimum is the last entry in the running minimum
    }
    // assign
    void assign(vector_run_min &vrm) {
        mmin = vrm.mmin;
    }
};

const size_t vec_size = 10000;

int _tmain(int argc, _TCHAR* argv[])
{
    // allocate storage for vectors
    double *v = (double*)malloc(vec_size*sizeof(double));
    double *run_min_ser = (double*)malloc(vec_size*sizeof(double));
    double *run_min_par = (double*)malloc(vec_size*sizeof(double));

    // initialize the vector with random data
    srand((unsigned)time(NULL));  // set the random number seed
    for(size_t i=0; i<vec_size; i++)  v[i] = rand();

    // serially scan vector and record running minimum at each point
    run_min_ser[0] = v[0]; // initialize first entry in running minimum
    for(size_t i=1; i<vec_size; i++)
        run_min_ser[i] = (v[i] < run_min_ser[i-1]) ? v[i] : run_min_ser[i-1]; 

    // scan vector in parallel and record running minimum at each point
    task_scheduler_init init;
    vector_run_min vrm(v,run_min_par);
    parallel_scan( blocked_range<size_t>(0,vec_size,1000), vrm );

    // compare serial and parallel versions of the running minimum
    for(size_t i=0; i<vec_size; i++)
        if(run_min_ser[i] != run_min_par[i])
        {
            cout << "Serial and parallel versions differ!" << endl;
            cout << "run_min_ser[" << i << "] = " << run_min_ser[i] << endl;
            cout << "run_min_par[" << i << "] = " << run_min_par[i] << endl;
            return 1;
        }

    cout << "Serial and parallel versions are the same!" << endl;

    // print some of the result to make sure it is correct
    cout << "Vector and its running minimum: " << endl;
    for(size_t i=0; i<10; i++) printf("%6.0f ",v[i]);
    cout << endl;
    for(size_t i=0; i<10; i++) printf("%6.0f ",run_min_ser[i]);
    cout << endl;

    return 0;
}
About these ads

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

One Response to parallel_scan works … kinda, sorta

  1. Pingback: parallel_scan finally explained! « /// Parallel Panorama ///

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 29 other followers

%d bloggers like this: