parallel_scan works … kinda, sorta

In a previous post, I showed a program that uses the parallel_scan construct but gets the wrong result. Since then, I received a working, running sum example for parallel_scan (from Mike Deskevich of ITT) that I could poke at and observe the results. Doing that I found a mistake I was making, and I found a mistake that TBB is making.

My mistake was creating an incorrect mental model of how reverse_join works. In my original version for the running minimum program, reverse_join stored the incoming minimum to a subrange and compared it to the minimum found within the subrange so that the updated minimum could be passed on to the next subrange:

void reverse_join (vector_run_min &vrm) {
min_in = vrm.min_out; // min at start of subrange k = min at end of subrange k-1
// min at end of subrange k is lesser of min at end of subranges k, k-1
min_out = (min_out < vrm.min_out) ? min_out : vrm.min_out; }[/sourcecode] It actually doesn't have to be that complicated. The vector_run_min object needs only a single member, mmin, that stores the minimum at the end of the subrange. Then the correct version of reverse_join can be written this way:

void reverse_join (vector_run_min &vrm) {
// min at end of subrange k is lesser of min at end of subranges k, k-1
mmin = (mmin < vrm.mmin) ? mmin : vrm.mmin; }[/sourcecode] Obviously, TBB is doing something in the background to make this work. All I can figure is that, during the final scan, each vector_min_object is passed the subrange that followed it during the pre-scan. Then mmin would hold the minimum at the start of that subrange.

The second problem I had was with the () operator that handles the pre-scan and final scan. During my debugging of my original running minimum program, it seemed that the pre-scan version of the () operator wasn’t being called at the right times. So I re-wrote the original operators:

// overload () so it finds the minimum within each subrange
void operator() (const blocked_range &r, pre_scan_tag) {
for(size_t i=r.begin(); i!=r.end(); i++)
mmin = (v[i] &r, final_scan_tag) {
size_t i = r.begin();
run_min[i] = (v[i] &r, pre_scan_tag) {
// find the running minimum over the subrange
size_t i = r.begin();
run_min[i] = (v[i] &r, final_scan_tag) {
// find the running minimum over the subrange
size_t i = r.begin();
run_min[i] = (v[i]() operator needlessly computes the running minimum while it is finding the minimum for a subrange. And the final scan version needlessly updates the minimum after it calculates the running minimum. So there is some wasted effort, but at least it works.

Now, this is not the way I wanted to do it. And I’m sure it’s not the way the TBB developers want me to do it. But it’s the only way I could do it.

So here’s the current, working program for finding the running minimum. You can also download it here. Let me know if you find a way to fix my problem with the pre-scan and final scan.

// example4.cpp : Defines the entry point for the console application.
//

#include “stdafx.h”
#include
#include
#include
#include
#include “tbb/task_scheduler_init.h”
#include “tbb/blocked_range.h”
#include “tbb/parallel_scan.h”

using namespace std;
using namespace tbb;

class vector_run_min {
const double *const v; // vector of data
public:
double *const run_min; // vector of running minimums of data
double mmin; // minimum at beginning of subrange
// constructor copies the arguments into local storage
vector_run_min (double *v_, double *run_min_)
: v(v_), run_min(run_min_), mmin(FLT_MAX) { }
// splitting constructor
vector_run_min (vector_run_min &vrm, split)
: v(vrm.v), run_min(vrm.run_min), mmin(FLT_MAX) { }
// overload () so it finds the minimum within each subrange
void operator() (const blocked_range &r, pre_scan_tag) {
// find the running minimum over the subrange
size_t i = r.begin();
run_min[i] = (v[i] &r, final_scan_tag) {
// find the running minimum over the subrange
size_t i = r.begin();
run_min[i] = (v[i](0,vec_size,1000), vrm );

// compare serial and parallel versions of the running minimum
for(size_t i=0; i

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

2 Responses to parallel_scan works … kinda, sorta

  1. cars says:

    Buying a used car can be tricky, no matter how much you already
    know about cars. There are lots of different things to consider so
    that you don’t end up buying a piece of junk that breaks down right away. Use some great tips of the trade in the following article to help you make your next car choice.

  2. Pingback: parallel_scan finally explained! « /// Parallel Panorama ///

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: