Reductio ad absurdum

I used parallel_for in my first TBB program to multiply two vectors element-by-element and store the products into a third vector. But what about calculating their dot product where the vector products are added together? TBB supports this type of operation with the parallel_reduce construct.

In order to use parallel_reduce, I needed to add two methods to the object that performs the parallel operations:

  • a splitting constructor that can cut off a piece of an existing object and initialize it;
  • a join method that combines the answers of two smaller problems into a result that applies to their total.

I modified my vector multiplication program to compute the dot product using parallel_reduce. Here’s the result:

// example3.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <iostream>
#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
using namespace tbb;
using namespace std;

class vector_dot_prod {
    double *const v1, *const v2; // dot-product these vectors
public:
    double result; // put the result here
    // constructor copies the arguments into local storage
    vector_dot_prod(double *vec1, double *vec2)
        : v1(vec1), v2(vec2), result(0) { }
    // splitting constructor
    vector_dot_prod(vector_dot_prod &vdp, split)
        : v1(vdp.v1), v2(vdp.v2), result(0) { }
    // overload () so it does a dot product
    void operator() (const blocked_range<size_t> &r) {
        for(size_t i=r.begin(); i!=r.end(); i++)
            result += v1[i] * v2[i];
    }
    // join method adds partial results
    void join(vector_dot_prod &v) { result += v.result; }
};

const size_t vec_size = 100000000;

int _tmain(int argc, _TCHAR* argv[])
{
    // allocate storage for vectors
    double *v1, *v2, result = 0.0;
    v1 = (double*)malloc(vec_size*sizeof(double));
    v2 = (double*)malloc(vec_size*sizeof(double));

    // init and dot-prod the vectors serially
    for(size_t i=0; i<vec_size; i++) {
        v1[i] = v2[i] = (double)i;
        result += v1[i] * v2[i];
    }

    cout << "Serial dot product of the two vectors is " << result << endl;

    // dot-prod the vectors in parallel
    task_scheduler_init init;
    vector_dot_prod v(v1, v2);
    parallel_reduce( blocked_range<size_t>(0,vec_size,1000), v );

    // print the result to make sure it is correct
    cout << "Parallel dot product of the two vectors is " << v.result << endl;
    return 0;
}

I'll go over the changes on a line-by-line basis:

Line 7: I changed the include file from parallel_for.h to parallel_reduce.h since that's what I'm using in this program.

Line 12: I changed the class name of the object to vector_dot_prod for obvious reasons.

Line 15: I removed the product vector and added a result member to hold the scalar dot product value.

Lines 17-18: I added an initialization that clears result when the object constructor is called.

Lines 20-21: I added the new splitting constructor which accepts an existing vector_dot_prod object and creates another one with the same pointers to the input vectors while clearing the partial result scalar member. The split argument is a dummy argument that just serves to distinguish this constructor from any others. The original object and the newly constructed object will each get a subrange of the input vectors on which to compute the dot product (although there may be additional splitting of each subrange before actual computations occur). In effect, the parent object that is passed as an argument is transformed into two sibling objects that each operate over smaller subranges of the input vectors.

Lines 23-26: The () operator handles the actual dot product calculation. A subrange of each input vector is multiplied and added to the result member.

Line 28: The join method takes the partial dot product from one sibling (passed as the argument) and adds it to the result in the other sibling, which is transformed back into the parent object. Of course, this parent object may be a sibling of another object to which its result would be added. This joining process will repeat until the original object that spans the full range of the vectors is rebuilt, at which point the complete dot product will be in its result member.

Lines 41-43: The members of each vector are initialized with the value of their respective indices, and then the dot product is computed serially.

Line 46: The dot product is output so I could check the result. For the given input vectors, the dot product is going to be approximately N*N*N / 3 where N is the length of the vectors.

Lines 49-51: This computes the dot product in parallel. The task scheduler is started and then an object is constructed that holds the pointers to the input vectors. This object is passed to the parallel_reduce function along with a blocked_range object that spans the input vectors. The parallel_reduce function uses the vector_dot_prod object's splitting constructor to create objects that span subranges of the input based on the grainsize of the blocked_range object. Then the () operator is used to compute the partial dot product of each subrange. Finally, the join method is used to add all the partial dot products together.

Line 54: The result of the parallel dot product calculation is output so I could check to make sure it matched the result of the serial calculation.

When I ran the program, both the serial and parallel portions calculated a dot product of 3.33333 x 10^23, which is correct.

As with the vector product example, the dot product doesn’t do enough computations to make parallelism worthwhile so I didn’t print out the speedup. The dot product calculation serves as an easy-to-understand example of how to use parallel_reduce, but you would never parallelize this in practice.

What other types of operations can be used with parallel_reduce? Any operation that is associative, i.e. where the grouping of the operations does not affect the result. Some examples are addition, multiplication, finding the minimum or maximum of a set of numbers, and finding intersections and unions of sets.

Here’s the source code for this example if you want to try it.

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

