parallel_do? Parallel done!

parallel_do is a new TBB construct. It isn’t even in the Commercial Aligned or Stable releases; I had to install a Development release (tbb20_20080226oss) in order to get access to it.

The parallel_do construct is used when you don’t know how much data you have to process. parallel_do starts up tasks from a list, but these tasks can add further work to the list. parallel_do only shuts down when the list is empty and all the tasks are done.

As an example of using parallel_do, the following program works its way through a tree structure by processing a node and then adding further tasks for each child node. It doesn’t do much with each node, but it does show how to get parallel_do started and how to feed new tasks to it.

// example6.cpp : Defines the entry point for the console application.

#include “stdafx.h”
#include “tbb/task_scheduler_init.h”
#include “tbb/parallel_do.h”
#include “tbb/tick_count.h”

using namespace tbb;
using namespace std;

class tree {
tree *parent; // parent node of this node
vector children; // children of this node
unsigned depth; // depth of this node below root
string id; // identifier for this node
tick_count start, stop; // computation start & stop times
// construct an empty tree
tree() { parent=NULL; children.clear(); id=””; }
// add a tree as a child of this node
void add_tree(tree *t_) { children.push_back(t_); }
// build a tree of a given depth with a given #children per node
tree* build_tree(tree *parent_, unsigned width_, unsigned depth_) {
parent = parent_;
if(depth_ > 0) // stop descent when depth is zero
for(unsigned i=0; ibuild_tree(this,width_,depth_-1));
return this;
// stuff for parallel_do follows…
typedef tree* argument_type; // typedef for function object
void operator()(argument_type item,
parallel_do_feeder& feed_it)
const {
item->start = tick_count::now(); // record start time
char tmp_id[20]; // temporary string for use with tmpnam
if(item->parent == NULL) { // this is the root of the tree
item->depth = 0; // its depth is zero
item->id = tmpnam(tmp_id); // assign unique string to root id
else { // this is a node within the tree
item->depth = item->parent->depth + 1; // depth is one more than parent
// node id is parent id with a unique string appended
item->id = item->parent->id;
item->id += tmpnam(tmp_id);
// dummy computation that increases with node depth
volatile float v=1, sum=0;
for(unsigned i=0; i<(item->depth+1)*(INT_MAX/1000); i++) sum += v;
// done with this node, so add children to parallel_do task list
for(unsigned i=0; ichildren.size(); i++)
item->stop = tick_count::now(); // record end time

// print a tree structure
ostream& operator<< (ostream &os, tree *t) { for(unsigned i=0; idepth; i++) os << " "; os << "(" << t->id.c_str();
os << " , " << (t->stop-t->start).seconds();
os << (t->children.size() ? “:\n” : “”);
for(unsigned i=0; ichildren.size(); i++)
os << t->children[i];
for(size_t i=0; idepth; i++) os << " "; os << ")\n"; return os; } int _tmain(int argc, _TCHAR* argv[]) { vector tv(1);
tv[0] = (new tree)->build_tree(NULL,3,3);
task_scheduler_init init;
parallel_do(tv.begin(), tv.end(), tree());
cout << tv[0] << endl; return 0; } [/sourcecode] I'll start the explanation in _tmain. A single-element vector is created and initialized with a pointer to a tree object (lines 78-79). Each node in the tree has three children and the tree goes three deep starting from the root. Then, as usual, the TBB task scheduler is started (line 80). Next, parallel_do is passed two iterators pointing to the beginning and end of the vector and a tree object that provides access to a function object which is applied to each element in the vector (line 81). After everything is processed, the tree is displayed (line 82) using a recursive routine that prints the tree in an indented format (lines 63-74).

The tree object is the heart of this program. It contains some members (lines 18-22), most importantly the pointer from a node to its parent (line 18) and pointers to each of its children (line 19) that keep the tree tied together. There is a constructor for an empty tree (line 24) and a method for adding a tree to a node as a child (line 26). This is used by the build_tree method to construct regular trees of a given depth and with a certain number of children per node (lines 28-34). These are all things that might be found in any tree object.

The remainder of the tree object supports the use of parallel_do. The argument_type is defined to be a pointer to a tree (line 36), mirroring what is stored in the vector object passed to parallel_do. This is also the type of the first argument to the () operator which defines the function object called for each entry in the vector (lines 37-60). This operator records the time the operator starts (line 40) and ends (line 58). In between, the operator sets the depth of the node (0 if its the root, parent-depth+1 if its a child node) and its identifier (tmpnam generates a unique identifier that is appended to the identifier of the node’s parent). Then a dummy computation is performed where the number of iterations increases with the depth of the node (lines 53-54). Finally, the node places pointers to each of its children onto the task list using the feed_it function object (lines 56-57) that was passed in as the second argument to the()operator. A template provided by TBB is used to declare this function object as parallel_do_feeder<tree*>& (line ??).

An example of the output from this program is shown below. Each line shows the identifier and duration of a node, with the indentation reflecting the depth of the node in the tree. We can see the node identifier starts with the identifier of its parent followed by a unique string. And the duration of the computation performed by each node increases with the depth of the node. So parallel_do seems to be processing the nodes as I wanted.

\s5qk. , 0.0140926:
    (\s5qk.\s5qk.r , 0.0271141:
        (\s5qk.\s5qk.r\s5qk.14 , 0.0409479:
            (\s5qk.\s5qk.r\s5qk.14\s5qk.17 , 0.056522)
            (\s5qk.\s5qk.r\s5qk.14\s5qk.15 , 0.0583038)
            (\s5qk.\s5qk.r\s5qk.14\s5qk.16 , 0.0584535)
        (\s5qk.\s5qk.r\s5qk.t , 0.04395:
            (\s5qk.\s5qk.r\s5qk.t\s5qk.12 , 0.0590581)
            (\s5qk.\s5qk.r\s5qk.t\s5qk.10 , 0.0577009)
            (\s5qk.\s5qk.r\s5qk.t\s5qk.v , 0.0568818)
        (\s5qk.\s5qk.r\s5qk.s , 0.0432876:
            (\s5qk.\s5qk.r\s5qk.s\s5qk.13 , 0.0587647)
            (\s5qk.\s5qk.r\s5qk.s\s5qk.11 , 0.0577445)
            (\s5qk.\s5qk.r\s5qk.s\s5qk.u , 0.0592897)

It is important to note that I created a vector of tree pointers to pass to parallel_do, and I also specified a tree pointer as the argument_type. This program would not have worked if I used the tree objects directly. That’s because parallel_do makes multiple copies of the objects as it processes them and passes these objects by value to the () operator, so any modifications made to the object are lost upon leaving. By using pointers, the results have a safe, permanent place where they can be stored.

Here’s the source code for this example if you want to try it.

This entry completes my survey of the parallel_* constructs in TBB. (I know, some of you are asking: “What about parallel_while?” Well, parallel_while is deprecated for future use. It isn’t even included in the documentation anymore. So it may be around a bit longer, but I’m not going to help keep it alive.)

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

One Response to parallel_do? Parallel done!

  1. Mohan Radhakrishnan says:

    Very useful.
    I am a java programmer and I just read Brian Goetz’s book titled ‘Concurrency if practice”. Where can I read TBB concurrency patterns like those explained in that book ? I am just trying to understand TBB without writing complex calculations. I am looking for high level API constructs like java.util.concurrent.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: