
To optimize the execution of some libraries I am making, I have to parallelize some calculations. Unfortunately, I cannot use OpenMP for that, so I am trying to build a similar alternative using boost::thread. Does anyone know of an implementation like this? I have particular problems with the sharing of variables between threads (defining variables as 'shared' and 'private' as in OpenMP). Any suggestions?


1 Answer


As far as I know you'll have to do that explicitly with anything other than OpenMP.

As an example, suppose we have a parallelized loop in OpenMP:

int i;
int length = 10000;
int* someArray = new int[length];

#pragma omp parallel private(i)
{

    #pragma omp for schedule(dynamic, 8)
    for (i = 0; i < length; ++i) {

        someArray[i] = i*i;

    }
}

You'll have to factor out the logic into a "generic" loop that can work on a sub-range of your problem, and then explicitly schedule the threads. Each thread then works on a chunk of the whole problem. In this way you explicitly declare the "private" variables: they are the ones passed into the subProblem function.

void subProblem(int* someArray, size_t startIndex, size_t subLength) {
    size_t end = startIndex+subLength;

    for (size_t i = startIndex; i < end; ++i) {
        someArray[i] = i*i;         
    }
}

void algorithm() {

    size_t length = 10000;
    int* someArray = new int[length];
    int numThreads = 4; // how to subdivide

    // a vector of all threads working on the problem
    std::vector<boost::thread> threadVector;

    for (int thread = 0; thread < numThreads; ++thread) {
        // size of subproblem
        // (assumes length divides evenly by numThreads)
        size_t subLength = length / numThreads;
        size_t startIndex = subLength * thread;

        // use move semantics to create a thread in the vector.
        // requires C++11; if you can't use C++11,
        // perhaps look at boost::move?
        threadVector.emplace_back(boost::bind(subProblem, someArray, startIndex, subLength));
    }
    // threads are now working on subproblems

    // now go through the thread vector and join with the threads.
    // left as an exercise :P

}

The above is one of many scheduling algorithms: it simply cuts the problem into as many chunks as you have threads.

The OpenMP way is more complicated: it cuts the problem into many small chunks (of 8 in my example), and then uses dynamic scheduling to hand these chunks to threads in a thread pool. The difficulty of implementing the OpenMP way is that you need "persistent" threads that wait for work (a thread pool). Hope this makes sense.

An even simpler way would be to use async on every iteration (scheduling a piece of work for each iteration). This can work if each iteration is very expensive and takes a long time. However, if it's small pieces of work with MANY iterations, most of the overhead will go into the scheduling and thread creation, rendering the parallelization useless.

In conclusion, depending on your problem, there are many ways to schedule the work; it's up to you to find out what works best for your problem.

TL;DR: Try Intel Threading Building Blocks (or Microsoft PPL), which schedule the work for you, provided you supply the "sub-range" function:

http://cache-www.intel.com/cd/00/00/30/11/301132_301132.pdf#page=14

Answered 2012-10-08T16:07:04.567