
I am a college freshman and Python newbie so bear with me. I am trying to parallelize some matrix operations. Here is my attempt using the ParallelPython module:

    def testfunc(connectionMatrix, qCount, iCount, Htry, tStepCount):
        return connectionMatrix[0:qCount, 0:iCount].dot(Htry[tStepCount - 1, 0:iCount])

    f1 = job_server.submit(testfunc, (self.connectionMatrix, self.qCount, self.iCount, self.iHtry, self.tStepCount), modules=("scipy.sparse",))
    f2 = job_server.submit(testfunc, (self.connectionMatrix, self.qCount, self.iCount, self.didtHtry, self.tStepCount), modules=("scipy.sparse",))
    r1 = f1()
    r2 = f2()
    self.qHtry[self.tStepCount, 0:self.qCount] = (self.qHtry[self.tStepCount - 1, 0:self.qCount]
                                                  + self.delT * r1
                                                  + 0.5 * (self.delT ** 2) * r2)

If I plot percent speed-up against matrix size, I get a bell-shaped curve: it peaks at about a 30% speed increase for 100x100 matrices, and the gain falls off for both smaller and larger matrices. With small enough or large enough matrices, the serial code is actually faster. My guess is that the problem lies in the passing of the arguments: the overhead of copying the large matrix to each job is taking longer than the job itself. What can I do to get around this? Is there some way to use shared memory and pass the matrix by reference? As you can see, none of the arguments are modified, so read-only access would be fine.

Thanks.


1 Answer


Well, the point of ParallelPython is that you can write code that doesn't care whether it's distributed across threads, processes, or even multiple computers, and using memory sharing would break that abstraction.

One option is to use something like a file on a shared filesystem, where you mmap that file in each worker. Of course that's more complicated, and whether it's better or worse will depend on a lot of details about the filesystem, sharing protocol, and network, but it's an option.
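As a minimal sketch of that idea (assuming a NumPy matrix and a made-up file path; `save_shared` and `worker_dot` are illustrative names, not part of any library): the parent writes the matrix to a file once, and each worker maps the file read-only with `numpy.memmap`, so the OS shares the pages instead of each job receiving a pickled copy.

```python
import numpy as np

def save_shared(matrix, path="matrix.dat"):
    # Parent process: write the matrix to disk once.
    m = np.memmap(path, dtype=matrix.dtype, mode="w+", shape=matrix.shape)
    m[:] = matrix[:]
    m.flush()
    return path, matrix.dtype, matrix.shape

def worker_dot(path, dtype, shape, vec):
    # Worker: map the file read-only; no per-job copy of the full matrix.
    m = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    return m.dot(vec)
```

You would ship `path`, `dtype`, and `shape` (all cheap to pickle) to the workers instead of the matrix itself.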

If you're willing to give up the option of distributed processing, you can use multiprocessing.Array (or multiprocessing.Value, or multiprocessing.sharedctypes) to access shared memory. But at that point, you might want to consider just using multiprocessing instead of ParallelPython for the job distribution, since multiprocessing is part of the standard library and has a more powerful API, and you're explicitly giving up the one major advantage of ParallelPython.
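Here's a rough sketch of the multiprocessing.Array approach (names like `init_worker` and `shared_dot` are my own for illustration): the shared buffer is allocated once in the parent, and each pool worker wraps it with `np.frombuffer`, which creates a view rather than a copy, so only the small vector argument gets pickled per job.

```python
import multiprocessing as mp
import numpy as np

def init_worker(shared_arr, shape):
    # Runs once in each pool worker: stash a NumPy view of the shared
    # buffer in a module-level global (no data is copied here).
    global _matrix
    _matrix = np.frombuffer(shared_arr.get_obj()).reshape(shape)

def shared_dot(vec):
    # Uses the shared matrix; only `vec` is pickled per call.
    return _matrix.dot(vec)

if __name__ == "__main__":
    a = np.arange(6, dtype=np.float64).reshape(2, 3)
    shared = mp.Array("d", a.size)                 # lock-protected doubles
    np.frombuffer(shared.get_obj())[:] = a.ravel()
    with mp.Pool(2, initializer=init_worker, initargs=(shared, a.shape)) as pool:
        results = pool.map(shared_dot, [np.ones(3), np.arange(3.0)])
```

Note that shared ctypes objects must be handed to workers at process creation (here via `initargs`); they can't be sent through a Queue or Pipe later.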

Or you could combine the two options, for the worst of both worlds in many ways, but maybe the best in terms of how little you need to change your existing code: Just use a local file and mmap it.

However, before you do any of this, you may want to profile to confirm that copying the matrix really is the bottleneck. And if it is, consider whether there's an algorithmic fix: copy only the part each job needs instead of the entire matrix. (Whether that makes sense depends on whether the part each job needs is significantly smaller than the whole thing, of course.)
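A quick way to check, without full profiling: time how long it takes to pickle the matrix (a rough proxy for the per-job argument-passing cost) against the dot product itself. The matrix sizes below are arbitrary assumptions for illustration.

```python
import pickle
import timeit
import numpy as np

for n in (50, 100, 500):
    a = np.random.rand(n, n)
    v = np.random.rand(n)
    # Average over 20 runs: serialization cost vs. actual compute cost.
    t_copy = timeit.timeit(lambda: pickle.dumps(a), number=20) / 20
    t_work = timeit.timeit(lambda: a.dot(v), number=20) / 20
    print(f"n={n}: pickle {t_copy:.2e}s  dot {t_work:.2e}s")
```

If the pickle time dominates the dot time at your matrix sizes, argument copying is indeed where the speed-up is being lost.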

Answered 2012-07-20T01:56:41.323