Here is a function that executes an array of commands in parallel and returns an array of results. It optionally takes a callback to filter each result and a limit on the number of processes. It runs great on my Mac in Lion:
    // Executes commands in parallel, limited by the number of file descriptors or $maxProcesses.
    // Pass an optional callback to filter each result (return a non-null value to stop execution).
    function execParallel($commands, $callback = null, $maxProcesses = PHP_INT_MAX)
    {
        $handles = array();
        $results = array();
        for($i = 0; $i < count($commands) || count($handles);)
        {
            // launch the next command if we are under the limit; if popen() fails
            // (e.g. out of file descriptors), it is retried on a later pass
            if($i < count($commands) && count($handles) < $maxProcesses)
                if(($handle = @popen($commands[$i].' 2> /dev/null &', "r")) !== false)
                {
                    stream_set_blocking($handle, 0); // does nothing on Windows
                    $handles[$i] = $handle;
                    $results[$i] = '';
                    $i++;
                }
            foreach($handles as $key => $handle)
            {
                $results[$key] .= fread($handle, 4096);
                if(feof($handle))
                {
                    pclose($handles[$key]);
                    unset($handles[$key]);
                    // any non-null return value (including false) stops both loops
                    if($callback)
                        if($callback($results[$key], $key) !== null)
                            break 2;
                }
            }
        }
        foreach($handles as $key => $handle) // clear any incomplete tasks
        {
            pclose($handles[$key]);
            unset($handles[$key]);
            unset($results[$key]);
        }
        return $results;
    }
Here is an example of calling it with an anonymous function and a limit of 64 processes in PHP 5.3:
    $commands = array_fill(0, 256, 'sleep 1; echo | date');
    $results = execParallel($commands, function($result, $key)
    {
        echo "$key\n";
        if($key == 128) return false;
    }, 64);
    print_r($results);
The output is a sequence of numbers up to 128 and then roughly 128 timestamps, depending on clock accuracy and on whether some of the other processes had already finished. The filter is handy for bailing out midway if something goes wrong.
You can play around with changing the process limit from 64 to 1 to see the results arrive serially, or commenting out the $key == 128 line to prevent stopping. If you don't set a process limit, it opens as many file handles to the processes as it can and waits internally for more to become available.
It works almost identically to shell_exec(), except that it launches each process in the background and continues execution. It redirects stderr to /dev/null to discard any warnings about broken pipes, in case you shut down the processes prematurely by closing the handles.
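To make the mechanism concrete, here is a minimal sketch of the same idea for a single command. It assumes a Unix-like shell, and runBackground() is just a hypothetical helper name for illustration, not part of the function above:

```php
<?php
// Sketch only: assumes a Unix-like shell; runBackground() is a hypothetical helper.
function runBackground($command)
{
    // popen() returns as soon as the shell is started; it does not wait for the command
    $handle = popen($command.' 2> /dev/null &', 'r');
    if ($handle === false) return null;

    stream_set_blocking($handle, false); // fread() now returns at once when nothing is waiting
    $output = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 4096);
        if ($chunk === '' || $chunk === false) {
            usleep(10000); // no data yet; a real caller could do other work here
            continue;
        }
        $output .= $chunk;
    }
    pclose($handle);
    return $output;
}

echo runBackground('sleep 1; echo done');
```

execParallel() does the same thing for many handles at once, polling each one in turn instead of sleeping.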
My problem is that I need to make it cross-platform, but I don't have easy access to a PC or to all of the flavors of Windows like NT. Here's my TODO list:
- fread() blocks on Windows because stream_set_blocking() is a no-op there. Could maybe work around this with fread($handle, 1) but it's only pseudo-preemptive. Need a way to peek whether there are bytes waiting, maybe with select or by making the stream non-blocking some other way.
- spawn processes in the background, probably with "START /B [command]". This really needs to work similarly to the "&" in Unix, and not generate extra output, block stdout or force it to only be accessible through a file, force the user to create a batch file, or other weirdness. It might require the use of proc_open() or pcntl_fork() or something completely different, I'm not really sure.
- explore using curl_multi or another thread tool to spawn functions and call shell_exec() inside them as usual (this needs to be done in a way that doesn't involve installation of any extra libraries).
- use a good test like DIRECTORY_SEPARATOR == '\\' to run a separate code branch on Windows.
- need some better test cases on Windows, maybe "TIMEOUT [seconds]" or "PING -n [count] 127.0.0.1>nul" without output (-n is the number of echo requests, sent roughly one per second).
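For reference, here is a rough sketch of the "peek with select" and platform-test items from the list above, using proc_open() and stream_select(). The helper names isWindows() and readWhenReady() are hypothetical. I have only reasoned about this for Unix; as far as I know, the PHP manual warns that stream_select() on proc_open() pipes fails under Windows, so this does not solve the Windows side and is only meant to show the shape of the approach:

```php
<?php
// Sketch only: isWindows() and readWhenReady() are hypothetical helper names.
function isWindows()
{
    // '\\' is a single backslash; a lone '\' would escape the closing quote
    return DIRECTORY_SEPARATOR === '\\';
}

// Runs one command and collects its stdout, calling fread() only after
// stream_select() reports that the pipe is readable (or at EOF).
function readWhenReady($command)
{
    $spec = array(1 => array('pipe', 'w')); // capture the child's stdout only
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) return null;

    $output = '';
    while (!feof($pipes[1])) {
        $read = array($pipes[1]);
        $write = null;
        $except = null;
        // wait up to 0.2 s for data instead of spinning
        $ready = stream_select($read, $write, $except, 0, 200000);
        if ($ready === false) break; // select failed (reportedly the case on Windows)
        if ($ready > 0) $output .= fread($pipes[1], 4096);
    }
    fclose($pipes[1]);
    proc_close($process);
    return $output;
}

if (!isWindows()) {
    echo readWhenReady('sleep 1; echo done');
}
```

With several processes, the $read array could hold all of the open pipes at once, so a single stream_select() call would report which ones have output waiting.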
This all started because I'm trying to simulate something like a goroutine in PHP, where you can spawn up to N processes and it just blocks if you reach the limit until more are available, then returns the results. I'm constantly needing to speed up large tasks that should be trivially easy to parallelize, but PHP makes it difficult because its developers' hands are tied by a bunch of OS-specific minutiae.
So I think whoever comes up with a solid cross-platform version of this function, or something very similar to a goroutine that "just works", would find that their code is quite popular. My guess is that this function falls back to running serially on Windows right now.
Thanks in advance for your help, and good luck!