Saturday, December 18, 2010

Day 18: doing things in parallel (Gearman)

Parallel processing can be a complex topic. There are a lot of choices: POE, Coro, threads, processes, and the list goes on.

For a client I had to do three expensive calculations in parallel. I tried threads and processes, but communicating the results back needed some thought. POE would be a good candidate, but I do not have practical experience with it (I have only seen a few talks at various conferences). In the end I settled on Gearman. This has the additional charm of being able to spread parallel processing across different hosts.
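
To illustrate why passing results back takes some thought with plain processes, here is a minimal sketch using fork and one pipe per child. The names expensive and run_in_children are hypothetical, and squaring a number merely stands in for a real calculation; this is an illustration under those assumptions, not the code written for the client.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for an expensive calculation.
sub expensive { my $n = shift; return $n * $n }

# Run one child process per input and read each result back through a pipe.
sub run_in_children {
    my @inputs = @_;
    my @children;
    for my $input (@inputs) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: compute, write the result to the pipe, then leave
            # without running the parent's END blocks.
            close $reader;
            print {$writer} expensive($input), "\n";
            close $writer;
            POSIX::_exit(0);
        }
        # Parent: keep only the read end of the pipe.
        close $writer;
        push @children, { pid => $pid, reader => $reader };
    }

    my @results;
    for my $child (@children) {
        my $fh   = $child->{reader};
        my $line = <$fh>;
        chomp $line;
        push @results, $line;
        close $fh;
        waitpid($child->{pid}, 0);   # reap the child
    }
    return @results;
}

my @results = run_in_children(2, 3, 4);
print "Results: @results\n";
```

Even in this small form the parent has to manage file handles, serialization and child reaping itself, which is the bookkeeping a job server takes over.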

If you do not know Gearman, have a look at its Wikipedia page (which is quite brief).

Gearman was originally written in Perl, but was later rewritten in C. CPAN modules for both exist. My examples are for the Perl version. The Gearman::Client distribution contains the client and worker code; Gearman::Server contains the gearmand server script.

Without further ado, here comes the worker script (this contains our "real work" to be executed):
#!/usr/bin/perl
# filename: worker.pl
use strict;
use warnings;
use Gearman::Worker;

# Connect to the job server and announce the "sleep" function.
my $worker = Gearman::Worker->new(job_servers => ['127.0.0.1']);
$worker->register_function(sleep => \&_sleep);

# Process jobs forever.
while (1) {
    $worker->work;
}

sub _sleep {
    my $job     = shift;
    my $seconds = $job->arg;
    print "sleeping\n";
    sleep($seconds);
    return $seconds;
}
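
The handler only relies on the job object's arg method, so its logic can be checked without a running gearmand. The following is a minimal sketch under that assumption: StubJob is a hypothetical stand-in class written for this illustration and is not part of the Gearman distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical stub: mimics only the arg method used by the handler.
    package StubJob;
    sub new { my ($class, $arg) = @_; return bless { arg => $arg }, $class }
    sub arg { return $_[0]{arg} }
}

# Same handler as in worker.pl, minus the print.
sub _sleep {
    my $job     = shift;
    my $seconds = $job->arg;
    sleep($seconds);
    return $seconds;
}

# Call the handler directly, as the worker would for a job with argument 1.
my $result = _sleep(StubJob->new(1));
print "handler returned $result\n";
```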

And now our client, where we will do three things in parallel:
#!/usr/bin/perl
# filename: client.pl
use strict;
use warnings;
use Gearman::Client;

my $client  = Gearman::Client->new(job_servers => ['127.0.0.1']);
my $waited  = 0;
my $taskset = $client->new_task_set;
for (2..4) {
    $taskset->add_task(sleep => $_, {
        on_complete => sub {
            # $ret is a reference to the value returned by the worker
            my $ret = shift;
            $waited += $$ret;
            print "Done.\n";
        },
    });
}
$taskset->wait;
print "Waited $waited seconds.\n";

To test this, start gearmand and at least three workers:
gearmand &
perl worker.pl &
perl worker.pl &
perl worker.pl &
time perl client.pl

It gives the following output:
time perl client.pl
sleeping
sleeping
sleeping
Done.
Done.
Done.
Waited 9 seconds.

real 0m4.062s
user 0m0.030s
sys 0m0.030s

The three "sleeping" lines appear simultaneously. The jobs run for 2, 3 and 4 seconds respectively. The total run time of the script is just a little over 4 seconds (instead of 9 if we were doing the jobs serially).
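
The arithmetic behind these numbers can be spelled out in a few lines. This sketch assumes there are at least as many idle workers as jobs, so that the slowest job determines the parallel run time; the measured 4.062 seconds additionally include some overhead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max);

my @durations = (2, 3, 4);          # seconds slept by each job

my $serial   = sum(@durations);     # jobs run one after another
my $parallel = max(@durations);     # jobs run at once: the slowest one decides

printf "serial: %ds, parallel: %ds, saved: %ds\n",
    $serial, $parallel, $serial - $parallel;
```

With fewer than three workers, some jobs would have to queue and the run time would fall somewhere between the two figures.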

Two of my teammates have written modules for Gearman: Dennis Schoen has written Gearman::XS, which uses the Gearman C library. Johannes Plunien has written Gearman::Driver, which is very useful if you have a lot of worker processes.

At work we use all of them in production (and the C Gearman server).

1 comment:

  1. I guess POE won't help with expensive calculations. POE is event driven; its benefit is that it does not block while waiting for something (the network, most of the time).
