Proposing a new module: Parallel::Loops [Perl]


From: Ted Zlatanov on 25 Jun 2010 11:33

On Fri, 25 Jun 2010 01:14:11 -0700 (PDT) Peter Valdemar Mørch <4ux6as402(a)sneakemail.com> wrote:

PVM> On Jun 25, 3:16 am, Ben Morrow <b...(a)morrow.me.uk> wrote:
>> OK; how is this different from forks and forks::shared?

PVM> It is _much_ more similar to forks and forks::shared than to Coro.

PVM> While the forks and forks::shared API emulate the API of threads and
PVM> threads::shared (perfectly?), Parallel::Loops tries to emulate the
PVM> standard foreach and while loops as closely as possible, as in:

PVM> $pl->foreach(\@input, sub {
PVM> $output{$_} = do_some_hefty_calculation($_);
PVM> });

I like that syntax better personally than join() and detach().

PVM> I guess Parallel::Loops could have been written with forks and
PVM> forks::shared, and only provided syntactic sugar. (In fact it uses
PVM> Parallel::ForkManager and Tie::Hash/Tie::Array instead.)

`forks' brings in socket IPC which can be an issue. Your approach seems
a little cleaner IIUC.

Ted

From: Ben Morrow on 25 Jun 2010 18:34

Quoth Ted Zlatanov <tzz(a)lifelogs.com>:
> On Fri, 25 Jun 2010 01:14:11 -0700 (PDT) Peter Valdemar Mørch
> <4ux6as402(a)sneakemail.com> wrote:
>
> PVM> On Jun 25, 3:16 am, Ben Morrow <b...(a)morrow.me.uk> wrote:
> >> OK; how is this different from forks and forks::shared?
>
> PVM> It is _much_ more similar to forks and forks::shared than to Coro.
>
> PVM> While the forks and forks::shared API emulate the API of threads and
> PVM> threads::shared (perfectly?), Parallel::Loops tries to emulate the
> PVM> standard foreach and while loops as closely as possible, as in:
>
> PVM> $pl->foreach(\@input, sub {
> PVM> $output{$_} = do_some_hefty_calculation($_);
> PVM> });
>
> I like that syntax better personally than join() and detach().

Personally I find

my %output :shared;

for my $i (@input) {
async {
$output{$i} = do_some_hefty_calculation($i);
}
}

somewhat clearer, but that's just a matter of taste. (With 5.10
presumably a 'my $_' would make $_ work too.)

> PVM> I guess Parallel::Loops could have been written with forks and
> PVM> forks::shared, and only provided syntactic sugar. (In fact it uses
> PVM> Parallel::ForkManager and Tie::Hash/Tie::Array instead.)
>
> `forks' brings in socket IPC which can be an issue. Your approach seems
> a little cleaner IIUC.

The IPC has to be done *somehow*. Sockets are probably as reliable as
any other mechanism.
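To make the socket point concrete, here is a minimal sketch (not how forks itself is implemented) of what a Unix-domain socketpair gives you over a plain pipe: one channel that both processes can read and write. It uses only core Perl (Socket, IO::Handle); the squaring "work" done in the child is a made-up placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# One socketpair gives a bidirectional channel between parent and child.
socketpair(my $to_child, my $to_parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$to_child->autoflush(1);
$to_parent->autoflush(1);

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: answer each request line on the same socket it arrived on.
    close $to_child;
    while (my $n = <$to_parent>) {
        chomp $n;
        print {$to_parent} $n * $n, "\n";
    }
    close $to_parent;
    exit 0;
}

# Parent: send a request, read the reply, repeat.
close $to_parent;
my @replies;
for my $n (2, 3, 4) {
    print {$to_child} "$n\n";
    chomp(my $reply = <$to_child>);
    push @replies, $reply;
}
close $to_child;    # the child now sees EOF and exits
waitpid($pid, 0);

print "@replies\n";    # 4 9 16
```

A pipe only flows one way, so a request/reply protocol like this would need two of them; that is one plausible reason a threads emulation reaches for sockets.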

Ben

From: Peter Valdemar Mørch on 26 Jun 2010 03:15

On Jun 26, 12:34 am, Ben Morrow <b...(a)morrow.me.uk> wrote:
> Personally I find
>
> my %output :shared;
>
> for my $i (@input) {
> async {
> $output{$i} = do_some_hefty_calculation($i);
> }
> }
>
> somewhat clearer, but that's just a matter of taste. (With 5.10
> presumably a 'my $_' would make $_ work too.)

In fact, I think that looks better too. I do have a few concerns:

* Having "my %output : shared" and just async without a
Parallel::Loops reference parameter inevitably leads to global
variables. I don't like them. One could have two different
calculations in different sections of the code, that don't need the
same variables "shared", so I'd prefer to have info about shared
variables associated with a specific Parallel::Loops instance. What do
you think?

* About the async {} instead of $pl->foreach: The implementation needs
to wait for the last iteration to finish, and only continue past the
closing '}' once all the processes have finished. I don't know how to
do that without something like:

for my $i (@input) {
# This fires up the parallel processes
$pl->async {
$output{$i} = do_some_hefty_calculation($i);
}
}
# This waits for them all to finish before continuing.
$pl->joinAll();

This syntax could easily co-exist with the $pl->foreach and $pl->while
syntax. I'm worried, though, that people will forget to call
$pl->joinAll()! I guess one could also have async return some reference to
the actual forked process (pid comes to mind) and then $pl->join($pid)
to wait for it to finish.

Regardless, I now think $pl->share(\%output) is a better name than
$pl->tieOutput(\%output).
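Since Parallel::Loops is described as using Tie::Hash, a share()/tieOutput() call presumably ties the caller's hash. The sketch below is an assumption about how such a tie could work, not Peter's code: a hypothetical Recording::Hash class (built on core Tie::ExtraHash) behaves like a normal hash but logs every STORE, so a child could send just its writes back to the parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical tie class: an ordinary hash that also logs every
    # STORE, so writes made in a child could be replayed in the parent.
    package Recording::Hash;
    use Tie::Hash ();
    our @ISA = ('Tie::ExtraHash');

    # Tie::ExtraHash convention: element [0] is the real storage.
    # Element [1] is our log of [key, value] pairs.
    sub TIEHASH { my ($class) = @_; return bless [ {}, [] ], $class }

    sub STORE {
        my ($self, $key, $value) = @_;
        push @{ $self->[1] }, [ $key, $value ];
        $self->[0]{$key} = $value;
    }

    sub changes { my ($self) = @_; return @{ $self->[1] } }
}

tie my %output, 'Recording::Hash';
$output{apple}  = 3;
$output{banana} = 5;

# Reads go through the inherited FETCH, so the hash still behaves normally.
my $apple = $output{apple};

# The log is what a forked child would serialize and send to its parent.
my @changes = tied(%output)->changes;
print join(', ', map { "$_->[0]=$_->[1]" } @changes), "\n";
```

Logging writes rather than shipping the whole hash keeps the child-to-parent traffic proportional to what the loop body actually changed.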

The rest of this post is about "my %difficulties : with shared;" :-) -
this syntax is how (threads|forks)::shared does it too. I like it, but
don't yet understand how to implement it. I have looked at "perldoc
attributes" and experimented a little. In fact I could get attributes
like "Shared" (==ucfirst("shared")) to work. "xshared" works but
issues a warning, while "shared" simply doesn't work (perl 5.10). Also,
I guess it isn't possible for several packages to be "listening" for
attributes at the same time, as they'd step on each other's exports of
e.g. sub MODIFY_SCALAR_ATTRIBUTES, wouldn't they?

Here is a little snippet I wrote to experiment:

me(a)it:~> cat attributes.pl
#!perl -w
use strict;
use attributes;
use Data::Dumper;

sub MODIFY_SCALAR_ATTRIBUTES {
my ($pkg, $ref, $attributes) = @_;
print Dumper(\@_, attributes::get($ref));
return ();
}

my $shared : shared;
my $xshared : xshared;
my $Shared : Shared;

me(a)it:~> perl attributes.pl
$VAR1 = [
'main',
\undef
];
$VAR1 = [
'main',
\undef,
'xshared'
];
SCALAR package attribute may clash with future reserved word: xshared
at attributes.pl line 13
$VAR1 = [
'main',
\undef,
'Shared'
];

From: Peter Valdemar Mørch on 26 Jun 2010 03:16

On Jun 25, 5:33 pm, Ted Zlatanov <t...(a)lifelogs.com> wrote:
> I like that syntax better personally than join() and detach().

Thanks for the support! :-)

> `forks' brings in socket IPC which can be an issue. Your approach seems
> a little cleaner IIUC.

As Ben says, it has to be done somehow. I use a pipe behind the
scenes.
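A minimal sketch of the pipe approach, assuming one pipe per forked child and a simple "key<TAB>value" line protocol. It is not the Parallel::Loops implementation: do_some_hefty_calculation is a hypothetical stand-in, only plain scalar values survive this text format (a real module would need a serializer such as Storable), and unlike Parallel::ForkManager it forks one process per item with no cap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the expensive per-item work.
sub do_some_hefty_calculation { my ($n) = @_; return $n * $n }

my @input = (1 .. 5);
my %output;
my @children;

for my $item (@input) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute one result and write "key<TAB>value" to the pipe.
        close $reader;
        print {$writer} "$item\t", do_some_hefty_calculation($item), "\n";
        close $writer;
        exit 0;
    }

    # Parent: keep only the read end, so reads see EOF once the child
    # closes its end.
    close $writer;
    push @children, [ $pid, $reader ];
}

# Merge each child's output into the parent's %output, then reap it.
for my $child (@children) {
    my ($pid, $reader) = @$child;
    while (my $line = <$reader>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $output{$key} = $value;
    }
    close $reader;
    waitpid($pid, 0);
}

print join(', ', map { "$_ => $output{$_}" } sort { $a <=> $b } keys %output), "\n";
```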

Peter

From: Ben Morrow on 26 Jun 2010 08:33

Quoth Peter Valdemar Mørch <4ux6as402(a)sneakemail.com>:
> On Jun 26, 12:34 am, Ben Morrow <b...(a)morrow.me.uk> wrote:
> > Personally I find
> >
> > my %output :shared;
> >
> > for my $i (@input) {
> > async {
> > $output{$i} = do_some_hefty_calculation($i);
> > }
> > }
> >
> > somewhat clearer, but that's just a matter of taste. (With 5.10
> > presumably a 'my $_' would make $_ work too.)
>
> In fact, I think that looks better too. I do have a few concerns:
>
> * Having "my %output : shared" and just async without a
> Parallel::Loops reference parameter inevitably leads to global
> variables. I don't like them. One could have two different
> calculations in different sections of the code, that don't need the
> same variables "shared", so I'd prefer to have info about shared
> variables associated with a specific Parallel::Loops instance. What do
> you think?

They're not global. %output can be scoped as tightly as you like around
the async call: async takes a closure, so it will make available (either
shared or as copies) any lexicals in scope at the time. (This is why $_
won't work: it isn't a lexical.)
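A small illustration of Ben's point, in plain Perl with no fork involved: a closure captures the lexical $i from each iteration, but $_ is a package variable, so a closure that mentions it captures nothing and just reads whatever $_ holds when the closure finally runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@lexical_subs, @topic_subs);

for my $i (1 .. 3) {
    # Each closure captures its own lexical $i.
    push @lexical_subs, sub { return $i };
}

for (1 .. 3) {
    # $_ is a package variable: nothing is captured, and the sub
    # reads whatever $_ holds at the moment it is called.
    push @topic_subs, sub { return $_ };
}

$_ = 'later';
my (@from_lexical, @from_topic);
for my $code (@lexical_subs) { push @from_lexical, $code->() }
for my $code (@topic_subs)   { push @from_topic,   $code->() }

print "@from_lexical\n";   # 1 2 3
print "@from_topic\n";     # later later later
```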

> * About the async {} instead of $pl->foreach: The implementation needs
> to wait for the last iteration to finish, and only continue past the
> closing '}' once all the processes have finished. I don't know how to
> do that without something like:
>
> for my $i (@input) {
> # This fires up the parallel processes
> $pl->async {
> $output{$i} = do_some_hefty_calculation($i);
> }
> }
> # This waits for them all to finish before continuing.
> $pl->joinAll();

Well, again using forks, you would write

my %output :shared;
my @thr;

for my $i (@input) {
push @thr, async {
$output{$i} = ...;
}
}
$_->join for @thr;

> This syntax could easily co-exist with the $pl->foreach and $pl->while
> syntax.

Not like that it can't, since methods don't have prototypes. If you want
a method call it would have to look like

$pl->async(sub { ... });
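The prototype point in runnable form: a plain function declared with a (&) prototype accepts a bare block, which is how an exported "async { ... }" can look like a built-in, while a method call must pass an explicit code reference. run_block and Runner are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A plain function with a (&) prototype takes a bare block as its
# first argument; it must be declared before it is called.
sub run_block (&) {
    my ($code) = @_;
    return $code->();
}

my $plain = run_block { 6 * 7 };

{
    package Runner;
    sub new { my ($class) = @_; return bless {}, $class }

    # Prototypes are ignored for method calls, so callers must pass
    # an explicit code reference: $obj->run(sub { ... }).
    sub run { my ($self, $code) = @_; return $code->() }
}

my $method = Runner->new->run(sub { 6 * 7 });

print "$plain $method\n";   # 42 42
```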

> I'm worried, though, that people will forget to call
> $pl->joinAll()!

Stick it in DESTROY.
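A sketch of what Peter's join-by-pid idea plus Ben's DESTROY suggestion might look like together. Hypothetical::Pool, async, join_pid and join_all are invented for illustration and are not the Parallel::Loops API; results are not shared here, the sketch only covers the fork/wait lifecycle. Because forked children inherit a copy of the object, DESTROY checks the owning pid so a child never tries to reap its siblings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical class; not the real Parallel::Loops interface.
    package Hypothetical::Pool;

    sub new {
        my ($class) = @_;
        # Remember the creating process: children inherit a copy of
        # this object and must not try to reap their siblings.
        return bless { owner => $$, pids => {} }, $class;
    }

    # Run $code in a child and return its pid; called as
    # $pool->async(sub { ... }).
    sub async {
        my ($self, $code) = @_;
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: exit 0 if the block ran cleanly, 1 if it died.
            my $ok = eval { $code->(); 1 };
            exit($ok ? 0 : 1);
        }
        $self->{pids}{$pid} = 1;
        return $pid;
    }

    # Wait for one child and return its exit code (undef if unknown).
    sub join_pid {
        my ($self, $pid) = @_;
        return unless delete $self->{pids}{$pid};
        waitpid($pid, 0);
        return $? >> 8;
    }

    # Wait for every outstanding child; returns { pid => exit code }.
    sub join_all {
        my ($self) = @_;
        my %status;
        $status{$_} = $self->join_pid($_) for keys %{ $self->{pids} };
        return \%status;
    }

    # Reap any children the caller forgot to join.
    sub DESTROY {
        my ($self) = @_;
        local ($?, $!);    # don't clobber the caller's status variables
        $self->join_all if $$ == $self->{owner};
    }
}

my $pool = Hypothetical::Pool->new;
for my $n (1 .. 4) {
    $pool->async(sub { die "odd\n" if $n % 2 });
}
my $status = $pool->join_all;
my $failed = grep { $_ == 1 } values %$status;
print scalar(keys %$status), " children joined, $failed failed\n";

{
    my $scoped = Hypothetical::Pool->new;
    $scoped->async(sub { 1 });
}   # no explicit join: DESTROY waits for the child here
```

Relying on DESTROY alone makes the wait point implicit (end of scope), so an explicit join_all is still the clearer call where timing matters.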

> I guess one could also have async return some reference to
> the actual forked process (pid comes to mind) and then $pl->join($pid)
> to wait for it to finish.
>
> Regardless, I now think $pl->share(\%output) is a better name than
> $pl->tieOutput(\%output).
>
> The rest of this post is about "my %difficulties : with shared;" :-) -
> this syntax is how (threads|forks)::shared does it too. I like it, but
> don't yet understand how to implement it. I have looked at "perldoc
> attributes" and experimented a little. In fact I could get attributes
> like "Shared" (==ucfirst("shared")) to work. "xshared" works but
> issues a warning, while "shared" simply doesn't work (perl 5.10).

Yup. That's by design. Lowercase attributes are reserved to the core;
:shared, specifically, is handled internally as part of the threads
code, and is never seen by MODIFY_*_ATTRIBUTES. (Obviously it is
possible to hijack it, since forks manages to do so, but it can only be
done globally.)

> Also,
> I guess it isn't possible for several packages to be "listening" for
> attributes at the same time, as they'd step on each other's exports of
> e.g. sub MODIFY_SCALAR_ATTRIBUTES, wouldn't they?

That is certainly a possibility. IIRC Attribute::Handlers handles this
for you, since there's then only one MODIFY_*_ATTR sub to install.
Alternatively, keep a ref to the old sub (if there is one) and call it
if you don't see an attr you recognise.
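A minimal sketch of Ben's second option, assuming two pieces of code both want to handle scalar attributes in package main: the later one keeps a reference to the existing MODIFY_SCALAR_ATTRIBUTES, handles :Shared itself, and passes any attribute it does not recognise to the old handler. The :Other attribute and the @shared_refs/@other_refs arrays are made up for the example. For my-variables the handler runs at run time, when the declaration executes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our (@shared_refs, @other_refs);

# Pre-existing handler, standing in for one another module might
# already have installed in this package. It only knows :Other.
sub MODIFY_SCALAR_ATTRIBUTES {
    my ($pkg, $ref, @attrs) = @_;
    my @unknown;
    for my $attr (@attrs) {
        if ($attr eq 'Other') { push @other_refs, $ref }
        else                  { push @unknown, $attr }
    }
    return @unknown;    # anything returned is reported as invalid
}

BEGIN {
    # Keep a reference to whatever handler is already installed ...
    my $prev = main->can('MODIFY_SCALAR_ATTRIBUTES');
    no warnings 'redefine';
    # ... and install one that handles :Shared and chains the rest.
    *main::MODIFY_SCALAR_ATTRIBUTES = sub {
        my ($pkg, $ref, @attrs) = @_;
        my @rest;
        for my $attr (@attrs) {
            if ($attr eq 'Shared') { push @shared_refs, $ref }
            else                   { push @rest, $attr }
        }
        return @rest unless $prev && @rest;
        return $prev->($pkg, $ref, @rest);
    };
}

my $x : Shared;
my $y : Other;
my $z : Shared Other;

print scalar(@shared_refs), " shared, ", scalar(@other_refs), " other\n";
```

This only works if every handler chains politely; Attribute::Handlers avoids the problem by owning the single MODIFY_*_ATTRIBUTES entry point.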

Ben
