From: Peter Valdemar Mørch on 26 Jun 2010 15:25

Commenting on Ben's post out of order:

> > $pl->async {
> >     bla_bla_bla();
> > }
> >
> > This syntax could easily co-exist with the $pl->foreach and $pl->while
> > syntax.
>
> Not like that it can't, since methods don't have prototypes.
....
> If you want a method call it would have to look like
>
>     $pl->async(sub { ... });

Yes, you're right, of course.

> > I'm worried, though, that people will forget to call $pl->joinAll()!
>
> Stick it in DESTROY.

I don't see how that would help. I'm thinking of a user writing
something like:

    $pl->share(\%results);
    foreach (0..4) {
        $pl->async(sub { $results{$_} = foobar($_) });
    }
    $pl->joinAll();
    useResults(\%results);

In this case, at the time of the call to useResults(), %results will
contain the finished results from all forked processes, because
$pl->joinAll() waits for them all to finish. If $pl->joinAll() doesn't
get called, the user will most likely see an empty %results. I don't
see how DESTROY comes into play here or could help.

> They're not global. %output can be scoped as tightly as you like around
> the async call: async takes a closure, so it will make available (either
> shared or as copies) any lexicals in scope at the time. (This is why $_
> won't work: it isn't a lexical.)

I think I haven't made my concern clear. Is it possible to do:

    my %resultsForCalc1 : Shared($pl1);

and have the sharing associated with a particular Parallel::Loops
instance (so my attribute handler gets a reference to $pl1, not the
string '$pl1')? If so, cool. Don't read any further; I'm satisfied
(by the way, how?).

If not, let's say one does this:

    my %resultsForCalc1 : Shared;
    my $pl1 = Parallel::Loops->new(4);
    $pl1->foreach([0..9], sub {
        $resultsForCalc1{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc1);

    # Block above duplicated, just s/1/2/g
    my %resultsForCalc2 : Shared;
    my $pl2 = Parallel::Loops->new(4);
    $pl2->foreach([0..9], sub {
        $resultsForCalc2{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc2);

Wouldn't the list ( \%resultsForCalc1, \%resultsForCalc2 ) have to be
global? How would I/perl keep track of the fact that the user only
wants to share %resultsForCalc1 in the first calculation and only
%resultsForCalc2 in the second?

By the way, how would one avoid %foo being treated as shared in the
following case, since it has gone out of scope?

    {
        my %foo : Shared;
    }
    my %resultsForCalc1 : Shared;
    my $pl1 = Parallel::Loops->new(4);
    $pl1->foreach([0..9], sub {
        $resultsForCalc1{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc1);

I don't (yet?) see how I can detect which of the hashes with the
"Shared" attribute are in scope at the time of the $pl1->foreach()
call.

But even if I could detect which of all the shared hashes were in
scope "now", that may not be what the user wants. There could be other
reasons that the user wants %resultsForCalc1 (from way above) in an
outer scope and not have it shared in some of the calculations where
it happens to be in scope.

Perhaps we're getting a little off-topic here, but now I'm curious
about the attributes business! ;-)

Peter
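[Editor's note: the mechanism being discussed — each child writes results into a hash, and the finished keys are shipped back to the parent when the child exits — can be sketched with one pipe per child. Everything below (run_forked, foobar, the tab-separated wire format) is illustrative, not Parallel::Loops' actual implementation.]

```perl
use strict;
use warnings;

sub foobar { my ($n) = @_; return $n * $n }

sub run_forked {
    my (@inputs) = @_;
    my (%results, @readers);
    for my $input (@inputs) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                        # child process
            close $reader;
            my $value = foobar($input);
            print {$writer} "$input\t$value\n"; # send result to parent
            close $writer;
            exit 0;
        }
        close $writer;                          # parent keeps the read end
        push @readers, $reader;
    }
    for my $reader (@readers) {                 # the implicit "joinAll":
        while (my $line = <$reader>) {          # blocks until each child exits
            chomp $line;
            my ($k, $v) = split /\t/, $line;
            $results{$k} = $v;
        }
        close $reader;
    }
    wait() for @inputs;                         # reap the children
    return %results;
}

my %results = run_forked(0 .. 4);
print "$_ => $results{$_}\n" for sort keys %results;
```

The point of the sketch is exactly Peter's worry: the result-gathering loop is what joinAll() does, and if the user never reaches it, %results stays empty.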
From: Ben Morrow on 26 Jun 2010 16:52

Quoth Peter Valdemar Mørch <4ux6as402(a)sneakemail.com>:
> Commenting on Ben's post out of order:
>
> > > I'm worried, though, that people will forget to call $pl->joinAll()!
> >
> > Stick it in DESTROY.
>
> I don't see how that would help. I'm thinking of a user writing
> something like:
>
>     $pl->share(\%results);
>     foreach (0..4) {
>         $pl->async(sub { $results{$_} = foobar($_) });
>     }
>     $pl->joinAll();
>     useResults(\%results);
>
> In this case, at the time of the call to useResults(), %results will
> contain the finished results from all forked processes, because
> $pl->joinAll() waits for them all to finish. If $pl->joinAll() doesn't
> get called, the user will most likely see an empty %results. I don't
> see how DESTROY comes into play here or could help.

Well, if the user wrote

    my %results;
    {
        my $pl = Parallel::Loops->new;
        $pl->share(\%results);
        $pl->async(sub { $results{$_} = foobar($_) })
            for 0..4;
    }
    useResults \%results;

then a call to ->joinAll in DESTROY would ensure it was called. Since
variables (particularly those containing potentially-expensive objects,
like $pl) should be minimally scoped, this would be the correct way to
write that code.

> > They're not global. %output can be scoped as tightly as you like around
> > the async call: async takes a closure, so it will make available (either
> > shared or as copies) any lexicals in scope at the time. (This is why $_
> > won't work: it isn't a lexical.)
>
> I think I haven't made my concern clear. Is it possible to do:
>
>     my %resultsForCalc1 : Shared($pl1);
>
> and have the sharing associated with a particular Parallel::Loops
> instance (so my attribute handler gets a reference to $pl1, not the
> string '$pl1')?

Not easily. Apart from anything else, attribute declarations are
processed at compile time, before your objects have been constructed.

I was still looking at the question 'why aren't you simply using
forks?'. forks handles all this for you.

> If so, cool. Don't read any further; I'm satisfied (by the way, how?).
> If not, let's say one does this:
>
>     my %resultsForCalc1 : Shared;
>     my $pl1 = Parallel::Loops->new(4);
>     $pl1->foreach([0..9], sub {
>         $resultsForCalc1{$_} = doSomething($_);
>     });
>     useResults(\%resultsForCalc1);
>
>     # Block above duplicated, just s/1/2/g
>     my %resultsForCalc2 : Shared;
>     my $pl2 = Parallel::Loops->new(4);
>     $pl2->foreach([0..9], sub {
>         $resultsForCalc2{$_} = doSomething($_);
>     });
>     useResults(\%resultsForCalc2);
>
> Wouldn't the list ( \%resultsForCalc1, \%resultsForCalc2 ) have to be
> global?

When you say 'global' you mean 'shared in all P::L instances', right?
Is this a problem? Since (presumably) you would be tying the variable in
the attr handler, just make sure DESTROY and UNTIE for the tied object
take it off the current list. That way, when the shared variable goes
out of scope it will no longer be considered a candidate for sharing.

(You don't even need to do that if you just weaken the refs in your
master list. Perl will replace any that go out of scope with undef.)

I don't know how P::L deals with copying the results back. Presumably
you have no idea whether a variable has been modified in the sub-process
or not? What do you do if two sub-processes change the same shared var
in different ways?

> How would I/perl keep track of the fact that the user only wants to
> share %resultsForCalc1 in the first calculation and only
> %resultsForCalc2 in the second?
>
> By the way, how would one avoid %foo being treated as shared in the
> following case, since it has gone out of scope?
>
>     {
>         my %foo : Shared;
>     }
>     my %resultsForCalc1 : Shared;
>     my $pl1 = Parallel::Loops->new(4);
>     $pl1->foreach([0..9], sub {
>         $resultsForCalc1{$_} = doSomething($_);
>     });
>     useResults(\%resultsForCalc1);
>
> I don't (yet?) see how I can detect which of the hashes with the
> "Shared" attribute are in scope at the time of the $pl1->foreach()
> call.
>
> But even if I could detect which of all the shared hashes were in
> scope "now", that may not be what the user wants. There could be other
> reasons that the user wants %resultsForCalc1 (from way above) in an
> outer scope and not have it shared in some of the calculations where
> it happens to be in scope.
>
> Perhaps we're getting a little off-topic here, but now I'm curious
> about the attributes business! ;-)

Not OT at all. FWIW, I would cast this API rather differently. You don't
seem to be trying to emulate the forks API of 'you can do anything you
like', but instead restricting yourself to iterating over a list. In
that case, why not have the API like

    my $PL = Parallel::Loops->new(sub { dosomething($_) });
    my %results = $PL->foreach(0..9);

No need for any tying, and there's no chance of forgetting the
'->joinAll' since you don't get the results until it's been done. (The
subproc that runs the closure will, of course, get a COW copy of
anything currently in scope, so there's no need to worry about sharing
'read-only' data.)

Ben
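[Editor's note: the two ideas Ben raises — a joinAll in DESTROY, and a master list of weakened references that Perl clears to undef automatically — can be sketched together. The class name and structure below are illustrative, not Parallel::Loops' real code; the actual forking is elided.]

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

{
    package Parallel::Loops::Sketch;

    sub new { my ($class) = @_; bless { pids => [], shared => [] }, $class }

    # Remember a shared variable, but hold it weakly: the slot turns to
    # undef by itself when the caller's variable goes out of scope.
    sub share {
        my ($self, $ref) = @_;
        push @{ $self->{shared} }, $ref;
        Scalar::Util::weaken($self->{shared}[-1]);
    }

    # Wait for any outstanding children (none are actually forked here).
    sub joinAll {
        my ($self) = @_;
        waitpid $_, 0 for @{ $self->{pids} };
        @{ $self->{pids} } = ();
    }

    # Ben's suggestion: joining in DESTROY means a minimally-scoped $pl
    # joins automatically at the closing brace of its scope.
    sub DESTROY { $_[0]->joinAll }
}

my $pl = Parallel::Loops::Sketch->new;
my @live;
{
    my %results;
    $pl->share(\%results);
    push @live, defined $pl->{shared}[0] ? 1 : 0;   # still alive in scope
}
push @live, defined $pl->{shared}[0] ? 1 : 0;       # weakened entry now undef
print "@live\n";
```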
From: Peter Valdemar Mørch on 28 Jun 2010 04:05

On Jun 26, 10:52 pm, Ben Morrow <b...(a)morrow.me.uk> wrote:
> I was still looking at the question 'why aren't you simply using
> forks?'. forks handles all this for you.

Well, because I don't want the forks API. I want the foreach
syntax. :-) The main reason is that it is so much easier to write and
read later on. I could've implemented it using forks, but I didn't.
forks _is_ mentioned in the "SEE ALSO" section, so users have a chance
to explore alternatives.

> When you say 'global' you mean 'shared in all P::L instances', right?

Yes.

> Is this a problem?

A little bit. To me, that speaks in favor of

    my %output;
    $pl->share(\%output);

over

    my %output : Shared;

(apart from the fact that $pl->share() seems much simpler to understand
and implement).

> (You don't even need to do that if you just weaken the refs in your
> master list. Perl will replace any that go out of scope with undef.)

Ah, good point.

> I don't know how P::L deals with copying the results back. Presumably
> you have no idea whether a variable has been modified in the sub-process
> or not? What do you do if two sub-processes change the same shared var
> in different ways?

I've mentioned in the pod that only setting of hash keys and pushing to
arrays is supported in the child. I'll append to that that setting the
same key from different iterations preserves a random one of them.

> FWIW, I would cast this API rather differently.

Yeah, I'm beginning to gather that! :-) Fine, you won't be one of
P::L's users, I take it...

> You don't seem to be
> trying to emulate the forks API of 'you can do anything you like', but
> instead restricting yourself to iterating over a list.

Exactly.

> In that case, why not have the API like
>
>     my $PL = Parallel::Loops->new(sub { dosomething($_) });
>     my %results = $PL->foreach(0..9);

I guess if I change that to:

    my $PL = Parallel::Loops->new( 4 );
    my %results = $PL->foreach( [0..9], sub {
        ( $_ => dosomething($_) )
    });

we could be in business. I'm presuming I can use wantarray() in the
foreach method to test whether the caller is going to use the return
value, and only transfer the return value from the child if it is
going to be used. It kind of breaks the analogy with foreach, but
doesn't hurt otherwise, so why not.

> Well, if the user wrote
>
>     my %results;
>     {
>         my $pl = Parallel::Loops->new;
>         $pl->share(\%results);
>         $pl->async(sub { $results{$_} = foobar($_) })
>             for 0..4;
>     }
>     useResults \%results;
>
> then a call to ->joinAll in DESTROY would ensure it was called. Since
> variables (particularly those containing potentially-expensive objects,
> like $pl) should be minimally scoped, this would be the correct way to
> write that code.

I don't understand how that can be guaranteed. perldoc perltoot says:

> Perl's notion of the right time to call a destructor is not well-defined
> currently, which is why your destructors should not rely on when they
> are called.

Given that, how can I be sure that DESTROY has been called at the time
of the useResults() call?

Peter
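[Editor's note: the wantarray() idea above is easy to sketch. The hypothetical parallel_foreach below runs serially — in the real module the payoff would be skipping the child-to-parent result transfer in void context — but the calling-context logic is the part in question.]

```perl
use strict;
use warnings;

sub parallel_foreach {
    my ($inputs, $body) = @_;
    if (wantarray) {                 # list context: caller wants the results
        my @out;
        for my $item (@$inputs) {
            local $_ = $item;        # emulate foreach's aliasing of $_
            push @out, $body->();
        }
        return @out;
    }
    for my $item (@$inputs) {        # void/scalar context: discard results
        local $_ = $item;
        $body->();
    }
    return;
}

# List context: results are gathered and returned as key/value pairs.
my %results = parallel_foreach([0 .. 3], sub { ($_ => $_ * 2) });
print "$_ => $results{$_}\n" for sort keys %results;

# Void context: wantarray() is false, so nothing would be transferred.
parallel_foreach([0 .. 3], sub { ($_ => $_ * 2) });
```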
From: Ben Morrow on 28 Jun 2010 09:29

Quoth Peter Valdemar Mørch <4ux6as402(a)sneakemail.com>:
> On Jun 26, 10:52 pm, Ben Morrow <b...(a)morrow.me.uk> wrote:
> > I was still looking at the question 'why aren't you simply using
> > forks?'. forks handles all this for you.
>
> Well, because I don't want the forks API. I want the foreach
> syntax. :-) The main reason is that it is so much easier to write and
> read later on.

OK.

> > You don't seem to be
> > trying to emulate the forks API of 'you can do anything you like', but
> > instead restricting yourself to iterating over a list.
>
> Exactly.
>
> > In that case, why not have the API like
> >
> >     my $PL = Parallel::Loops->new(sub { dosomething($_) });
> >     my %results = $PL->foreach(0..9);
>
> I guess if I change that to:
>
>     my $PL = Parallel::Loops->new( 4 );
>     my %results = $PL->foreach( [0..9], sub {
>         ( $_ => dosomething($_) )
>     });
>
> we could be in business. I'm presuming I can use wantarray() in the
> foreach method to test whether the caller is going to use the return
> value, and only transfer the return value from the child if it is
> going to be used. It kind of breaks the analogy with foreach, but
> doesn't hurt otherwise, so why not.

It's now more analogous to map than foreach, but I don't see that as a
problem.

> > Well, if the user wrote
> >
> >     my %results;
> >     {
> >         my $pl = Parallel::Loops->new;
> >         $pl->share(\%results);
> >         $pl->async(sub { $results{$_} = foobar($_) })
> >             for 0..4;
> >     }
> >     useResults \%results;
> >
> > then a call to ->joinAll in DESTROY would ensure it was called. Since
> > variables (particularly those containing potentially-expensive objects,
> > like $pl) should be minimally scoped, this would be the correct way to
> > write that code.
>
> I don't understand how that can be guaranteed. perldoc perltoot says:
>
> > Perl's notion of the right time to call a destructor is not well-defined
> > currently, which is why your destructors should not rely on when they
> > are called.
>
> Given that, how can I be sure that DESTROY has been called at the time
> of the useResults() call?

Hmm, I'd forgotten that was there. It's complete nonsense: in Perl 5,
destructors are always called promptly, and there are *lots* of modules
relying on that fact so it isn't going to go away. (Perl 6 is a
different matter, of course.)

Ben
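[Editor's note: Ben's claim is simple to demonstrate. Perl 5 objects are reference-counted, so DESTROY fires deterministically the moment the last reference disappears — here, at the closing brace of the inner block, before the code after the block runs.]

```perl
use strict;
use warnings;

my @events;

{
    package Tracer;
    sub new     { my ($class) = @_; bless {}, $class }
    sub DESTROY { push @events, 'destroyed' }
}

{
    my $obj = Tracer->new;
    push @events, 'in scope';
}                                    # $obj's refcount hits 0: DESTROY runs here
push @events, 'after scope';

print join(', ', @events), "\n";     # in scope, destroyed, after scope
```

This is why a joinAll-in-DESTROY is reliable in the minimally-scoped example above: by the time useResults() runs, the block that owned $pl has already closed, and DESTROY has already joined the children.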
From: Willem on 28 Jun 2010 11:07

Peter Valdemar Mørch wrote:
)> > my %output;
)> > $pl->tieOutput( \%output );
)>
)> Why are you using tie here?
)
) Hmm... I thought the idea would be more obvious than it apparently
) is...
)
) Outside the $pl->foreach() loop, we're running in the parent process.
) Inside the $pl->foreach() loop, we're running in a child process.
) $pl->tieOutput is actually the raison d'etre of Parallel::Loops. When
) the child process has a result, it stores it in %output (which is tied
) with Tie::Hash behind the scenes in the child process).
)
) Behind the scenes, when the child process exits, it sends the results
) (the keys written to %output) back to the parent process's version/
) copy of %output, so that the user of Parallel::Loops doesn't have to
) do any inter-process communication.

Isn't there some easier method, where you don't have to screw around
with output maps at all? If the following API would work, that would
be the easiest, IMO:

    my @result = async_map { do_something($_) } @array;

where async_map takes care of all the details of creating the threads,
gathering all the output, et cetera. Or does that already exist?

(The simple implementation is only a few lines of code, but it could
then be easily extended to use a limited number of threads, or keep a
thread pool handy, or something like that.)

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged or
something.. No I'm not paranoid. You all think I'm paranoid, don't
you! #EOT
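[Editor's note: a minimal fork-based async_map along the lines Willem suggests really is only a few lines. This sketch forks one child per element and collects each child's output over a pipe; the name async_map and the one-process-per-element policy are assumptions — a real version would cap the number of concurrent children, as Willem notes.]

```perl
use strict;
use warnings;

# The (&@) prototype lets callers use the block syntax: async_map { ... } @list
sub async_map (&@) {
    my ($body, @items) = @_;
    my @handles;
    for my $item (@items) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                       # child: run the body for one item
            close $reader;
            local $_ = $item;
            print {$writer} $body->();
            close $writer;
            exit 0;
        }
        close $writer;                         # parent keeps the read end
        push @handles, [$pid, $reader];
    }
    my @results;
    for my $h (@handles) {                     # gather in input order
        my ($pid, $reader) = @$h;
        my $out = do { local $/; <$reader> };  # slurp the child's output
        close $reader;
        waitpid $pid, 0;
        push @results, $out;
    }
    return @results;
}

my @result = async_map { $_ * 10 } 1 .. 4;
print "@result\n";    # 10 20 30 40
```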