From: Tom Machinski on 31 Dec 2009 19:28

On Thu, Dec 31, 2009 at 12:18 PM, Stephen Hansen <apt.shansen(a)gmail.com> wrote:
> Hmm? Just use a sentinel which /can't/ exist in the list: then it's truly
> safe. If the list can contain all the usual sort of sentinels (False, None,
> 0, -1, whatever), then just make a unique one all your own.
>
>     sentinel = object()
>     if next(g(), sentinel) is sentinel:
>         ...
>
> It's impossible to get a false positive then, as nothing g() can ever produce
> would ever be precisely "sentinel" (which would usually for me be some
> global const if I need to do such things in multiple places).
>
> --S

That's not a bad idea. Another nice feature would be support for callable "default" values; it would make several new things easier, including raising an exception when you really want that (i.e. if not finding a single element is truly exceptional).

-- Tom
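A minimal sketch of the sentinel idea combined with the callable "default" Tom describes. The helper name first_or and the rule "call the default if it is callable" are assumptions made for illustration, not an existing builtin; only the builtin next() with a default argument, as in Stephen's snippet, is relied on.

```python
_MISSING = object()  # unique sentinel: no iterable can yield this exact object


def first_or(iterable, default=_MISSING):
    """Return the first item of `iterable` (hypothetical helper).

    On an empty iterable: raise ValueError if no default was given, call the
    default if it is callable, otherwise return the default unchanged.
    """
    item = next(iter(iterable), _MISSING)
    if item is not _MISSING:
        return item
    if default is _MISSING:
        raise ValueError("no first item: iterable is empty")
    return default() if callable(default) else default


def no_match():
    # A callable default may raise, for when an empty result is truly exceptional.
    raise LookupError("no item matched")


evens = (n for n in [1, 3, 4, 5] if n % 2 == 0)
first_even = first_or(evens)                       # first matching item
fallback = first_or(iter([]), None)                # plain default value
computed = first_or(iter([]), lambda: "computed")  # callable default is called
```

One tradeoff of this rule: a function can no longer be returned as a plain default value, because it would be called instead.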
From: Steven D'Aprano on 31 Dec 2009 20:47

On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:

> On Wed, Dec 30, 2009 at 4:01 PM, Steven D'Aprano
> <steve(a)remove-this-cybersource.com.au> wrote:
>> On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:
>>> Bottom line, I'm going to have to remove this pattern from my code:
>>>
>>> foo = (foo for foo in foos if foo.bar).next()
>>
>> I don't see why. What's wrong with it? Unless you embed it in a call to
>> list, or similar, it will explicitly raise StopIteration as expected.
>
> Exactly; this seems innocuous, but if some caller of this code uses it
> in a list() constructor, a very subtle and dangerous bug is introduced -
> see OP. This is the entire point of this post.

Then don't use it in a list() constructor.

That's a glib answer, of course. A better answer is to point out that the problem is not with the above expression, but with letting StopIteration bubble up as an error exception instead of dealing with it immediately. That's not what it's for, and you can't trust it not to be captured by something. If StopIteration represents an error condition, you need to deal with it immediately and convert it to an exception which isn't likely to disappear.

> In a large, non-trivial application, you simply cannot afford the
> assumption that no caller will ever do that. Even if you have perfect
> memory, some of your other developers or library users may not.

You shouldn't put the responsibility of dealing with the StopIteration on the caller, because StopIteration is a signal, not an error condition, and you can't tell when that signal will disappear. The responsibility lies on the writer of the function containing the line (that is, the Original Poster of this thread).
So you need something like this:

    def my_function():
        try:
            foo = (foo for foo in foos if foo.bar).next()
        except StopIteration:
            handle_empty_foos()
        else:
            handle_at_least_one_foo()

handle_empty_foos may be as simple as raising a new exception and letting that bubble up to whatever layer of the application is expected to deal with it.

> As for what's wrong with the "if not any" solution, Benjamin Kaplan's
> post hits the nail on its head. This is a bioinformatics application, so
> the iterable "foos" tends to be very large, so saving half the runtime
> makes a big difference.

Possibly you haven't seen my reply to Benjamin, so I'll paraphrase: that's incorrect, because any() is lazy and will return as soon as it hits a non-false item. See the docs:

http://docs.python.org/library/functions.html#any

If the foo items are considered true (e.g. non-empty strings), then you can guarantee that any() will return on the very first item.

If the foo items are arbitrary objects which have an equal chance of being considered true or false, then on average it will have to look at half the list, which is O(N) and may be a tad expensive for large N. But how likely is that? One has to be realistic here, and consider the type of data you realistically need to deal with and not pathological cases. There's no limit to the problems you may have with sufficiently pathological data:

    class Evil(object):
        @property
        def bar(self):
            import time
            time.sleep(1e8)
            return True

    foos = [Evil(), "a", "b", "c", "d"]
    foo = (foo for foo in foos if foo.bar).next()

any() is the standard, idiomatic solution for solving this sort of problem. Before rejecting it on the basis of slowness, you need to determine that long runs of false items ahead of the first true item are a realistic scenario, and that calling any() really is a bottleneck. Anything less is premature optimization.

-- Steven
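The try/except/else pattern above, sketched in Python 3 spelling (next(gen) rather than gen.next()). NoMatchingFoo and the SimpleNamespace test objects are made up for illustration; the point is only that StopIteration is converted where it occurs into an exception that no loop construct will silently absorb.

```python
from types import SimpleNamespace


class NoMatchingFoo(LookupError):
    """Hypothetical application error raised instead of leaking StopIteration."""


def first_foo_with_bar(foos):
    matches = (foo for foo in foos if foo.bar)
    try:
        foo = next(matches)
    except StopIteration:
        # Convert the iterator-protocol signal into a real error right here.
        raise NoMatchingFoo("no foo has a true .bar") from None
    else:
        return foo


foos = [SimpleNamespace(name="a", bar=False), SimpleNamespace(name="b", bar=True)]
found = first_foo_with_bar(foos)
```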
From: Wolfram Hinderer on 1 Jan 2010 08:19

On 1 Jan., 02:47, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:
> > On Wed, Dec 30, 2009 at 4:01 PM, Steven D'Aprano
> > <st...(a)remove-this-cybersource.com.au> wrote:
> >> On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:
> >>> Bottom line, I'm going to have to remove this pattern from my code:
> >>>
> >>> foo = (foo for foo in foos if foo.bar).next()
> >>
> >> I don't see why. What's wrong with it? Unless you embed it in a call to
> >> list, or similar, it will explicitly raise StopIteration as expected.
> >
> > Exactly; this seems innocuous, but if some caller of this code uses it
> > in a list() constructor, a very subtle and dangerous bug is introduced -
> > see OP. This is the entire point of this post.
>
> Then don't use it in a list() constructor.
>
> That's a glib answer, of course. A better answer is to point out that the
> problem is not with the above expression, but with letting StopIteration
> bubble up as an error exception instead of dealing with it immediately.
> That's not what it's for, and you can't trust it not to be captured by
> something. If StopIteration represents an error condition, you need to
> deal with it immediately and convert it to an exception which isn't
> likely to disappear.
>
> > In a large, non-trivial application, you simply cannot afford the
> > assumption that no caller will ever do that. Even if you have perfect
> > memory, some of your other developers or library users may not.
>
> You shouldn't put the responsibility of dealing with the StopIteration on
> the caller, because StopIteration is a signal, not an error condition,
> and you can't tell when that signal will disappear. The responsibility
> lies on the writer of the function containing the line (that is, the
> Original Poster of this thread).
>
> So you need something like this:
>
> def my_function():
>     try:
>         foo = (foo for foo in foos if foo.bar).next()
>     except StopIteration:
>         handle_empty_foos()
>     else:
>         handle_at_least_one_foo()
>
> handle_empty_foos may be as simple as raising a new exception and letting
> that bubble up to whatever layer of the application is expected to deal
> with it.
>
> > As for what's wrong with the "if not any" solution, Benjamin Kaplan's
> > post hits the nail on its head. This is a bioinformatics application, so
> > the iterable "foos" tends to be very large, so saving half the runtime
> > makes a big difference.
>
> Possibly you haven't seen my reply to Benjamin, so I'll paraphrase:
> that's incorrect, because any() is lazy and will return as soon as it
> hits a non-false item.

Tom's point is that

    if not any(foo for foo in foos if foo.bar):
        foo = (foo for foo in foos if foo.bar).next()

iterates twice over (the same first few elements of) foos, which should take about twice as long as iterating once. The laziness of "any" does not seem to matter here.

Of course, you're right that the iteration might or might not be the bottleneck. On the other hand, foos might not even be reiterable.

> If the foo items are arbitrary objects which have an equal chance of
> being considered true or false, then on average it will have to look at
> half the list,

By which definition of chance? :-)

Wolfram
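The double iteration can be checked by counting how many items each variant pulls from foos. This is only a sketch: CountingList is a hypothetical helper written for the purpose, plain integers stand in for objects with a .bar attribute, and the check is written as "if any(...)" so that the second pass runs only when a match exists.

```python
class CountingList:
    """Hypothetical re-iterable wrapper that counts the items it hands out."""

    def __init__(self, items):
        self.items = list(items)
        self.visited = 0

    def __iter__(self):
        for item in self.items:
            self.visited += 1
            yield item


foos = CountingList([0, 0, 0, 7, 0, 0])

# Two passes: any() stops at the first true item, then next() walks the
# same leading items a second time.
found_twice = None
if any(foo for foo in foos if foo):
    found_twice = next(foo for foo in foos if foo)
two_pass_visits = foos.visited

# One pass: next() with a unique sentinel default does the test and the fetch.
foos.visited = 0
_MISSING = object()
found_once = next((foo for foo in foos if foo), _MISSING)
one_pass_visits = foos.visited
```

Both variants stop at the first true item; the two-pass form simply visits the leading items twice.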
From: Steven D'Aprano on 1 Jan 2010 09:42

On Fri, 01 Jan 2010 05:19:02 -0800, Wolfram Hinderer wrote:

> On 1 Jan., 02:47, Steven D'Aprano <st...(a)REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:
[...]
>> > As for what's wrong with the "if not any" solution, Benjamin Kaplan's
>> > post hits the nail on its head. This is a bioinformatics application,
>> > so the iterable "foos" tends to be very large, so saving half the
>> > runtime makes a big difference.
>>
>> Possibly you haven't seen my reply to Benjamin, so I'll paraphrase:
>> that's incorrect, because any() is lazy and will return as soon as it
>> hits a non-false item.
>
> Tom's point is that
> if not any(foo for foo in foos if foo.bar):
>     foo = (foo for foo in foos if foo.bar).next()
> iterates twice over (the same first few elements of) foos, which should
> take about twice as long as iterating once. The laziness of "any" does
> not seem to matter here.

That's no different from any "Look Before You Leap" idiom. If you do this:

    if key in dict:
        x = dict[key]

you search the dict twice, once to see if the key is there, and the second time to fetch the value. Whether that is better or faster than the alternative:

    try:
        x = dict[key]
    except KeyError:
        pass

depends on how often you expect the lookup to fail.

In any case, I would claim that Tom's argument is a classic example of premature optimization: by his own admission:

'the iterable "foos" tends to be very large'

which implies that whatever happens to the foos after this test, it will probably be very time consuming. If it takes (for the sake of the argument) 10 milliseconds to process the entire iterable, who cares whether it takes 0.01 or 0.02 ms to check that the iterable is valid?

> Of course, you're right that the iteration might or might not be the
> bottleneck. On the other hand, foos might not even be reiterable.
If that's the case, then the existing solution potentially throws away the first value of foos every time the caller tests to see if it is empty.

Dealing with non-reiterable iterators can be a nuisance. In such a case, it may be best to avoid Look Before You Leap altogether:

    empty = True
    for foo in foos:
        if foo.bar:
            empty = False
            process(foo)
    if empty:
        handle_error_condition()

-- Steven
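A small sketch of why testing first is risky on a one-shot iterator, followed by the single-pass loop wrapped in a function. The names one_shot and process_true_items are invented here, and plain integers stand in for objects with a .bar attribute.

```python
def one_shot(values):
    """A generator: can be iterated only once, like a non-reiterable foos."""
    for value in values:
        yield value


# Look Before You Leap on a one-shot iterator eats items during the check.
foos = one_shot([0, 4, 0, 9])
has_match = any(foo for foo in foos if foo)  # consumes 0 and 4
leftover = list(foos)                        # the 4 is already gone


def process_true_items(foos, process):
    """Single pass in the style of the loop above; raises if nothing matched."""
    empty = True
    for foo in foos:
        if foo:
            empty = False
            process(foo)
    if empty:
        raise ValueError("no true item in foos")


seen = []
process_true_items(one_shot([0, 4, 0, 9]), seen.append)
```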
From: Martin v. Loewis on 2 Jan 2010 15:17

>> Bottom line, I'm going to have to remove this pattern from my code:
>>
>> foo = (foo for foo in foos if foo.bar).next()

I recommend rewriting this like so:

    def first(gen):
        try:
            return gen.next()
        except StopIteration:
            raise ValueError, "No first value"

    foo = first(foo for foo in foos if foo.bar)

As others have said: don't let StopIteration appear unexpectedly; IOW, consume generators right away in a loop construct (where this first function is a loop construct as well).

A different way of writing it would be

    def first(gen):
        for value in gen:
            return value
        raise ValueError, "empty collection"

Regards,
Martin
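Both variants in Python 3 spelling (next(gen) and raise ValueError("...")), as a hedged sketch; first_try, first_loop and first_true_per_group are names chosen here. Driving them through list() shows the ValueError reaching the caller instead of being mistaken for the end of iteration.

```python
def first_try(gen):
    """First variant: catch StopIteration and re-raise it as ValueError."""
    try:
        return next(gen)
    except StopIteration:
        raise ValueError("No first value") from None


def first_loop(iterable):
    """Second variant: the for loop handles StopIteration itself."""
    for value in iterable:
        return value
    raise ValueError("empty collection")


def first_true_per_group(groups, first):
    # A generator that calls first() once per group; list() will drive it.
    for group in groups:
        yield first(x for x in group if x)


ok = list(first_true_per_group([[0, 1], [2, 3]], first_loop))
```

With a group that has no true item, such as [[0, 1], [0], [2]], either variant raises ValueError out of the list() call rather than returning a shortened list.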