Dangerous behavior of list(generator) [Python]

Prev: Spawning an interactive interpreter in a running python process?
Next: unable to read the __main__ namespace

From: Benjamin Kaplan on 30 Dec 2009 23:20

On Wed, Dec 30, 2009 at 7:01 PM, Steven D'Aprano
<steve(a)remove-this-cybersource.com.au> wrote:
>
> I don't see why. What's wrong with it? Unless you embed it in a call to
> list, or similar, it will explicitly raise StopIteration as expected.
>
>
>> I used to have that a lot in cases where not finding at least one valid
>> foo is an actual fatal error.
>
> What's wrong with the obvious solution?
>
> if not any(foo for foo in foos if foo.bar):
>     raise ValueError('need at least one valid foo')

That would require 2 iterations through foos: once in the test, once
for the assignment if successful. If foos takes a long time to iterate
through, it might be faster to put a try-except around the original
statement, catch the StopIteration, and raise a ValueError in its
place. That, I agree, is much better practice than letting the
StopIteration signal the fatal error.
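A minimal sketch of that try/except approach, assuming Python 3 (where the built-in next() replaces the generator's .next() method) and a hypothetical Foo type with a boolean bar attribute standing in for the real objects. The helper name first_valid_foo is also hypothetical. It makes a single lazy pass over foos and stops at the first match:

```python
from collections import namedtuple

# Hypothetical stand-in for the objects discussed in the thread.
Foo = namedtuple("Foo", ["name", "bar"])


def first_valid_foo(foos):
    """Return the first foo whose .bar is true, or raise ValueError."""
    try:
        # Single lazy pass: iteration stops at the first foo with a true .bar.
        return next(foo for foo in foos if foo.bar)
    except StopIteration:
        # Translate the iteration-protocol signal into an ordinary error,
        # so a consumer such as list() cannot mistake it for exhaustion.
        raise ValueError("need at least one valid foo") from None


foos = [Foo("a", False), Foo("b", True), Foo("c", True)]
print(first_valid_foo(foos).name)  # b
```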

From: Peter Otten on 31 Dec 2009 04:54

Tom Machinski wrote:

> It would be nice if there was a builtin for "get the first element in
> a genexp, or raise an exception (which isn't StopIteration)", sort of
> like:
>
> from itertools import islice
>
> def first_or_raise(genexp):
>     L = list(islice(genexp, 1))
>     if not L:
>         raise RuntimeError('no elements found')
>     return L[0]

Somewhat related: in 2.6 there's the next() built-in, which accepts a default
value. You can provide a sentinel and test for that instead of using
try...except:

>>> from random import randrange
>>> from functools import partial
>>> def g():
...     return iter(partial(randrange, 3), 2)
...
>>> next(g(), "empty")
1
>>> next(g(), "empty")
1
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
0

Peter
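Because g() above draws random numbers, that session's output changes from run to run. A deterministic sketch of the same next(iterator, default) call, using made-up lists of integers, a hypothetical helper first_even, and the string "empty" as the default as in Peter's session:

```python
def first_even(numbers):
    # With a second argument, next() returns that default instead of
    # raising StopIteration when the generator is exhausted.
    # A generator expression passed alongside another argument must be
    # wrapped in its own parentheses.
    return next((n for n in numbers if n % 2 == 0), "empty")


print(first_even([3, 5, 8, 10]))  # 8
print(first_even([1, 3, 5]))      # empty
```

This is only reliable while "empty" can never be a legitimate element, which is the caveat Tom raises in his reply to Peter.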

From: Steven D'Aprano on 31 Dec 2009 06:44

On Wed, 30 Dec 2009 23:20:06 -0500, Benjamin Kaplan wrote:

>>> I used to have that a lot in cases where not finding at least one
>>> valid foo is an actual fatal error.
>>
>> What's wrong with the obvious solution?
>>
>> if not any(foo for foo in foos if foo.bar):
>>     raise ValueError('need at least one valid foo')
>
> That would require 2 iterations through foos: once in the test, once for
> the assignment if successful.

Remember, though, that any() is a lazy test: it returns as soon as it knows
the result. In the case of an empty list, it returns False immediately,
and in the case of a non-empty list, it returns as soon as it reaches a
true item. It doesn't matter if there are twenty thousand items; if the
first one is true, that is the only one it looks at.
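A small sketch that makes this laziness visible, assuming a hypothetical generator of twenty thousand true values that records each item as it produces it:

```python
produced = []


def many_true_items(n=20000):
    # Record each item as it is generated, so we can see how far any() got.
    for i in range(n):
        produced.append(i)
        yield i + 1  # every item is a non-zero int, hence true


print(any(x for x in many_true_items()))  # True
print(len(produced))                      # 1
```

Only the first item is ever generated; the remaining 19,999 are never computed.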

Which of course answers my own question: what's wrong with using any() is
that it fails if all of the matching objects are false in a boolean
context, and that is a risk whenever they might be. That means it will
work for some objects (e.g. the re module's MatchObject instances, which
are always true), but not for arbitrary objects, which may be false.
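To illustrate that failure, here is a sketch with made-up values where the only matching item is the integer 0, which is false in a boolean context. Yielding a constant True for each match, shown second, is one workaround for the existence test, though unlike next() it does not hand back the matching item itself:

```python
values = [5, 0, 7]

# The generator yields the matching item 0, but 0 is false in a boolean
# context, so any() concludes that nothing matched.
print(any(v for v in values if v < 1))     # False

# Yielding True for each match tests for existence rather than
# truthiness, so a false-valued match is still detected.
print(any(True for v in values if v < 1))  # True
```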

--
Steven

From: Tom Machinski on 31 Dec 2009 14:34

On Wed, Dec 30, 2009 at 4:01 PM, Steven D'Aprano
<steve(a)remove-this-cybersource.com.au> wrote:
> On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:
>> Bottom line, I'm going to have to remove this pattern from my code:
>>
>> foo = (foo for foo in foos if foo.bar).next()
>
> I don't see why. What's wrong with it? Unless you embed it in a call to
> list, or similar, it will explicitly raise StopIteration as expected.

Exactly; this seems innocuous, but if some caller of this code uses it
in a list() constructor, a very subtle and dangerous bug is introduced
(see the OP). This is the entire point of this post.

In a large, non-trivial application, you simply cannot afford to assume
that no caller will ever do that. Even if you have a perfect memory,
your fellow developers or library users may not.
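A sketch of the hazard in Python 3 syntax, using a hypothetical helper first_positive() written with the bare next(genexp) idiom and made-up input groups. When the helper is called through map(), which is an ordinary iterator rather than a generator, the StopIteration escaping from it looks to list() like the end of the data, so the result is silently truncated:

```python
def first_positive(numbers):
    # Risky idiom: with no default, next() raises StopIteration when
    # nothing matches, and that exception propagates to the caller.
    return next(n for n in numbers if n > 0)


groups = [[1, 2], [-1, -2], [3, 4]]

# map() lets the StopIteration from the second call escape, and list()
# treats it as normal exhaustion: no error, the third group is just lost.
print(list(map(first_positive, groups)))  # [1]
```

Since Python 3.7 (PEP 479), the same call inside a generator expression, list(first_positive(g) for g in groups), raises RuntimeError instead of truncating, but non-generator iterators such as map() are not covered by that change.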

As for what's wrong with the "if not any" solution, Benjamin Kaplan's
post hits the nail on the head. This is a bioinformatics application,
so the iterable "foos" tends to be very large, and saving half the
runtime makes a big difference.

-- Tom

From: Tom Machinski on 31 Dec 2009 14:42

On Thu, Dec 31, 2009 at 1:54 AM, Peter Otten <__peter__(a)web.de> wrote:
> Somewhat related in 2.6 there's the next() built-in which accepts a default
> value. You can provide a sentinel and test for that instead of using
> try...except:

Thanks. This can be useful in some of the simpler cases. As you surely
realize, though, to be perfectly safe, especially when the iterable can
contain any value (including your sentinel), we must use an
out-of-band return value. Hence an exception is the only truly safe
solution.
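One way to combine Peter's default-argument approach with the exception Tom asks for is sketched below; the name first_or_error and its predicate parameter are hypothetical. The sentinel is a fresh object() created inside the function and never handed out, so in practice no element of the caller's iterable can be identical to it, and the caller still receives an ordinary ValueError rather than a StopIteration that list() could swallow:

```python
def first_or_error(iterable, predicate=bool):
    """Return the first item for which predicate(item) is true.

    Raises ValueError (not StopIteration) if there is no such item.
    """
    # A private object created here and never returned, so it cannot
    # collide with any value the iterable yields.
    sentinel = object()
    item = next((x for x in iterable if predicate(x)), sentinel)
    if item is sentinel:
        raise ValueError("no matching element found")
    return item


# False-looking values such as 0, or a string like "empty", come back
# correctly, because the check is identity with the private sentinel.
print(first_or_error([None, 0, "empty"], lambda x: x is not None))  # 0
```

Used inside list(map(first_or_error, ...)), a group with no match now surfaces as a ValueError instead of quietly ending the list.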

-- Tom
