list.pop(0) vs. collections.dequeue [Python]

Prev: distutils not finding all of my pure python modules
Next: iterating lists

From: Steve Howell on 23 Jan 2010 02:58

On Jan 22, 11:10 pm, a...(a)pythoncraft.com (Aahz) wrote:
> In article <83082e19-9130-45a8-91f2-8601c1fda...(a)22g2000yqr.googlegroups.com>,
> Steve Howell <showel...(a)yahoo.com> wrote:
>
>
>
>
>
> >I really want to use list *normally* with all its perfectly good
> >semantics and reasonable implementation, except for its blind spot
> >with respect to popping the first element off the list. The whole
> >reason I use CPython vs. C in the first place is that CPython
> >programmers can generally program basic data structures better than I
> >can. But list.pop(0) is the exception. And, with the possible
> >exception of dicts, lists are the most fundamental data structures
> >that Python has.
>
> >I know Python's number one concern will never be speed, but if Python
> >makes an O(1) operation into an unnecessarily O(N) operation for no
> >good reasons other than "it's too complicated, " or it "adds another
> >pointer to the structure," or "it adds another conditional check to
> >list_ass_slice for operations that aren't popping off the top," I
> >think it's reasonable to challenge the design philosophy.
>
> "Rough consensus and running code."
>
> You have a good point, but nobody will ever give your idea serious
> attention until there's a patch and benchmarks.

Another benchmark is that deques are slower than lists for accessing
elements.

showell(a)showell-laptop:~$ python foo.py
0.0215361118317 <- list
0.0429010391235 <- deque

import time
from collections import deque

n = 40000
lst = []
for i in range(n):
lst.append(i)

t = time.time()
for i in range(n):
lst[i]
print time.time() - t

lst = deque(lst)
t = time.time()
for i in range(n):
lst[i]
print time.time() - t

So substituting deque for list suffers not just in convenience, but
also in performance.

From: Terry Reedy on 23 Jan 2010 03:13

On 1/23/2010 12:58 AM, Steve Howell wrote:

> I really want to use list *normally* with all its perfectly good
> semantics and reasonable implementation, except for its blind spot
> with respect to popping the first element off the list.

It was not designed for that. .pop() was added to lists about 10 years
ago because I asked for it (with no parameter, pop off end only) and
wrote what would now be a PEP -- and because Tim Peters later supported
the idea. Adding the optional parameter was something of an afterthought
(never publicly discussed as far as I know) for occasional use for
'short' lists where O(n) is tolerable. You have half persuaded me that
that the parameter addition was a mistake. Perhaps is is too attractice
a nuisance for some people ;=).

> The whole
> reason I use CPython vs. C in the first place is that CPython
> programmers can generally program basic data structures better than I
> can.

They have given us three options other that .pop(0).

1. listiterator
2. queue.Queue
3. collections.deque\

Why are you so stubborn about not using a data structure intended for
your use case instead of one that was not?

There is also
4. a two-list design for queues: iterator through one (or pop() from a
reversed version thereof) while appending to another; switch when the
first is empty. I plan to write it up with tests some day this year.

> I know Python's number one concern will never be speed, but if Python
> makes an O(1) operation into an unnecessarily O(N) operation for no
> good reasons other than "it's too complicated, " or it "adds another
> pointer to the structure," or "it adds another conditional check to
> list_ass_slice for operations that aren't popping off the top," I
> think it's reasonable to challenge the design philosophy.

Challenge yes, mock no.

Part of writing good basic data structures is not adding needless
complication from featuritis and not penalizing 99.99% of access to
satify a .01% need better satisfied another way.

Terry Jan Reedy

From: Steve Howell on 23 Jan 2010 03:23

On Jan 23, 12:13 am, Terry Reedy <tjre...(a)udel.edu> wrote:
>
> Challenge yes, mock no.
>
> Part of writing good basic data structures is not adding needless
> complication from featuritis and not penalizing 99.99% of access to
> satify a .01% need better satisfied another way.
>

I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way. It would only slow down
operations that add/insert items into the list by, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.

From: Alf P. Steinbach on 23 Jan 2010 03:32

* Steve Howell:
> On Jan 23, 12:13 am, Terry Reedy <tjre...(a)udel.edu> wrote:
>> Challenge yes, mock no.
>>
>> Part of writing good basic data structures is not adding needless
>> complication from featuritis and not penalizing 99.99% of access to
>> satify a .01% need better satisfied another way.
>>
>
> I would like to challenge your assertion that advancing ob_item
> instead of doing memmove during list_ass_slice would impact the
> performance of list accesses in any way. It would only slow down
> operations that add/insert items into the list by, and then only by a
> single conditional statement, and those add/insert operations are
> already O(N) to begin with.

I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).

With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.

Cheers & hth.,

- Alf

From: Steve Howell on 23 Jan 2010 03:54

On Jan 23, 12:32 am, "Alf P. Steinbach" <al...(a)start.no> wrote:
> * Steve Howell:
>
> > On Jan 23, 12:13 am, Terry Reedy <tjre...(a)udel.edu> wrote:
> >> Challenge yes, mock no.
>
> >> Part of writing good basic data structures is not adding needless
> >> complication from featuritis and not penalizing 99.99% of access to
> >> satify a .01% need better satisfied another way.
>
> > I would like to challenge your assertion that advancing ob_item
> > instead of doing memmove during list_ass_slice would impact the
> > performance of list accesses in any way. It would only slow down
> > operations that add/insert items into the list by, and then only by a
> > single conditional statement, and those add/insert operations are
> > already O(N) to begin with.
>
> I'm sorry, no, the last part is incorrect.
>
> Appending to a 'list' can currently be constant time, if OS reallocation is
> constant time (as the string '+' optimization relies on).
>

That's true, but it's also irrelevant, as the pop optimization would
happen in a branch of code that never gets executed during list
appending:

if (d < 0) { /* Delete -d items */
/* ADD CODE HERE TO AVOID memmove
when ilow == 0 (i.e. ihigh + d == 0)
*/
memmove(&item[ihigh+d], &item[ihigh],
(Py_SIZE(a) - ihigh)*sizeof(PyObject *));
list_resize(a, Py_SIZE(a) + d);
item = a->ob_item;
}

> With the pop optimization it can no longer be constant time without risking an
> accumulation of unused memory, a memory leak, although it can be amortized
> constant time, at the cost of wasting some percentage of memory.

list_resize already overallocates memory to allow room for growth.
Whenever you did an append to the list that would force a realloc, you
could first check to see if there is unused stuff at the front and do
the memmove then and reclaim the unfreed memory. So instead of doing
a paying for memmove on every pop, you only pay for it when the list
goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

>

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prev: distutils not finding all of my pure python modules
Next: iterating lists