From: Arnaud Delobelle on 22 Jan 2010 16:08 Steve Howell <showell30(a)yahoo.com> writes: > On Jan 22, 12:14 pm, Chris Rebert <c...(a)rebertia.com> wrote: >> On Fri, Jan 22, 2010 at 11:14 AM, Steve Howell <showel...(a)yahoo.com> wrote: >> > The v2.6.4 version of the tutorial says this: >> >> > ''' >> > It is also possible to use a list as a queue, where the first element >> > added is the first element retrieved (“first-in, first-out”); however, >> > lists are not efficient for this purpose. While appends and pops from >> > the end of list are fast, doing inserts or pops from the beginning of >> > a list is slow (because all of the other elements have to be shifted >> > by one). >> > ''' >> >> > Is that really true in CPython? It seems like you could advance the >> > pointer instead of shifting all the elements. It would create some >> > nuances with respect to reclaiming the memory, but it seems like an >> > easy way to make lists perform better under a pretty reasonable use >> > case. >> >> > Does anybody know off the top of their head if the "have-to-be-shifted- >> > by-one" warning is actually valid? >> >> Judging by the "Sorted dictionary" thread responses: Yes. >> > > I think you are referring to this comment: > > ''' > Insertion and deletion are still O(n) as all items to the right of the > inserted/deleted one have to be shifted by one place. > ''' > > > I can certainly see why most reasonable implementations of a list > would have insertion/deletion in the middle of the list be O(N), but I > don't think that limitation has to apply for the special cases of the > beginning and end of the list. I made the comment you quoted. In CPython, it is O(n) to delete/insert an element at the start of the list - I know it because I looked at the implementation a while ago. This is why collections.deque exists I guess. I don't know how they are implemented but insertion/deletion at either end are O(1) and so is random access. So they are the data structure that you want. If you want evidence for lists, rather than my word, try: >>> import timeit >>> timeit.Timer('while t:del t[0]', 't=[0]*100000').timeit(1) 1.8452401161193848 >>> timeit.Timer('while t:del t[-1]', 't=[0]*100000').timeit(1) 0.017747163772583008 >>> timeit.Timer( 'while t:del t[0]', 'from collections import deque; t=deque([0]*100000)' ).timeit(1) 0.022077083587646484 >>> timeit.Timer( 'while t:del t[-1]', 'from collections import deque; t=deque([0]*100000)' ).timeit(1) 0.027813911437988281 -- Arnaud
From: Steve Howell on 22 Jan 2010 16:26 On Jan 22, 1:08 pm, Arnaud Delobelle <arno...(a)googlemail.com> wrote: > Steve Howell <showel...(a)yahoo.com> writes: > > On Jan 22, 12:14 pm, Chris Rebert <c...(a)rebertia.com> wrote: > >> On Fri, Jan 22, 2010 at 11:14 AM, Steve Howell <showel...(a)yahoo.com> wrote: > >> > The v2.6.4 version of the tutorial says this: > > >> > ''' > >> > It is also possible to use a list as a queue, where the first element > >> > added is the first element retrieved (first-in, first-out); however, > >> > lists are not efficient for this purpose. While appends and pops from > >> > the end of list are fast, doing inserts or pops from the beginning of > >> > a list is slow (because all of the other elements have to be shifted > >> > by one). > >> > ''' > > >> > Is that really true in CPython? It seems like you could advance the > >> > pointer instead of shifting all the elements. It would create some > >> > nuances with respect to reclaiming the memory, but it seems like an > >> > easy way to make lists perform better under a pretty reasonable use > >> > case. > > >> > Does anybody know off the top of their head if the "have-to-be-shifted- > >> > by-one" warning is actually valid? > > >> Judging by the "Sorted dictionary" thread responses: Yes. > > > I think you are referring to this comment: > > > ''' > > Insertion and deletion are still O(n) as all items to the right of the > > inserted/deleted one have to be shifted by one place. > > ''' > > > I can certainly see why most reasonable implementations of a list > > would have insertion/deletion in the middle of the list be O(N), but I > > don't think that limitation has to apply for the special cases of the > > beginning and end of the list. > > I made the comment you quoted. In CPython, it is O(n) to delete/insert > an element at the start of the list - I know it because I looked at the > implementation a while ago. This is why collections.deque exists I > guess. I don't know how they are implemented but insertion/deletion at > either end are O(1) and so is random access. So they are the data > structure that you want. > > If you want evidence for lists, rather than my word, try: > > >>> import timeit > >>> timeit.Timer('while t:del t[0]', 't=[0]*100000').timeit(1) > 1.8452401161193848 > >>> timeit.Timer('while t:del t[-1]', 't=[0]*100000').timeit(1) > > 0.017747163772583008 > > >>> timeit.Timer( > > 'while t:del t[0]', > 'from collections import deque; t=deque([0]*100000)' > ).timeit(1) > 0.022077083587646484>>> timeit.Timer( > > 'while t:del t[-1]', > 'from collections import deque; t=deque([0]*100000)' > ).timeit(1) > 0.027813911437988281 > Ok, thanks, good to know. I assume it's the colorly named list_ass_slice that would have to special case ilow == 0 and you'd need a memory manager that would let you realloc from ilow:ihigh to ilow+n:high. Am I reading that much of the code correctly?
From: Christian Heimes on 22 Jan 2010 16:29 Steve Howell wrote: > I disagree that Python code rarely pops elements off the top of a > list. There are perfectly valid use cases for wanting a list over a > dequeue without having to pay O(N) for pop(0). Maybe we are just > quibbling over the meaning of "rarely." I was speaking from my own point of view. I've written several tenths of thousands of lines of Python code in the last seven years, mostly related to data manipulation, web applications and operating system interaction but also GUI stuff and scientific code. I can't recall any performance critical or prominent code that modifies the head of a list a lot. Of course there a use cases where you may want to use list.pop(). Unless you need a FILO structure you can always replace a LILO with a FIFO -- instead of list.insert(0, value) and list.pop(0) use list.append(value) and list.pop(). It's not possible to optimize a data structure for all use cases. > I am not proposing a linked list of pointers. I am wondering about > something like this: > > p = &p[1]; > (and then reclaim p[0] as free memory, I already said I understood > that was the tricky bit) > > The pointer arithmetic for accessing each element would still work in O > (1), right? You mean it's an impossible trick unless you come up with your own memory management system. Realloc(), the standard function to change the size of chunk of allocated memory in C, doesn't support your desired operation. Christian
From: Terry Reedy on 22 Jan 2010 16:32 On 1/22/2010 2:14 PM, Steve Howell wrote: > The v2.6.4 version of the tutorial says this: > Is that really true in CPython? It seems like you could advance the > pointer instead of shifting all the elements. It would create some > nuances with respect to reclaiming the memory, but it seems like an > easy way to make lists perform better under a pretty reasonable use > case. Something like this was one proposed (ie, leave space at both ends of a list) but was rejected as over-complicating the list implementation for *relatively* rare use cases. I believe deque was written subsequently to address such other use cases.
From: Christian Heimes on 22 Jan 2010 16:35
Arnaud Delobelle wrote: > I made the comment you quoted. In CPython, it is O(n) to delete/insert > an element at the start of the list - I know it because I looked at the > implementation a while ago. This is why collections.deque exists I > guess. I don't know how they are implemented but insertion/deletion at > either end are O(1) and so is random access. So they are the data > structure that you want. Your assumption is correct. The collections.dequeue type uses a double linked list of blocks. Each blocks contains a fixed amount of pointers to Python objects. The implementation makes the implementation less memory hungry than a standard double linked list with just one element for each block. Christian |