From: Steve Howell on 23 Jan 2010 11:41 On Jan 23, 5:46 am, Christian Heimes <li...(a)cheimes.de> wrote: > Steve Howell wrote: > > Another benchmark is that deques are slower than lists for accessing > > elements. > > deques are optimized for accessing, inserting and removing data from > both ends. For anything else it's slower than the list type. The fact > was explained in this very thread yesterday. > And the benchmark confirmed it. The slowness is fairly negligible, though.
From: Steve Howell on 23 Jan 2010 12:03 On Jan 23, 4:12 am, Steven D'Aprano <st...(a)REMOVE-THIS- cybersource.com.au> wrote: > > > An alternative would be to do exactly what you want lists to do: track > the start of the list. Untested: > > def recurse(prefix_lines): > start = 0 > end = len(prefix_lines) > while start < end: > prefix, line = prefix_lines[start] > if line == '': > start += 1 > append('') > else: > block_size = get_block(prefix_lines) > if block_size == 1: > start += 1 > if line == pass_syntax: > pass > elif line.startswith(flush_left_syntax): > append(line[len(flush_left_syntax):]) > elif line.startswith(flush_left_empty_line): > append('') > else: > append(prefix + leaf_method(line)) > else: > block = prefix_lines[:block_size] > start = block_size > branch_method(output, block, recurse) > return > > No more O(N) deletions. Problem solved. > A minor modification of your solution does work, but it also slightly + complicates the implementation. Keeping track of the start variable requires bookkeeping not just in recurse(), but also in methods that it calls. This is a pretty small program, so it's acceptable to pass around an offset variable to anybody else who might want to be consuming the list. ############ Generic indentation stuff follows -def get_indented_block(prefix_lines): - prefix, line = prefix_lines[0] +def get_indented_block(prefix_lines, start): + prefix, line = prefix_lines[start] len_prefix = len(prefix) i = 1 - while i < len(prefix_lines): - new_prefix, line = prefix_lines[i] + while i + start < len(prefix_lines): + new_prefix, line = prefix_lines[start+i] if line and len(new_prefix) <= len_prefix: break i += 1 - while i-1 > 0 and prefix_lines[i-1][1] == '': + while i-1 > 0 and prefix_lines[start+i-1][1] == '': i -= 1 return i @@ -190,15 +190,16 @@ ): append = output.append def recurse(prefix_lines): - while prefix_lines: - prefix, line = prefix_lines[0] + start = 0 + while start < len(prefix_lines): + prefix, line = prefix_lines[start] if line == '': - prefix_lines.pop(0) + start += 1 append('') else: - block_size = get_block(prefix_lines) + block_size = get_block(prefix_lines, start) if block_size == 1: - prefix_lines.pop(0) + start += 1 if line == pass_syntax: pass elif line.startswith(flush_left_syntax): @@ -208,8 +209,8 @@ else: append(prefix + leaf_method(line)) else: - block = prefix_lines[:block_size] - prefix_lines = prefix_lines[block_size:] + block = prefix_lines[start:start+block_size] + start += block_size branch_method(output, block, recurse) return prefix_lines = map(indentation_method, lines)
From: Steve Howell on 23 Jan 2010 12:17 On Jan 23, 4:12 am, Steven D'Aprano <st...(a)REMOVE-THIS- cybersource.com.au> wrote: > On Fri, 22 Jan 2010 21:42:43 -0800, Steve Howell wrote: > > This innocent program here literally moves about a million bytes of > > memory around for no good reason: > > > lst = [] > > for i in range(2000): > > lst.append(i) > > while lst: > > print lst.pop(0) > > > Why? Because list.pop(0) is implemented in O(N) instead of O(1). > > > Why wouldn't you get a competent C programmer simply make list_ass_slice > > smart enough to make list.pop(0) O(1)? > > Because there are always trade-offs, and the competent C programmers who > coded the implementation for lists choose different tradeoffs to the ones > you would prefer. > > Seems to me that the simple solution to your problem is for you to > implement your own data structure that makes whatever trade-offs you > like. If it is good enough and popular enough, it might even replace the > existing list implementation. > The data structure that would make the tradeoffs I want would be implemented within CPython itself. I give a sketch of the changes elsewhere in this thread. Terry Reedy said: ''' If you try writing a full patch, as I believe someone did, or at least a prototype thereof, when the idea was discussed, you might have a better idea of what the tradeoffs are and why it was rejected. ''' I have to run, but tomorrow I will try to dig through python-dev archives and find the patch. If anybody has hints on where to look for it (anybody remember the author, for example?), it would be much appreciated. If the patch looks simple, I will try to pitch the idea that its time has come. Now that the specification of the language itself is frozen, I think there might be more room for improving implementations. Also, I might be able to make the argument that tradeoffs of memory vs. CPU vs. code complexity have different forces in the 2010s. Thanks for your reply.
From: Steve Howell on 23 Jan 2010 12:38 On Jan 23, 7:54 am, Steven D'Aprano <st...(a)REMOVE-THIS- cybersource.com.au> wrote: > On Sat, 23 Jan 2010 09:57:04 -0500, Roy Smith wrote: > > In article <hje979$kc...(a)news.eternal-september.org>, > > "Alf P. Steinbach" <al...(a)start.no> wrote: > > >> But it would IMHO have been better if it wasn't called "list", which > >> brings in the wrong associations for someone used to other languages. > > > +1. > > > When I first started using Python (back in the 1.4 days), I assumed a > > list was a singly-linked list. > > Why would you do that? I can think of at least eight different > implementations of the abstract list data structure: > > constant-size array > variable-size array > variable-size array with amortised O(1) appends > singly-linked list > singly-linked list with CDR coding > doubly-linked list > skip list > indexable skip list > > One can reasonably disregard constant-sized arrays as a possibility, > given that Python lists aren't fixed size (pity the poor Pascal and > Fortran coders who are stuck with static arrays!), but the rest are all > reasonable possibilities. Why assume one specific implementation in the > absence of documentation promising certain performance characteristics? > > > Oddly enough, I was going to write in the above paragraph, "like a C++ > > STL list", until I happened to glance at the STL docs and refreshed my > > memory that an STL list is doubly-linked. Which just goes to show that > > making assumptions based on names is a bad idea. > > Exactly :) > > > So, we're right back to my statement earlier in this thread that the > > docs are deficient in that they describe behavior with no hint about > > cost. Given that, it should be no surprise that users make incorrect > > assumptions about cost. > > There are quite a few problems with having the documentation specify cost: > > (1) Who is going to do it? Any volunteers? > > (2) Big-oh notation can be misleading, especially for naive users, or > those whose intuition for what's fast has been shaped by other languages. > Big-oh doesn't tell you whether something is fast or slow, only how it > scales -- and sometimes not even then. > > (3) Having documented a particular performance, that discourages > implementation changes. Any would-be patch or new implementation not only > has to consider that the functional behaviour doesn't change, but that > the performance doesn't either. > > In practice the Python developers are unlikely to make an implementation > change which leads to radically worse performance, particularly for > critical types like list and dict. But in other cases, they might choose > to change big-oh behaviour, and not wish to be tied down by documentation > of the cost of operations. > > (4) How much detail is necessary? What about degenerate cases? E.g. dict > lookup in CPython is typically O(1) amortised, but if all the keys hash > to the same value, it falls to O(N). > > (5) Should the language guarantee such degenerate behaviour? Who decides > which costs are guaranteed and which are not? > > (6) Such performance guarantees should be implementation specific, not > language specific. CPython is only one implementation of the language out > of many. > Bringing this thread full circle, does it make sense to strike this passage from the tutorial?: ''' It is also possible to use a list as a queue, where the first element added is the first element retrieved (first-in, first-out); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one). ''' I think points #3 and #6 possibly apply. Regarding points #2 and #4, the tutorial is at least not overly technical or specific; it just explains the requirement to shift other elements one by one in simple layman's terms.
From: Aahz on 23 Jan 2010 13:50
In article <8e4d3fe2-c4bd-4a73-9c50-7a336dab25be(a)o28g2000yqh.googlegroups.com>, Steve Howell <showell30(a)yahoo.com> wrote: >On Jan 22, 11:10=A0pm, a...(a)pythoncraft.com (Aahz) wrote: >> >>>I know Python's number one concern will never be speed, but if Python >>>makes an O(1) operation into an unnecessarily O(N) operation for no >>>good reasons other than "it's too complicated, " or it "adds another >>>pointer to the structure," or "it adds another conditional check to >>>list_ass_slice for operations that aren't popping off the top," I >>>think it's reasonable to challenge the design philosophy. >> >> "Rough consensus and running code." >> >> You have a good point, but nobody will ever give your idea serious >> attention until there's a patch and benchmarks. > >Here is a benchmark of O(N*N) vs. O(N) for two C programs. One does >memmove; the other simply advances the pointer. You should provide pybench numbers and probably also use the benchmarks produced by the Unladen Swallow project. The next step is to file a patch on bugs.python.org and request review. -- Aahz (aahz(a)pythoncraft.com) <*> http://www.pythoncraft.com/ import antigravity |