From: MRAB on 12 Mar 2010 08:40

Steven D'Aprano wrote:
> On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:
>
>> Folks:
>>
>> Every couple of years I run into a problem where some Python code that
>> worked well at small scales starts burning up my CPU at larger scales,
>> and the underlying issue turns out to be the idiom of accumulating data
>> by string concatenation.
>
> I don't mean to discourage you, but the simple way to avoid that is not
> to accumulate data by string concatenation.
>
> The usual Python idiom is to append substrings to a list, then once, at
> the very end, combine into a single string:
>
>     accumulator = []
>     for item in sequence:
>         accumulator.append(process(item))
>     string = ''.join(accumulator)
>
>> It just happened again
>> (http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
>> to make the data accumulator efficient without introducing a bunch of
>> bugs into the surrounding code.
>
> I'm sorry, I don't agree about that at all. I've never come across a
> situation where I wanted to use string concatenation and couldn't easily
> modify it to use the list idiom above.
>
> [...]
>
>> Here are some benchmarks generated by running python -OOu -c 'from
>> stringchain.bench import bench; bench.quick_bench()' as instructed by
>> the README.txt file.
>
> To be taken seriously, I think you need to compare stringchain to the
> list idiom. If your benchmarks favourably compare to that, then it might
> be worthwhile.

IIRC, someone did some work on making concatenation faster by delaying it
until a certain threshold had been reached (in the string class
implementation).
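To make the comparison Steven asks for concrete, here is a minimal timing
sketch of the two approaches. This is not the stringchain benchmark; the
function names, chunk size, and repetition counts are arbitrary choices for
illustration. Note also that CPython can sometimes optimise `s += t` in
place when `s` has no other references, so the measured gap varies by
interpreter and usage pattern:

    import timeit

    def concat(chunks):
        # Naive accumulation: each += may recopy everything accumulated
        # so far, which is quadratic in the total size in the worst case.
        buf = ''
        for chunk in chunks:
            buf += chunk
        return buf

    def join(chunks):
        # The list idiom from the quote: one copy per chunk, one final join.
        parts = []
        for chunk in chunks:
            parts.append(chunk)
        return ''.join(parts)

    chunks = ['x' * 100] * 10000  # arbitrary chunk size and count

    for fn in (concat, join):
        print('%s: %.3fs' % (fn.__name__,
                             timeit.timeit(lambda: fn(chunks), number=10)))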
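MRAB is presumably remembering the lazy string concatenation patches that
were proposed for CPython (never merged for str). A rough user-level sketch
of the same idea follows; the class name, method names, and threshold are
made up here, not taken from any actual patch:

    class LazyConcat:
        # Hypothetical accumulator: appends are buffered in a list and
        # only joined when the pending count crosses a threshold or the
        # value is actually read.
        def __init__(self, threshold=128):
            self._parts = []
            self._threshold = threshold  # arbitrary cut-off for this sketch

        def append(self, text):
            self._parts.append(text)
            if len(self._parts) >= self._threshold:
                # Threshold reached: collapse the pending pieces now so
                # the list never grows without bound.
                self._parts = [''.join(self._parts)]

        def value(self):
            # Reading the value forces the remaining join.
            self._parts = [''.join(self._parts)]
            return self._parts[0]

    acc = LazyConcat()
    for i in range(10000):
        acc.append('chunk %d ' % i)
    text = acc.value()

One caveat: collapsing at a fixed count still recopies the accumulated
prefix on every collapse, so it only shrinks the constant factor. Deferring
the single join until value() is what recovers the O(n) behaviour of the
list idiom in the quoted post.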