StringChain -- a data structure for managing large sequences of chunks of bytes
From: Zooko O'Whielacronx on 12 Mar 2010 02:11

Folks:

Every couple of years I run into a problem where some Python code that worked well at small scales starts burning up my CPU at larger scales, and the underlying issue turns out to be the idiom of accumulating data by string concatenation. It just happened again (http://foolscap.lothar.com/trac/ticket/149), and as usual it is hard to make the data accumulator efficient without introducing a bunch of bugs into the surrounding code. So this time around I decided to try encapsulating my preferred, more efficient idiom into a reusable class.

So I present to you StringChain, which is an efficient way to accumulate and process data in many chunks:

http://tahoe-lafs.org/trac/stringchain

Here are some benchmarks generated by running

    python -OOu -c 'from stringchain.bench import bench; bench.quick_bench()'

as instructed by the README.txt file. The "N:" in the left-hand column is how many bytes were in the test dataset. The "ave rate:" number in the right-hand column is how many bytes per second were processed. "naive" means the string-concatenation idiom described above and "strch" means using the StringChain class.
_buildup init_naive
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   890, ave rate:   58350579
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   265, ave rate:   34800398
N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s:    79, ave rate:   20745346
N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05, 2th-worst: 0.05, worst: 0.05 (of 5), reps/s:    20, ave rate:   10823850

_buildup init_strch
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 25543, ave rate: 1674043282
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 14179, ave rate: 1858538925
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:  8016, ave rate: 2101513050
N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:  4108, ave rate: 2154215572

_consume init_naive_loaded
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   931, ave rate:   61037862
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   270, ave rate:   35454393
N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s:    74, ave rate:   19471963
N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05, 2th-worst: 0.05, worst: 0.06 (of 5), reps/s:    19, ave rate:   10146747

_consume init_strch_loaded
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:  4309, ave rate:  282447500
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:  2313, ave rate:  303263357
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:  1186, ave rate:  311159052
N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   606, ave rate:  317814669

_randomy init_naive
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   479, ave rate:   31450561
N: 131072, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s:   140, ave rate:   18461191
N: 262144, time: best: 0.02, 2th-best: 0.02, ave: 0.02, 2th-worst: 0.03, worst: 0.03 (of 5), reps/s:    42, ave rate:   11127714
N: 524288, time: best: 0.06, 2th-best: 0.07, ave: 0.08, 2th-worst: 0.08, worst: 0.09 (of 5), reps/s:    13, ave rate:    6906341

_randomy init_strch
N:  65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   973, ave rate:   63827127
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   495, ave rate:   64970669
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s:   239, ave rate:   62913360
N: 524288, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s:   121, ave rate:   63811569

The naive approach is slower than the StringChain class, and the bigger the dataset the slower it goes. The StringChain class is fast and also scalable (with regard to these benchmarks, at least...).

Thanks!

Regards,
Zooko
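(A minimal sketch, not code from the post, of the string-concatenation idiom Zooko is describing and why it burns CPU at scale: each `buf += chunk` and each slice-off of a leading prefix may copy the entire remaining buffer, so processing a stream of many small chunks costs O(total**2) bytes copied overall. The `naive_stream` name and line-based framing are illustrative assumptions, not the actual foolscap code.)

```python
def naive_stream(chunks, line_sep=b"\n"):
    """Accumulate incoming chunks and yield complete lines as they arrive.

    This is the "naive" idiom: a single growing bytes buffer.  Both the
    `+=` and the split below copy the whole buffer, which is what makes
    this quadratic in the total stream size.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk                             # copies the whole buffer
        while line_sep in buf:
            line, buf = buf.split(line_sep, 1)   # copies the remainder too
            yield line
```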
From: Steven D'Aprano on 12 Mar 2010 03:52

On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:

> Folks:
>
> Every couple of years I run into a problem where some Python code that
> worked well at small scales starts burning up my CPU at larger scales,
> and the underlying issue turns out to be the idiom of accumulating data
> by string concatenation.

I don't mean to discourage you, but the simple way to avoid that is not to accumulate data by string concatenation. The usual Python idiom is to append substrings to a list, then once, at the very end, combine them into a single string:

    accumulator = []
    for item in sequence:
        accumulator.append(process(item))
    string = ''.join(accumulator)

> It just happened again
> (http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
> to make the data accumulator efficient without introducing a bunch of
> bugs into the surrounding code.

I'm sorry, I don't agree about that at all. I've never come across a situation where I wanted to use string concatenation and couldn't easily modify it to use the list idiom above.

[...]

> Here are some benchmarks generated by running python -OOu -c 'from
> stringchain.bench import bench; bench.quick_bench()' as instructed by
> the README.txt file.

To be taken seriously, I think you need to compare stringchain to the list idiom. If your benchmarks compare favourably to that, then it might be worthwhile.

--
Steven
From: Zooko O'Whielacronx on 22 Mar 2010 01:09

Folks:

I failed to make something sufficiently clear in my original message about StringChain. The use case I am talking about is not simply that you need to accumulate a sequence of incoming chunks of data, concatenate them together, and then process the entire result. If that is all you need to do, then indeed you can accumulate the incoming strings in a list and then run ''.join(thelist) when you have received all of them and are ready to process them.

But the use case I am talking about is one where you need to accumulate new incoming strings into your buffer while alternately processing leading prefixes of the buffer. The typical example is that sequences of bytes are arriving on your TCP socket and you are processing the stream incrementally. You need to process the leading prefixes of the stream as soon as you can (e.g. as soon as a line followed by a newline has accumulated, or as soon as another complete element in your data format has accumulated, etc.); you can't simply wait until the bytes stop coming and then concatenate the entire collection together at once. This is exactly the current situation in foolscap [1], which is causing a performance problem in Tahoe-LAFS [2].

To illustrate what I mean, I cooked up some benchmarks showing the task of "accumulate a bunch of things then consume them in a single gulp" versus the task of "alternate between accumulating and consuming" (with accumulation going faster than consumption until the input stream runs dry). I implemented a few data structures for benchmarking: StringChain; the naive "accumulatorstr += newstr" approach (named "Stringy" in the benchmarks); one based on cStringIO (named "StringIOy"); and one based on accumulating the new strings into a list and then doing ''.join(accumulatorlist) whenever you need to consume a leading prefix (named "SimplerStringChain"). Below are the abbreviated results of the benchmark.
You can easily run this benchmark yourself by following the README.txt in the StringChain source package [3]. These results show that, for the load imposed by this benchmark, StringChain is the only one of the four that scales, and it is also nice and fast.

The left-hand column is the total number of bytes that were run through the data structure. The results are shown in nanoseconds per byte, in exponential notation ("e+01" means times 10, "e+02" means times 100, "e+03" means times 1000, etc.). Each result is the best of 10 tries, or of 5 tries for the tasks which were taking too long to run 10 times.

Regards,
Zooko

[1] http://foolscap.lothar.com/trac/ticket/149
[2] http://allmydata.org/pipermail/tahoe-dev/2010-March/004181.html
[3] http://tahoe-lafs.org/trac/stringchain

impl: StringChain
  task: _accumulate_then_one_gulp
      10000 best: 2.694e+00
      50000 best: 2.742e+00
     100000 best: 2.310e+00
     500000 best: 2.040e+00
    1000000 best: 1.988e+00
    5000000 best: 2.193e+00
  task: _alternate_str
      10000 best: 6.509e+00
      50000 best: 4.559e+00
     100000 best: 4.308e+00
     500000 best: 4.070e+00
    1000000 best: 3.991e+00
    5000000 best: 4.000e+00

impl: SimplerStringChain
  task: _accumulate_then_one_gulp
      10000 best: 1.407e+00
      50000 best: 2.317e+00
     100000 best: 2.012e+00
     500000 best: 1.902e+00
    1000000 best: 1.897e+00
    5000000 best: 2.104e+00
  task: _alternate_str
      10000 best: 4.888e+00
      50000 best: 5.198e+00
     100000 best: 1.750e+01
     500000 best: 6.233e+01
    1000000 best: 1.134e+02
    5000000 best: 7.599e+02

impl: StringIOy
  task: _accumulate_then_one_gulp
      10000 best: 4.196e+00
      50000 best: 5.522e+00
     100000 best: 4.499e+00
     500000 best: 3.756e+00
    1000000 best: 4.176e+00
    5000000 best: 5.414e+00
  task: _alternate_str
      10000 best: 5.484e+00
      50000 best: 7.863e+00
     100000 best: 2.126e+01
     500000 best: 6.972e+01
    1000000 best: 1.219e+02
    5000000 best: 9.463e+02

impl: Stringy
  task: _accumulate_then_one_gulp
      10000 best: 1.502e+00
      50000 best: 1.420e+01
     100000 best: 2.245e+01
     500000 best: 8.577e+01
    1000000 best: 2.295e+02
    5000000 best: 1.326e+03
  task: _alternate_str
      10000 best: 3.290e+00
      50000 best: 4.220e+00
     100000 best: 1.665e+01
     500000 best: 6.281e+01
    1000000 best: 1.127e+02
    5000000 best: 7.626e+02
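(The alternating accumulate/consume pattern Zooko describes can be sketched as a chunk-list buffer. This `ChunkBuffer` class and its method names are hypothetical and much simpler than the real StringChain; it only illustrates the core idea: keep the chunks in a deque and copy only the bytes actually being consumed.)

```python
from collections import deque

class ChunkBuffer:
    """A minimal chunk-buffer sketch: append chunks at the tail, consume
    leading prefixes from the head without re-copying the whole buffer."""

    def __init__(self):
        self.d = deque()   # queue of byte chunks, in arrival order
        self.len = 0       # total bytes currently buffered

    def append(self, chunk):
        self.d.append(chunk)
        self.len += len(chunk)

    def popleft(self, n):
        """Remove and return the leading n bytes (requires n <= self.len)."""
        out = []
        remaining = n
        while remaining > 0:
            chunk = self.d.popleft()
            if len(chunk) <= remaining:
                out.append(chunk)               # whole chunk is consumed
                remaining -= len(chunk)
            else:
                out.append(chunk[:remaining])   # split: keep the tail
                self.d.appendleft(chunk[remaining:])
                remaining = 0
        self.len -= n
        return b"".join(out)
```

Only the consumed prefix is ever joined and copied, which is why this shape of buffer keeps its cost proportional to the bytes consumed rather than to the total buffer size.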
From: Steven D'Aprano on 22 Mar 2010 04:07

On Sun, 21 Mar 2010 23:09:46 -0600, Zooko O'Whielacronx wrote:

> But the use case that I am talking about is where you need to accumulate
> new incoming strings into your buffer while alternately processing
> leading prefixes of the buffer.
[...]
> Below are the abbreviated results of the benchmark. You can easily run
> this benchmark yourself by following the README.txt in the StringChain
> source package [3]. These results show that for the load imposed by this
> benchmark StringChain is the only one of the four that scales and that
> it is also nice and fast.

I was reading this email and thinking "Why do you need this StringChain data structure? From the description it sounds like a job for collections.deque." And funnily enough, following the links you supplied, I came across this:

"You can get the package from http://pypi.python.org/pypi/stringchain or with darcs get http://tahoe-lafs.org/source/stringchain/trunk. It has unit tests. It is in pure Python (it uses collections.deque and string)."

http://tahoe-lafs.org/trac/stringchain

Perhaps you should have said that it was a wrapper around deque giving richer functionality, rather than giving the impression that it was a brand new data structure invented by you. People are naturally going to be more skeptical about a newly invented data structure than one based on a reliable component like deque.

In any case, it looks to me that the StringChain data structure itself is a little too application-specific for me to be personally interested in it. Maybe you should consider linking to it on PyPI and seeing if there is any interest from others?

Regards,
Steven
From: Zooko O'Whielacronx on 22 Mar 2010 17:21
On Mon, Mar 22, 2010 at 2:07 AM, Steven D'Aprano <steven(a)remove.this.cybersource.com.au> wrote:

> Perhaps you should have said that it was a wrapper around deque giving
> richer functionality, rather than giving the impression that it was a
> brand new data structure invented by you. People are naturally going to
> be more skeptical about a newly invented data structure than one based
> on a reliable component like deque.

The fact that StringChain uses a deque to hold the queue of strings isn't that important. I just benchmarked it by swapping a list in for the deque, and using the list costs about one third of a nanosecond per byte at the scales the benchmark exercises (namely 10,000,000 bytes in about 10,000 strings). A third of a nanosecond per byte is about 4% of the runtime.

I also implemented and benchmarked a simpler deque-based scheme which puts all of the actual bytes from the strings into a deque with self.d.extend(newstr). As you would expect, this shows good asymptotic performance, but the constants are relatively bad, so that at all of the loads actually measured it was three orders of magnitude worse than StringChain and than StringChain-with-a-list-instead-of-a-deque. Moral: the constants matter!

Those benchmarks are appended. You can run the benchmarks yourself per the README.txt.

But anyway, I take your point, and I updated the StringChain README to explain that it is a pretty simple data structure that holds a list (actually a deque) of strings and isn't anything too clever or novel.
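(A hypothetical sketch of the byte-at-a-time deque scheme described above, written for Python 3, where extending a deque with a bytes object pushes individual integers. The asymptotics are fine, but every byte is pushed and popped individually, which is the large constant factor the benchmarks below expose.)

```python
from collections import deque

# Byte-at-a-time buffering: self.d.extend(newstr) in the original scheme.
d = deque()
d.extend(b"hello world")    # pushes 11 separate one-byte items, not one chunk

# Consuming a leading prefix means popping bytes one at a time.
prefix = bytes(d.popleft() for _ in range(5))
```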
By the way, we could further micro-optimize this kind of thing if ''.join() would accept either strings or buffers instead of requiring strings:

>>> ''.join([buffer("abc"), "def"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, buffer found

Then, whenever StringChain needs to slice a string into leading and trailing parts, it could construct a buffer() viewing each part instead of making a copy of each part.

> it. Maybe you should consider linking to it on PyPI and seeing if there
> is any interest from others?

http://pypi.python.org/pypi/stringchain

Regards,
Zooko

impl: StringChain
  task: _accumulate_then_one_gulp
       10000 best: 5.698e+00, 3th-best: 7.486e+00, mean: 7.758e+00
      100000 best: 4.640e+00, 3th-best: 4.690e+00, mean: 7.674e+00
     1000000 best: 3.789e+00, 3th-best: 3.806e+00, mean: 3.995e+00
    10000000 best: 4.095e+00, 1th-best: 4.095e+00, mean: 4.095e+00
  task: _alternate_str
       10000 best: 1.378e+01, 3th-best: 1.390e+01, mean: 1.500e+01
      100000 best: 9.198e+00, 3th-best: 9.248e+00, mean: 9.385e+00
     1000000 best: 8.715e+00, 3th-best: 8.755e+00, mean: 8.808e+00
    10000000 best: 8.738e+00, 1th-best: 8.738e+00, mean: 8.738e+00

impl: StringChainWithList
  task: _accumulate_then_one_gulp
       10000 best: 3.600e+00, 3th-best: 3.695e+00, mean: 4.129e+00
      100000 best: 4.070e+00, 3th-best: 4.079e+00, mean: 4.162e+00
     1000000 best: 3.662e+00, 3th-best: 3.688e+00, mean: 3.721e+00
    10000000 best: 3.902e+00, 1th-best: 3.902e+00, mean: 3.902e+00, 1th-worst: 3.902e+00, worst: 3.902e+00 (of 1)
  task: _alternate_str
       10000 best: 1.369e+01, 3th-best: 1.380e+01, mean: 1.442e+01
      100000 best: 9.251e+00, 3th-best: 9.289e+00, mean: 9.416e+00
     1000000 best: 8.809e+00, 3th-best: 8.876e+00, mean: 8.943e+00
    10000000 best: 9.095e+00, 1th-best: 9.095e+00, mean: 9.095e+00

impl: Dequey
  task: _accumulate_then_one_gulp
       10000 best: 2.772e+02, 3th-best: 2.785e+02, mean: 2.911e+02
      100000 best: 2.314e+02, 3th-best: 2.334e+02, mean: 2.422e+02
     1000000 best: 2.282e+02, 3th-best: 2.288e+02, mean: 2.370e+02
    10000000 best: 2.587e+02, 1th-best: 2.587e+02, mean: 2.587e+02
  task: _alternate_str
       10000 best: 1.576e+03, 3th-best: 1.580e+03, mean: 1.608e+03
      100000 best: 1.301e+03, 3th-best: 1.303e+03, mean: 1.306e+03
     1000000 best: 1.275e+03, 3th-best: 1.276e+03, mean: 1.278e+03
    10000000 best: 1.280e+03, 1th-best: 1.280e+03, mean: 1.280e+03
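(A note on the ''.join() limitation Zooko raises above: in Python 3, bytes.join accepts any bytes-like object, including memoryview, so the zero-copy slicing he wished for is possible there. A minimal sketch:)

```python
# Slice a buffer into leading and trailing views without copying the data,
# then join them back together; in Python 2, ''.join(buffer, ...) raised
# TypeError, but Python 3's bytes.join accepts any bytes-like objects.
data = b"abcdef"
view = memoryview(data)
head, tail = view[:3], view[3:]      # views into data, no bytes copied
rejoined = b"".join([head, tail])    # copies only once, at join time
```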