From: dasacc22 on 8 May 2010 13:19

Hi

This is a simple question. I'm looking for the fastest way to
calculate the leading whitespace (as a string, ie ' ').

Here are some different methods I have tried so far
--- solution 1

a = ' some content\n'
b = a.strip()
c = ' '*(len(a)-len(b))

--- solution 2

a = ' some content\n'
b = a.strip()
c = a.partition(b[0])[0]

--- solution 3

def get_leading_whitespace(s):
    def _get():
        for x in s:
            if x != ' ':
                break
            yield x
    return ''.join(_get())

---

Solution 1 seems to be about as fast as solution 2 except in certain
circumstances where the value of b has already been determined for
other purposes. Solution 3 is slower due to the function overhead.

Curious to see what other types of solutions people might have.

Thanks,
Daniel
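Since the question is about speed, one hedged way to compare candidates like these is the standard library's `timeit` module, timing each one on the same sample line. A minimal sketch: `solution1`, `solution2`, `solution3` and `compare` are hypothetical names wrapping the snippets from the post, and absolute numbers will vary by machine and Python version.

```python
import timeit
from functools import partial

# Hypothetical function wrappers around the three approaches in the post,
# kept as written (they are timed here, not checked for correctness).
def solution1(a):
    b = a.strip()
    return ' ' * (len(a) - len(b))

def solution2(a):
    b = a.strip()
    return a.partition(b[0])[0]

def solution3(s):
    def _get():
        for x in s:
            if x != ' ':
                break
            yield x
    return ''.join(_get())

def compare(line, number=100_000):
    """Return total seconds taken by `number` calls of each candidate on `line`."""
    return {
        func.__name__: timeit.timeit(partial(func, line), number=number)
        for func in (solution1, solution2, solution3)
    }

if __name__ == '__main__':
    for name, seconds in compare('    some content\n').items():
        print(f'{name}: {seconds:.4f} s')
```

Timing only says which variant is faster on the inputs tried; it does not check that each one returns the right string, so it is worth pairing with a few correctness checks on tabs, trailing newlines and blank lines.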
From: Patrick Maupin on 8 May 2010 13:59

On May 8, 12:19 pm, dasacc22 <dasac...(a)gmail.com> wrote:
> This is a simple question. I'm looking for the fastest way to
> calculate the leading whitespace (as a string, ie ' ').
[...]

Well, you could try a solution using re, but that's probably only
likely to be faster if you can use it on multiple concatenated lines.
I usually use something like your solution #1. One thing to be aware
of, though, is that strip() with no parameters will strip *any*
whitespace, not just spaces, so the implicit assumption in your code
that what you have stripped is spaces may not be justified (depending
on the source data). OTOH, depending on how you use that whitespace
information, it may not really matter. But if it does matter, you can
use strip(' ').

If speed is really an issue for you, you could also investigate
mxtexttools, but, like re, it might perform better if the source
consists of several batched lines.

Regards,
Pat
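The idea that `re` may only pay off across several concatenated lines can be sketched as a single `MULTILINE` pattern applied to a whole block of text at once. A minimal sketch, assuming only spaces and tabs count as indentation; `LEADING_WS` and `leading_whitespace_per_line` are hypothetical names. Whether it actually beats calling `lstrip()` line by line depends on the data, so it would need timing on real input.

```python
import re

# Assumption: indentation means spaces and tabs only. A plain \s* is avoided
# because under MULTILINE it would run across the newline of a blank line.
LEADING_WS = re.compile(r'^[ \t]*', re.MULTILINE)

def leading_whitespace_per_line(text):
    """Return the leading run of spaces/tabs at the start of each line in `text`."""
    return LEADING_WS.findall(text)

print(leading_whitespace_per_line('  a\n\tb\nc'))  # ['  ', '\t', '']
```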
From: dasacc22 on 8 May 2010 14:16

On May 8, 12:59 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
[...]
> I usually use something like your solution #1. One thing to be aware
> of, though, is that strip() with no parameters will strip *any*
> whitespace, not just spaces, so the implicit assumption in your code
> that what you have stripped is spaces may not be justified (depending
> on the source data). OTOH, depending on how you use that whitespace
> information, it may not really matter. But if it does matter, you can
> use strip(' ')
>
> If speed is really an issue for you, you could also investigate
> mxtexttools, but, like re, it might perform better if the source
> consists of several batched lines.
>
> Regards,
> Pat

Hi, thanks for the info. Using .strip() to remove all whitespace in
solution 1 is a must. If you only stripped ' ' spaces, then line
endings would get counted in the len() call and, when multiplied
against ' ', would produce an inaccurate result.

Regex is significantly slower for my purposes, but I've never heard of
mxtexttools. Even if it proves slow, it's spurred my curiosity as to
what functionality it provides (on an unrelated note).
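How the trailing newline affects this length-difference trick is easy to check directly on a line like the one in the first post. A minimal sketch, assuming two leading spaces (the exact indentation in the original post is not recoverable); the variable names are illustrative only.

```python
a = '  some content\n'   # two leading spaces, one trailing newline

# strip() trims both ends, so the trailing '\n' is counted as "leading".
overcount = len(a) - len(a.strip())

# lstrip() trims only the left end; the '\n' stays in both strings.
count_any_ws = len(a) - len(a.lstrip())

# lstrip(' ') trims only leading spaces (tabs and newlines are kept).
count_spaces = len(a) - len(a.lstrip(' '))

print(overcount, count_any_ws, count_spaces)  # 3 2 2
```

Because `lstrip()` and `lstrip(' ')` leave the right-hand end alone, the newline is present in both strings and cancels out of the difference, whereas `strip()` removes it and the count comes out one too high.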
From: Steven D'Aprano on 8 May 2010 14:46

On Sat, 08 May 2010 10:19:16 -0700, dasacc22 wrote:

> Hi
>
> This is a simple question. I'm looking for the fastest way to calculate
> the leading whitespace (as a string, ie ' ').

Is calculating the amount of leading whitespace really the bottleneck in
your application? If not, then trying to shave off microseconds from
something which is a trivial part of your app is almost certainly a
waste of your time.

[...]
> a = ' some content\n'
> b = a.strip()
> c = ' '*(len(a)-len(b))

I take it that you haven't actually tested this code for correctness,
because it's buggy. Let's test it:

>>> leading_whitespace = " "*2 + "\t"*2
>>> a = leading_whitespace + "some non-whitespace text\n"
>>> b = a.strip()
>>> c = " "*(len(a)-len(b))
>>> assert c == leading_whitespace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Not only doesn't it get the whitespace right, but it doesn't even get
the *amount* of whitespace right:

>>> assert len(c) == len(leading_whitespace)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

It doesn't even work correctly if you limit "whitespace" to mean spaces
and nothing else! It's simply wrong in every possible way.

This is why people say that premature optimization is the root of all
(programming) evil. Instead of wasting time and energy trying to
optimise code, you should make it correct first.

Your solutions 2 and 3 are also buggy. And solution 3 can be easily
rewritten to be more straightforward.
Instead of the complicated:

> def get_leading_whitespace(s):
>     def _get():
>         for x in s:
>             if x != ' ':
>                 break
>             yield x
>     return ''.join(_get())

try this version:

def get_leading_whitespace(s):
    accumulator = []
    for c in s:
        if c in ' \t\v\f\r\n':
            accumulator.append(c)
        else:
            break
    return ''.join(accumulator)

Once you're sure this is correct, then you can optimise it:

def get_leading_whitespace(s):
    t = s.lstrip()
    return s[:len(s)-len(t)]

>>> c = get_leading_whitespace(a)
>>> assert c == leading_whitespace
>>>

Unless your strings are very large, this is likely to be faster than any
other pure-Python solution you can come up with.

--
Steven
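Two inputs that expose problems in solutions 2 and 3 are a line indented with tabs and a line containing only whitespace. A minimal sketch that runs both against the `lstrip()` version from the post; `solution2`, `solution3` and `reference` are hypothetical names wrapping code from the thread.

```python
def solution2(a):
    b = a.strip()
    return a.partition(b[0])[0]

def solution3(s):
    def _get():
        for x in s:
            if x != ' ':    # only spaces are treated as whitespace
                break
            yield x
    return ''.join(_get())

def reference(s):
    # The lstrip()-based version from the post.
    t = s.lstrip()
    return s[:len(s) - len(t)]

mixed = '  \t\tsome text\n'
blank = '    \n'

# solution3 stops at the first tab, so it misses the tabs.
print(repr(solution3(mixed)), repr(reference(mixed)))

# On a whitespace-only line strip() returns '', so b[0] raises IndexError.
try:
    solution2(blank)
    solution2_failed = False
except IndexError:
    solution2_failed = True
print(solution2_failed, repr(reference(blank)))
```

Note that on a whitespace-only line the `lstrip()` version returns the whole line, newline included, which may or may not be what a caller wants.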
From: Wolfram Hinderer on 8 May 2010 15:15

On 8 May, 20:46, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:

> def get_leading_whitespace(s):
>     t = s.lstrip()
>     return s[:len(s)-len(t)]
>
> >>> c = get_leading_whitespace(a)
> >>> assert c == leading_whitespace
>
> Unless your strings are very large, this is likely to be faster than any
> other pure-Python solution you can come up with.

Returning s[:-1 - len(t)] is faster.
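Slicing shortcuts are easy to get off by one, so it may be worth comparing a faster variant against the straightforward version before timing it. A minimal sketch, assuming `t = s.lstrip()` as in the function quoted in this reply; `via_len` and `via_negative_index` are hypothetical names.

```python
def via_len(s):
    # Slice length computed from both strings, as in Steven's version.
    t = s.lstrip()
    return s[:len(s) - len(t)]

def via_negative_index(s):
    # The shorter slice from the reply, with t defined the same way.
    t = s.lstrip()
    return s[:-1 - len(t)]

samples = ['  \t\tsome text\n', 'no indent\n', '    ']
for s in samples:
    # Negative stop indices count from the end; ones past the start clamp to 0.
    print(repr(s), repr(via_len(s)), repr(via_negative_index(s)))
```

With `t = s.lstrip()`, the two agree only on lines with no leading whitespace; otherwise the negative-index form drops the last whitespace character (for example `'  \t'` instead of `'  \t\t'`), so the shorter slice would need checking against whatever definition of `t` it was written for before relying on the speed difference.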