From: Steven D'Aprano on 8 May 2010 15:46

On Sat, 08 May 2010 12:15:22 -0700, Wolfram Hinderer wrote:

> On 8 Mai, 20:46, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au>
> wrote:
>
>> def get_leading_whitespace(s):
>>     t = s.lstrip()
>>     return s[:len(s)-len(t)]
>>
>> >>> c = get_leading_whitespace(a)
>> >>> assert c == leading_whitespace
>>
>> Unless your strings are very large, this is likely to be faster than
>> any other pure-Python solution you can come up with.
>
> Returning s[:-1 - len(t)] is faster.

I'm sure it is. Unfortunately, it's also incorrect.

>>> z = "*****abcde"
>>> z[:-1-5]
'****'
>>> z[:len(z)-5]
'*****'

However, s[:-len(t)] should be both faster and correct.

-- 
Steven

From: Mark Dickinson on 8 May 2010 16:46

On May 8, 8:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Sat, 08 May 2010 12:15:22 -0700, Wolfram Hinderer wrote:
> > On 8 Mai, 20:46, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au>
> > wrote:
>
> >> def get_leading_whitespace(s):
> >>     t = s.lstrip()
> >>     return s[:len(s)-len(t)]
>
> >> >>> c = get_leading_whitespace(a)
> >> >>> assert c == leading_whitespace
>
> >> Unless your strings are very large, this is likely to be faster than
> >> any other pure-Python solution you can come up with.
>
> > Returning s[:-1 - len(t)] is faster.
>
> I'm sure it is. Unfortunately, it's also incorrect.
>
> >>> z = "*****abcde"
> >>> z[:-1-5]
> '****'
> >>> z[:len(z)-5]
> '*****'
>
> However, s[:-len(t)] should be both faster and correct.

Unless len(t) == 0, surely?

-- 
Mark

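Mark's objection can be checked directly. The sketch below is only an illustration (the function names are made up for this example, not taken from the thread): when the string is all whitespace, len(t) is 0, and since -0 == 0 the slice s[:-len(t)] becomes s[:0], which is empty.

```python
def leading_ws_negative_slice(s):
    # The s[:-len(t)] variant suggested above.
    t = s.lstrip()
    return s[:-len(t)]  # if len(t) == 0 this is s[:0], i.e. ''


def leading_ws_length_difference(s):
    # The original s[:len(s)-len(t)] variant.
    t = s.lstrip()
    return s[:len(s) - len(t)]


blank = " \t "
print(repr(leading_ws_negative_slice(blank)))     # empty string: the edge case
print(repr(leading_ws_length_difference(blank)))  # the whole string
```

For any string that contains at least one non-whitespace character, the two variants agree.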
From: dasacc22 on 8 May 2010 17:27

You presume entirely too much. I have a preprocessor that normalizes
documents while performing other, more complex operations. There's
nothing buggy about what I'm doing.

On May 8, 1:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Sat, 08 May 2010 10:19:16 -0700, dasacc22 wrote:
> > Hi
>
> > This is a simple question. I'm looking for the fastest way to calculate
> > the leading whitespace (as a string, ie ' ').
>
> Is calculating the amount of leading whitespace really the bottleneck in
> your application? If not, then trying to shave off microseconds from
> something which is a trivial part of your app is almost certainly a waste
> of your time.
>
> [...]
>
> > a = ' some content\n'
> > b = a.strip()
> > c = ' '*(len(a)-len(b))
>
> I take it that you haven't actually tested this code for correctness,
> because it's buggy. Let's test it:
>
> >>> leading_whitespace = " "*2 + "\t"*2
> >>> a = leading_whitespace + "some non-whitespace text\n"
> >>> b = a.strip()
> >>> c = " "*(len(a)-len(b))
> >>> assert c == leading_whitespace
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> Not only doesn't it get the whitespace right, but it doesn't even get the
> *amount* of whitespace right:
>
> >>> assert len(c) == len(leading_whitespace)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> It doesn't even work correctly if you limit "whitespace" to mean spaces
> and nothing else! It's simply wrong in every possible way.
>
> This is why people say that premature optimization is the root of all
> (programming) evil. Instead of wasting time and energy trying to optimise
> code, you should make it correct first.
>
> Your solutions 2 and 3 are also buggy. And solution 3 can be easily
> re-written to be more straightforward. Instead of the complicated:
>
> > def get_leading_whitespace(s):
> >     def _get():
> >         for x in s:
> >             if x != ' ':
> >                 break
> >             yield x
> >     return ''.join(_get())
>
> try this version:
>
> def get_leading_whitespace(s):
>     accumulator = []
>     for c in s:
>         if c in ' \t\v\f\r\n':
>             accumulator.append(c)
>         else:
>             break
>     return ''.join(accumulator)
>
> Once you're sure this is correct, then you can optimise it:
>
> def get_leading_whitespace(s):
>     t = s.lstrip()
>     return s[:len(s)-len(t)]
>
> >>> c = get_leading_whitespace(a)
> >>> assert c == leading_whitespace
>
> Unless your strings are very large, this is likely to be faster than any
> other pure-Python solution you can come up with.
>
> --
> Steven

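The "make it correct first, then optimise" step quoted above can be made concrete with a small cross-check. This is only a sketch: the function names and the sample strings are invented for the illustration, and it assumes "whitespace" means the six ASCII characters in the loop version (str.lstrip() with no argument also strips other Unicode whitespace, so the two can differ on such input).

```python
def leading_ws_loop(s):
    # Straightforward reference version: collect characters until the
    # first non-whitespace character is seen.
    accumulator = []
    for ch in s:
        if ch in " \t\v\f\r\n":
            accumulator.append(ch)
        else:
            break
    return "".join(accumulator)


def leading_ws_lstrip(s):
    # Faster version: let lstrip() find where the content starts.
    return s[:len(s) - len(s.lstrip())]


# Hypothetical sample inputs, including empty and all-whitespace strings.
samples = ["  \t\tsome non-whitespace text\n", "no indent", "\t", "", "  x  "]
for s in samples:
    assert leading_ws_loop(s) == leading_ws_lstrip(s)

# Why solution 1 miscounts: strip() also removes the trailing newline,
# so the length difference includes it.
a = "  \t\tsome non-whitespace text\n"
overcount = len(a) - len(a.strip())
print(overcount, len(leading_ws_lstrip(a)))
```

With the sample line above, the length difference comes out one larger than the real amount of leading whitespace, because of the stripped newline.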
From: Patrick Maupin on 8 May 2010 18:18

On May 8, 1:16 pm, dasacc22 <dasac...(a)gmail.com> wrote:
> On May 8, 12:59 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
>
> > On May 8, 12:19 pm, dasacc22 <dasac...(a)gmail.com> wrote:
>
> > > Hi
>
> > > This is a simple question. I'm looking for the fastest way to
> > > calculate the leading whitespace (as a string, ie ' ').
>
> > > Here are some different methods I have tried so far
> > > --- solution 1
>
> > > a = ' some content\n'
> > > b = a.strip()
> > > c = ' '*(len(a)-len(b))
>
> > > --- solution 2
>
> > > a = ' some content\n'
> > > b = a.strip()
> > > c = a.partition(b[0])[0]
>
> > > --- solution 3
>
> > > def get_leading_whitespace(s):
> > >     def _get():
> > >         for x in s:
> > >             if x != ' ':
> > >                 break
> > >             yield x
> > >     return ''.join(_get())
>
> > > ---
>
> > > Solution 1 seems to be about as fast as solution 2 except in certain
> > > circumstances where the value of b has already been determined for
> > > other purposes. Solution 3 is slower due to the function overhead.
>
> > > Curious to see what other types of solutions people might have.
>
> > > Thanks,
> > > Daniel
>
> > Well, you could try a solution using re, but that's probably only
> > likely to be faster if you can use it on multiple concatenated lines.
> > I usually use something like your solution #1. One thing to be aware
> > of, though, is that strip() with no parameters will strip *any*
> > whitespace, not just spaces, so the implicit assumption in your code
> > that what you have stripped is spaces may not be justified (depending
> > on the source data). OTOH, depending on how you use that whitespace
> > information, it may not really matter. But if it does matter, you can
> > use strip(' ')
>
> > If speed is really an issue for you, you could also investigate
> > mxtexttools, but, like re, it might perform better if the source
> > consists of several batched lines.
>
> > Regards,
> > Pat
>
> Hi,
>
> thanks for the info. Using .strip() to remove all whitespace in
> solution 1 is a must. If you only stripped ' ' spaces then line
> endings would get counted in the len() call and, when multiplied
> against ' ', would produce an inaccurate result. Regex is
> significantly slower for my purposes, but I've never heard of
> mxtexttools. Even if it proves slow, it's spurred my curiosity as to
> what functionality it provides (on an unrelated note).

Could you reorganize your code to do multiple lines at a time? That
might make regex competitive.

Regards,
Pat

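One way to read the "multiple lines at a time" suggestion is a single multiline regex pass over a whole block of text. The sketch below is an assumption about what that could look like, not code from the thread; the pattern, the function name, and the choice to treat only spaces and tabs as indentation are all made up for the illustration, and no claim is made here about its speed relative to the per-line approach.

```python
import re

# With re.MULTILINE, ^ matches at the start of every line. The group
# captures spaces/tabs that are followed by a non-whitespace character,
# so blank and whitespace-only lines are skipped.
LEADING_WS = re.compile(r"^([ \t]*)\S", re.MULTILINE)


def leading_ws_per_line(text):
    # Returns one indentation string per non-blank line, in order.
    return LEADING_WS.findall(text)


sample = "  alpha\n\tbeta\ngamma\n\n    delta\n"
print(leading_ws_per_line(sample))
```

Because blank lines are skipped, the result cannot be mapped back to line numbers by position alone, which matters if a line count has to be kept.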
From: dasacc22 on 8 May 2010 23:48

On May 8, 5:18 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
> On May 8, 1:16 pm, dasacc22 <dasac...(a)gmail.com> wrote:
>
> > On May 8, 12:59 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
>
> > > On May 8, 12:19 pm, dasacc22 <dasac...(a)gmail.com> wrote:
>
> > > > Hi
>
> > > > This is a simple question. I'm looking for the fastest way to
> > > > calculate the leading whitespace (as a string, ie ' ').
>
> > > > Here are some different methods I have tried so far
> > > > --- solution 1
>
> > > > a = ' some content\n'
> > > > b = a.strip()
> > > > c = ' '*(len(a)-len(b))
>
> > > > --- solution 2
>
> > > > a = ' some content\n'
> > > > b = a.strip()
> > > > c = a.partition(b[0])[0]
>
> > > > --- solution 3
>
> > > > def get_leading_whitespace(s):
> > > >     def _get():
> > > >         for x in s:
> > > >             if x != ' ':
> > > >                 break
> > > >             yield x
> > > >     return ''.join(_get())
>
> > > > ---
>
> > > > Solution 1 seems to be about as fast as solution 2 except in certain
> > > > circumstances where the value of b has already been determined for
> > > > other purposes. Solution 3 is slower due to the function overhead.
>
> > > > Curious to see what other types of solutions people might have.
>
> > > > Thanks,
> > > > Daniel
>
> > > Well, you could try a solution using re, but that's probably only
> > > likely to be faster if you can use it on multiple concatenated lines.
> > > I usually use something like your solution #1. One thing to be aware
> > > of, though, is that strip() with no parameters will strip *any*
> > > whitespace, not just spaces, so the implicit assumption in your code
> > > that what you have stripped is spaces may not be justified (depending
> > > on the source data). OTOH, depending on how you use that whitespace
> > > information, it may not really matter. But if it does matter, you can
> > > use strip(' ')
>
> > > If speed is really an issue for you, you could also investigate
> > > mxtexttools, but, like re, it might perform better if the source
> > > consists of several batched lines.
>
> > > Regards,
> > > Pat
>
> > Hi,
>
> > thanks for the info. Using .strip() to remove all whitespace in
> > solution 1 is a must. If you only stripped ' ' spaces then line
> > endings would get counted in the len() call and, when multiplied
> > against ' ', would produce an inaccurate result. Regex is
> > significantly slower for my purposes, but I've never heard of
> > mxtexttools. Even if it proves slow, it's spurred my curiosity as to
> > what functionality it provides (on an unrelated note).
>
> Could you reorganize your code to do multiple lines at a time? That
> might make regex competitive.
>
> Regards,
> Pat

I have tried this already; the problem is that it's not a trivial
matter. Iterating over each line is unavoidable, and I found that using
various Python builtins to perform string operations (like, say, the
wonderful partition builtin) during each iteration works three times
faster than regexing the entire document for my various needs. Another
issue is having to keep a line count: when iterating over regex matches
and counting lines, it doesn't scale nearly as well as a straight Python
solution using builtins to process the information.

At the heart of this, determining the leading whitespace is a trivial
matter. I have much more complex problems to deal with. I was much more
interested in seeing what kind of solutions people would come up with
for such a problem, and perhaps uncovering something new in Python that
I can apply to a more complex problem. What spurred the thought was this
piece written by Guido concerning "what's the best way to convert a list
of integers into a string". It's a simple question where concepts are
introduced that can lead to solving more complex problems.
http://www.python.org/doc/essays/list2str.html
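
The per-line approach described in this post (iterate over every line, use string builtins, keep a line count) might look roughly like the sketch below. It is not the poster's preprocessor, which is not shown in the thread; the function name is hypothetical, and it assumes enumerate() is an acceptable way to track line numbers.

```python
def indents_with_line_numbers(text):
    # Walk the document line by line, yielding (line number, leading
    # whitespace) pairs. Line numbers start at 1.
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        yield lineno, line[:len(line) - len(stripped)]


document = "first\n  second\n\tthird"
for lineno, indent in indents_with_line_numbers(document):
    print(lineno, repr(indent))
```

Keeping the line number in the same loop avoids a second pass to map matches back to lines, which is the scaling concern raised above.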