From: magnus.lycka on 5 Jun 2010 04:30

It seems that Python treats non-breaking space (\xa0) as a normal whitespace character, e.g. when splitting a string. See below:

>>> s='hello\xa0there'
>>> s.split()
['hello', 'there']

Surely this is not intended behaviour?

From: Steven D'Aprano on 5 Jun 2010 04:59

On Sat, 05 Jun 2010 01:30:40 -0700, magnus.lycka(a)gmail.com wrote:

> It seems that Python treats non-breaking space (\xa0) as a normal
> whitespace character, e.g. when splitting a string. See below:
>
> >>> s='hello\xa0there'
> >>> s.split()
> ['hello', 'there']
>
> Surely this is not intended behaviour?

Yes it is. str.split() breaks on whitespace, and \xa0 is whitespace according to the Unicode standard. To put it another way, str.split() is not a word-wrapping split.

This has been reported before, and rejected as a won't-fix.

http://mail.python.org/pipermail/python-bugs-list/2006-January/031531.html

--
Steven
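
For anyone who needs the non-breaking space kept inside a token, one possible workaround is to split on an explicit set of ASCII whitespace characters instead of calling split() with no arguments. This is only a sketch, assuming Python 3 (where string literals are Unicode); the helper name split_ascii_whitespace is made up for illustration and is not part of the standard library.

```python
import re

s = 'hello\xa0there world'

# With no arguments, str.split() splits on any Unicode whitespace,
# which includes the non-breaking space U+00A0.
print(s.split())  # ['hello', 'there', 'world']


def split_ascii_whitespace(text):
    """Split on ASCII whitespace only (hypothetical helper, not a stdlib function)."""
    # The explicit character class leaves U+00A0 alone; empty strings produced
    # by leading or trailing whitespace are filtered out.
    return [part for part in re.split(r'[ \t\n\r\f\v]+', text) if part]


print(split_ascii_whitespace(s))  # ['hello\xa0there', 'world']
```

The second call keeps 'hello\xa0there' as a single item, which is closer to what a word-wrapping split would want.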