From: Muhammad Adeel on 6 Aug 2010 05:07

Hi,

Does anyone know how to tokenize a string in Python so that it returns the tokens together with their byte offsets? I would also like a sentence splitter that returns the sentences with their byte offsets, and finally n-grams returned with byte offsets.

Input:
This is a string.

Output:
This 0
is 5
a 8
string. 10

Thanks
From: Gabriel Genellina on 6 Aug 2010 05:49

On Fri, 06 Aug 2010 06:07:32 -0300, Muhammad Adeel <nawabadeel(a)gmail.com> wrote:

> Does any one know how to tokenize a string in python that returns the
> byte offsets and tokens? Moreover, the sentence splitter that returns
> the sentences and byte offsets? Finally n-grams returned with byte
> offsets.
>
> Input:
> This is a string.
>
> Output:
> This 0
> is 5
> a 8
> string. 10

Like this?

py> import re
py> s = "This is a string."
py> for g in re.finditer(r"\S+", s):
...     print(g.group(), g.start())
...
This 0
is 5
a 8
string. 10

--
Gabriel Genellina
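Since the question asks specifically for byte offsets: when `re.finditer` runs over a Python 3 `str`, `start()` is a character offset, which equals the byte offset only for ASCII text. A minimal sketch, assuming UTF-8 input, that matches on the encoded bytes instead so that `start()` is a true byte offset; the helper name `tokens_with_byte_offsets` is hypothetical:

```python
import re

# Whitespace tokenizer over bytes; \S in a bytes pattern treats only ASCII
# whitespace as separators, so multi-byte UTF-8 characters stay intact.
TOKEN_RE = re.compile(rb"\S+")


def tokens_with_byte_offsets(text, encoding="utf-8"):
    """Yield (token, byte_offset) pairs (hypothetical helper).

    Matching on the encoded bytes makes m.start() a byte offset rather
    than a character offset.
    """
    data = text.encode(encoding)
    for m in TOKEN_RE.finditer(data):
        # Token boundaries fall on ASCII whitespace, so decoding is safe.
        yield m.group().decode(encoding), m.start()


for token, offset in tokens_with_byte_offsets("This is a string."):
    print(token, offset)
```

With non-ASCII input the two kinds of offset diverge: for "café au lait" this yields byte offsets 0, 6 and 9, whereas character offsets would be 0, 5 and 8, because "é" occupies two bytes in UTF-8.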
From: Muhammad Adeel on 6 Aug 2010 06:06

On Aug 6, 10:49 am, "Gabriel Genellina" <gagsl-...(a)yahoo.com.ar> wrote:
> Like this?
> [...]

Hi,

Thanks. Can you please tell me how to do the same for n-grams and sentences as well?
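One possible answer to the follow-up, sketched with only the standard library and again working on UTF-8 bytes so the offsets are byte offsets. The sentence splitter is a naive regular expression (an assumption, not a standard algorithm): a sentence is a run of text ending in `.`, `!` or `?` followed by whitespace or end of input. It will split wrongly after abbreviations such as "Dr.". The helper names `sentences_with_offsets` and `ngrams_with_offsets`, and the choice to report an n-gram's offset as that of its first token, are likewise hypothetical:

```python
import re

# Naive sentence pattern (assumption): shortest run starting at a
# non-space byte and ending in ., ! or ? followed by whitespace or end
# of input, or any trailing text that has no terminator.
SENTENCE_RE = re.compile(rb"\S.*?(?:[.!?](?=\s|$)|$)", re.DOTALL)
TOKEN_RE = re.compile(rb"\S+")


def sentences_with_offsets(text, encoding="utf-8"):
    """Return (sentence, byte_offset) pairs (hypothetical helper)."""
    data = text.encode(encoding)
    return [(m.group().decode(encoding), m.start())
            for m in SENTENCE_RE.finditer(data)]


def ngrams_with_offsets(text, n, encoding="utf-8"):
    """Return (ngram_tuple, byte_offset) pairs for word n-grams.

    The offset is the byte offset of the n-gram's first token.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    data = text.encode(encoding)
    tokens = [(m.group().decode(encoding), m.start())
              for m in TOKEN_RE.finditer(data)]
    # Slide a window of n tokens; yields nothing if there are < n tokens.
    return [(tuple(tok for tok, _ in tokens[i:i + n]), tokens[i][1])
            for i in range(len(tokens) - n + 1)]


text = "This is a string. Here is another one!"
for sentence, offset in sentences_with_offsets(text):
    print(sentence, offset)
for gram, offset in ngrams_with_offsets(text, 2):
    print(" ".join(gram), offset)
```

For the sample text the two sentences start at byte offsets 0 and 18. Because the n-grams are built over the whole text, a bigram such as ("string.", "Here") spans a sentence boundary; if that is unwanted, one option is to call `ngrams_with_offsets` on each sentence and add the sentence's offset to each result.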