Replace various regex [Python]

From: Martin on 12 Feb 2010 14:39

Hi,

I am trying to come up with a more generic scheme to match and replace
values on a series of lines, which look something like this...

19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

Ideally I would match the name to the right of the "!" sign (e.g. lai)
and then be able to replace one or all of the corresponding numbers on
that line. So far I have a rather unsatisfactory solution; any
suggestions would be appreciated...

The file being read is an ASCII file.

import re

f = open(fname, 'r')
s = f.read()
f.close()

if CANHT:
    # Dots are escaped so that they match a literal decimal point only.
    s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+ ! canht_ft",
               CANHT, s)

where CANHT might be

CANHT = '115.01,16.38,0.79,1.26,1.00 ! canht_ft'

But this means I have to pass in the entire replacement line.

Thanks.

Martin

From: McColgst on 12 Feb 2010 14:57

If I understand correctly, there are a couple of ways to do it.
One is to use .split() and split on the '!' sign, provided you won't
have more than one '!' on a line. This returns a list of the pieces
on either side of the delimiter, so you should get back
['19.01,16.38,0.79,1.26,1.00 ', ' canht_ft(1:npft)'] and you can do
whatever replacements you want on the list.

check out split: http://docs.python.org/library/stdtypes.html#str.split
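
As a rough sketch of the split idea above (the replacement value 115.01 and the variable names are only illustrative assumptions), a single line could be handled like this:

```python
line = "19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)"

# Split into the numeric part and the trailing label.
values_part, label = line.split("!")

# Break the numeric part into individual fields and change the first one.
values = [v.strip() for v in values_part.split(",")]
values[0] = "115.01"  # hypothetical replacement value

# Put the line back together with the original label.
new_line = ",".join(values) + " !" + label
print(new_line)
```

This assumes exactly one '!' on the line; with more than one, the two-name unpacking would raise a ValueError.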

Another is to match the first or second part of the line in your
regular expression by specifying where the '!' is. If you want to
match the part after the '!', I would do something like
r"[^! canht_ft]", or something similar (I'm not particularly
up-to-date with my regex syntax, but I think you get the idea).

I hope I understood correctly, and I hope that helps.

-sean

From: Martin on 12 Feb 2010 15:28

Hi, I like the second suggestion. So this wouldn't rely on me having to
match the numbers, only the string canht for example, but would still
let me replace the whole line. Is that what you mean?

I tried it and the expression seemed to replace the entire file, so
perhaps I am doing something wrong. But in principle I think that
might be a better scheme than my current one. I tried

if CANHT:
    #s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+ ! canht_ft", CANHT, s)
    s = re.sub(r"[^! canht_ft]", CANHT, s)
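
A likely explanation for the whole-file replacement, shown as a minimal sketch: square brackets form a character class, so [^! canht_ft] matches any single character that is not one of '!', space, c, a, n, h, t, _ or f. Each such character is replaced separately. The sample string and the "X" placeholder below are made up for illustration.

```python
import re

s = "0.46 ! snow_grnd\n19.01,16.38 ! canht_ft(1:npft)\n"

# Every character outside the listed set is an individual match,
# including digits, dots, commas and newlines.
matched = re.findall(r"[^! canht_ft]", s)
print(len(matched))

# Replacing each match with "X" shows how much of the text is affected.
replaced = re.sub(r"[^! canht_ft]", "X", s)
print(replaced)
```

With a long replacement string such as CANHT, every one of those single-character matches is swapped for the full string, which is why the output looks like the entire file was rewritten.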

From: MRAB on 12 Feb 2010 15:30

McColgst wrote:
> If I understand correctly, there are a couple of ways to do it.
> One is to use .split() and split on the '!' sign, provided you won't
> have more than one '!' on a line. This returns a list of the pieces
> on either side of the delimiter, so you should get back
> ['19.01,16.38,0.79,1.26,1.00 ', ' canht_ft(1:npft)'] and you can do
> whatever replacements you want on the list.
>
> check out split: http://docs.python.org/library/stdtypes.html#str.split
>
The .split method is the best way if you process the file a line at a
time. The .split method, incidentally, accepts a maxsplit argument so
that you can split a line no more than once.
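
A small sketch of the difference the split limit makes, assuming a hypothetical line whose comment itself contains a '!':

```python
line = "0.749, 0.743 ! stheta(1:sm_levels) ! top to bottom"

# Without a limit, every '!' produces a split.
print(line.split("!"))

# With a limit of 1 only the first '!' is used; the rest stays in the label.
values_part, label = line.split("!", 1)
print(repr(values_part), repr(label))
```

The limit is passed positionally here so the call does not depend on keyword support for this argument.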

> Another is to match the first or second part of the line in your
> regular expression by specifying where the '!' is. If you want to
> match the part after the '!', I would do something like
> r"[^! canht_ft]", or something similar (I'm not particularly
> up-to-date with my regex syntax, but I think you get the idea).
>
The regex would be r"(?m)^[^!]*(!.*)" to capture the '!' and the rest of
the line.
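
To illustrate what that pattern captures, here is a short sketch on a two-line sample (the sample text and the 9.99 replacement are assumptions for demonstration only):

```python
import re

s = "12.100 ! cs\n0.46 ! snow_grnd\n"

# (?m) makes ^ match at the start of every line; group 1 is the '!'
# together with the rest of that line.
pattern = re.compile(r"(?m)^[^!]*(!.*)")

labels = pattern.findall(s)
print(labels)

# Replace the part before '!' on every line, keeping the captured label.
result = pattern.sub(r"9.99 \1", s)
print(result)
```

Note that the substitution applies to every line containing a '!', so selecting one particular line still requires adding its name to the pattern. Also, [^!]* can run across newlines, so this sketch assumes every line has a '!'.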

From: Martin on 12 Feb 2010 16:02

I guess I could read the file a line at a time and try splitting it,
though I thought it would be better to read it all in once and then
search for the various patterns I need to match and replace?

I am not sure that regex helps, as it would match and replace every
line which has a "!". Perhaps if I explain more thoroughly?

So the input file looks something like this...

9*0.0 ! canopy(1:ntiles)
12.100 ! cs
0.0 ! gs
9*50.0 ! rgrain(1:ntiles)
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
9*0.46 ! snow_tile(1:ntiles)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
9*276.78 ! tstar_tile(1:ntiles)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

So for each of the names following the "!" I may want to match it and
replace some of the numbers on that line. That is, I might search for
snow_grnd with the intention of replacing 0.46 with another number.
What I came up with was a way to match all the numbers and pass in the
whole replacement string.
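
One possible way to generalise this, offered only as a sketch: build the pattern from the name after the "!" so that just that line is touched, and pass only the new numbers. The helper set_values and its parameters are hypothetical names, not something from the thread, and the sketch replaces all values on the line rather than a single one.

```python
import re

def set_values(text, name, new_values):
    """Replace everything left of '!' on the line labelled with name."""
    # Hypothetical helper: matches "<values> ! <name>..." on one line.
    # \b stops 'cs' from matching a longer label that starts with 'cs'.
    pattern = re.compile(
        r"(?m)^[^!\n]*(!\s*" + re.escape(name) + r"\b.*)$"
    )
    # A function is used as the replacement so that new_values is
    # inserted literally, without backslash processing.
    return pattern.sub(lambda m: new_values + " " + m.group(1), text)

text = (
    "12.100 ! cs\n"
    "0.46 ! snow_grnd\n"
    "19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)\n"
)

text = set_values(text, "snow_grnd", "0.50")
text = set_values(text, "canht_ft", "115.01,16.38,0.79,1.26,1.00")
print(text)
```

This keeps the whole file in one string, as in the original approach, while the caller supplies only the name and the new values. Any column alignment before the "!" is collapsed to a single space.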
