Replace various regex [Python]


From: Jean-Michel Pichavant on 15 Feb 2010 09:03

Martin wrote:
> Hi,
>
> I am trying to come up with a more generic scheme to match and replace
> a series of regex, which look something like this...
>
> 19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
> 5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)
>
> Ideally match the pattern to the right of the "!" sign (e.g. lai), I
> would then like to be able to replace one or all of the corresponding
> numbers on the line. So far I have a rather unsatisfactory solution,
> any suggestions would be appreciated...
>
> The file read in is an ascii file.
>
> f = open(fname, 'r')
> s = f.read()
>
> if CANHT:
>     s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+ ! canht_ft", CANHT, s)
>
> where CANHT might be
>
> CANHT = '115.01,16.38,0.79,1.26,1.00 ! canht_ft'
>
> But this involves me passing the entire string.
>
> Thanks.
>
> Martin
>

I removed all lines containing things like 9*0.0 from your file, because I
don't know what they mean or how to handle them. They are not numbers.
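
A note on the 9*0.0 entries: if the file is read by Fortran list-directed or namelist input (an assumption, suggested by the "!" comments and the (1:npft) notation), then a token of the form r*c is a repeat count, so 9*0.0 stands for nine copies of 0.0. Below is a minimal sketch of expanding such tokens before doing any replacement. The name expand_repeats is a hypothetical helper, not something from the thread.

```python
def expand_repeats(field_text):
    """Expand Fortran-style repeat counts, e.g. '3*0.0' -> ['0.0', '0.0', '0.0']."""
    values = []
    # Values may be separated by commas and/or whitespace.
    for token in field_text.replace(',', ' ').split():
        if '*' in token:
            # 'count*value' means the value repeated count times.
            count, value = token.split('*', 1)
            values.extend([value] * int(count))
        else:
            values.append(token)
    return values


print(expand_repeats("1.5, 2*0.25, 3.0"))  # ['1.5', '0.25', '0.25', '3.0']
```

The values are kept as strings so that the original formatting (for example 1.00) survives a round trip.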

import re

replace = {
    'snow_grnd' : (1, '99.99,'), # replace the 1st number by 99.99
    't_soil' : (2, '88.8,'), # replace the 2nd number by 88.8
}

testBuffer = """
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)
"""

outputBuffer = ''
for line in testBuffer.split('\n'):
    for key, (index, repl) in replace.items():
        if key in line:
            parameters = {
                # given your example you will have to change this one;
                # I don't know what 9*0.0 means in your file
                'n' : r'[\d\.]+',
                'index' : index - 1,
            }
            # the following pattern silently matches the numbers before the
            # <index>th number, and uses a capturing group for that last one
            # (regexps are sometimes a nightmare to read)
            pattern = r'(\s*(?:(?:%(n)s)[,\s]+){0,%(index)s})(?:(%(n)s)[,\s]+)(.*!.*)' % parameters
            line = re.sub(pattern, r'\1 ' + repl + r'\3', line)
            break
    outputBuffer += line + '\n'

print(outputBuffer)
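
The same "replace the n-th number on the line whose tag matches" idea can also be sketched without a regex, by splitting each line at the "!" sign. This is only an illustration under stated assumptions: values are comma-separated, the tag is the identifier at the start of the comment, and replace_value is a hypothetical name rather than anything posted in the thread. Spacing between the values is normalised, and repeat counts such as 9*0.0 are not handled.

```python
def replace_value(line, tag, index, new_value):
    """Return line with its index-th value (1-based) replaced if the tag matches."""
    if '!' not in line:
        return line
    values_part, comment = line.split('!', 1)
    # The tag is the leading identifier of the comment,
    # e.g. 'lai' in ' lai(1:npft)'.
    name = comment.strip().split('(')[0].strip()
    if name != tag:
        return line
    values = [v.strip() for v in values_part.split(',')]
    if not 1 <= index <= len(values):
        return line  # index out of range: leave the line untouched
    values[index - 1] = str(new_value)
    return ','.join(values) + ' !' + comment


line = "276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)"
print(replace_value(line, "t_soil", 2, 88.8))
# 276.78,88.8,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
```

Because the tag is compared for equality rather than with a substring test, a line tagged dcatch_lai is left alone when the requested tag is lai.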

From: Martin on 15 Feb 2010 09:13


Thanks, I will take a look. I think I was having a very slow day when I
posted; I have since realised that I can solve the original problem more
simply, and that the problem wasn't quite as I first perceived it. It is
enough to match the tag to the right of the "!" sign and use this to
replace what lies to the left of the "!" sign. Currently I have
this... if anyone thinks there is a neater solution I am happy to hear
it. Many thanks.

variable_tag = 'lai'
variable = [200.0, 60.030, 0.060, 0.030, 0.030]

# generate adjustment string
variable = ",".join(["%s" % i for i in variable]) + ' ! ' + variable_tag

# call func to adjust input file
adjustStandardPftParams(variable, variable_tag, in_param_fname, out_param_fname)

and the inside of this func looks like this

def adjustStandardPftParams(self, variable, variable_tag, in_fname, out_fname):

    f = open(in_fname, 'r')
    of = open(out_fname, 'w')
    pattern_found = False

    while True:
        line = f.readline()
        if not line:
            break
        # does this line carry the tag we want to replace?
        pattern = re.findall(r"!\s+" + variable_tag, line)
        if pattern:
            print('yes')
            # write the replacement line instead of the original
            of.write("%s\n" % variable)
            pattern_found = True

        if pattern_found:
            pattern_found = False
        else:
            of.write(line)

    f.close()
    of.close()

    return

From: Jean-Michel Pichavant on 15 Feb 2010 09:27


Are you sure a simple

if variable_tag in line:
    # do some stuff

is not enough?

People will usually prefer to write

for line in open(in_fname, 'r'):

instead of your ugly while loop ;-)

JM
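
Combining the two suggestions above (a plain substring test and iterating over the file object) gives roughly the following. It is a minimal sketch, not code from the thread: replace_tagged_line and its parameters are hypothetical names, and the with statement is added so that both files are closed even if an error occurs.

```python
def replace_tagged_line(in_fname, out_fname, tag, new_line):
    """Copy in_fname to out_fname, replacing every line that contains tag."""
    replaced = 0
    with open(in_fname, 'r') as src, open(out_fname, 'w') as dst:
        for line in src:
            if tag in line:
                # substring test: any line mentioning the tag is replaced
                dst.write(new_line + '\n')
                replaced += 1
            else:
                dst.write(line)
    return replaced
```

Returning the number of replaced lines lets the caller notice when the tag was not found at all, or was found more often than expected. Note that the substring test matches the tag anywhere on the line, including inside a longer name.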

From: Martin on 15 Feb 2010 17:26


My while loop is suitably offended. I have changed it as you
suggested... though if I test "if variable_tag in line" as you
suggested, in my example I would correctly pick up the tag lai, but also
one called dcatch_lai, which I don't want. No doubt there is an
obvious solution I am again missing!

of = open(out_fname, 'w')
pattern_found = False

for line in open(in_fname, 'r'):
    pattern = re.findall(r"!\s+" + variable_tag, line)
    if pattern:
        # write the replacement line instead of the original
        of.write("%s\n" % variable)
        pattern_found = True

    if pattern_found:
        pattern_found = False
    else:
        of.write(line)

of.close()

Many Thanks.
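
One possible answer to the dcatch_lai question, offered as a sketch rather than as the thread's conclusion: anchor the tag directly after the "!" and require that no further word character follows it. The regex r"!\s+" + variable_tag already rejects dcatch_lai, because the tag must come straight after the "!" and the whitespace, but it would still accept a longer name such as lai_max. The plain "in" test accepts both. The helper name tag_pattern and the lai_max tag are made up for illustration.

```python
import re


def tag_pattern(tag):
    """Compile a regex that matches '! <tag>' only when tag is the whole name."""
    # '!' then optional whitespace, then the exact tag (escaped in case it
    # contains regex metacharacters), not followed by a letter, digit or '_'.
    return re.compile(r"!\s*" + re.escape(tag) + r"(?!\w)")


lai = tag_pattern("lai")
print(bool(lai.search("5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)")))  # True
print(bool(lai.search("0.5 ! dcatch_lai")))                       # False
print(bool(lai.search("0.5 ! lai_max")))                          # False
```

Since only a yes/no answer is needed per line, search is enough here; findall builds a list that is never used.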
