From: MRAB on
187braintrust(a)berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files
> that contain regression output from R. Any thoughts or suggestions
> would be greatly appreciated.
>
> To get an idea of what I am trying to do, note that I include fixed
> effects in the R regressions, resulting in hundreds of extra lines per
> regression which I am not interested in right now. Basically, I want to
> save a shortened version of the .txt files in which the blocks of fixed
> effects coefficients are replaced by a line that says includes fixed
> effects for whatever variable it is.
>
> All the lines that are to be deleted start with the same prefix
> -- 'factor(xyz)' where xyz is the variable name -- so my idea is to have
> Python copy each line to a new file if its first seven characters do not
> match 'factor('.
>
> That part I at least know how to approach. However, I am not sure how
> to approach adding the line that says, "includes fixed effects for xyz."
> The problem I am having is how to approach the following:
>
>
> 1. In the resulting file, I will be skipping blocks of lines, say
> anywhere from 10 to 500 or so, and inserting one line -- i.e.,
> whether it inserts the line needs to depend on whether it's the
> first line or one of the remaining 499 lines.
>
> 2. the xyz variable name is different lengths depending on what
> variable it is. For example, one block might be 'state' and another
> block might be 'yr'. Maybe I can use the fact that the var name
> starts after the first '(' and ends at the first ')' in the line? I
> think I can use the re module for this?
>
>
> Any suggestions on any aspect of this, but especially the latter part,
> would be greatly appreciated. Thank you.
>
How's this:

input_file = open(input_path)
output_file = open(output_path, "w")
for line in input_file:
    if line.startswith("factor("):
        open_paren = line.find("(")
        close_paren = line.find(")")
        variable = line[open_paren + 1 : close_paren]
        output_file.write("*** Factors for %s ***\n" % variable)
        prefix = line[ : close_paren + 1]
        while line.startswith(prefix):
            line = input_file.readline()
    output_file.write(line)
input_file.close()
output_file.close()
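
On the question about the re module: the variable name can also be pulled out with a regular expression instead of find(). The following is only a minimal sketch. The names FACTOR_RE and factor_variable are made up for illustration, and the sample lines are invented rather than taken from real R output.

```python
import re

# Hypothetical pattern: 'factor(' at the start of the line, then the
# variable name up to the first ')'.
FACTOR_RE = re.compile(r"factor\(([^)]*)\)")

def factor_variable(line):
    """Return the variable name of a 'factor(...)' line, or None otherwise."""
    match = FACTOR_RE.match(line)  # match() anchors at the start of the line
    return match.group(1) if match else None

print(factor_variable("factor(state)CA   0.123   0.045"))  # state
print(factor_variable("(Intercept)   1.5"))                # None
```

Because the name is captured between the first '(' and the first ')', its length does not matter, so 'state' and 'yr' are handled the same way.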

From: Terry Reedy on
On 6/15/2010 9:28 PM, 187braintrust(a)berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files
> that contain regression output from R. Any thoughts or suggestions
> would be greatly appreciated.

I once wrote programs (in C; Python should be easier) to process BMDP
statistical output. I usually took a state machine approach, where each
state corresponded to a block of output. The state machine was
hard-coded rather than table-driven.

For each block, read a line, analyze it to determine whether it signals
a transition, and process as appropriate to the state.

When states followed in strict sequence A > B > C, etc., the code was
pretty easy, with a block of code, typically with a loop, for each
state. When loops between states were possible, I used a master loop
with an 'if state is A:' conditional for each state.
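
Applied to the fixed-effects problem, the master-loop variant might look roughly like the sketch below. This is an illustration under assumptions, not the original C code: the state names COPY and SKIP, the function name shorten, and the wording of the summary line are all invented here.

```python
def shorten(lines):
    """Yield lines, replacing each run of 'factor(xyz)' lines with one summary."""
    state = "COPY"   # hypothetical states: COPY (pass lines through) or SKIP
    prefix = None    # the 'factor(xyz)' prefix of the block being skipped
    for line in lines:
        if state == "SKIP" and line.startswith(prefix):
            continue  # still inside the same block, drop the line
        if line.startswith("factor(") and ")" in line:
            # Transition: a new block starts, so emit its summary once.
            prefix = line[: line.find(")") + 1]
            yield "includes fixed effects for %s\n" % prefix[len("factor("):-1]
            state = "SKIP"
        else:
            state = "COPY"
            yield line
```

Since shorten() accepts any iterable of lines, it can be fed an open file directly, for example output_file.writelines(shorten(input_file)).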

Good luck.

Terry Jan Reedy



From: 187braintrust on
MRAB <python <at> mrabarnett.plus.com> writes:
> input_file = open(input_path)
> output_file = open(output_path, "w")
> for line in input_file:
>     if line.startswith("factor("):
>         open_paren = line.find("(")
>         close_paren = line.find(")")
>         variable = line[open_paren + 1 : close_paren]
>         output_file.write("*** Factors for %s ***\n" % variable)
>         prefix = line[ : close_paren + 1]
>         while line.startswith(prefix):
>             line = input_file.readline()
>     output_file.write(line)
> input_file.close()
> output_file.close()

This code is very helpful. Thank you. I have been working with it, but
I run into an error that I thought I should mention. The statement
"line = input_file.readline()" raises "ValueError: Mixing iteration and
read methods would lose data".

From: MRAB on
187braintrust(a)berkeley.edu wrote:
> From: MRAB <python(a)mrabarnett.plus.com>
> To: python-list(a)python.org
> Date: Wed, 16 Jun 2010 03:06:58 +0100
> Subject: Re: Python editing .txt file
> 187braintrust(a)berkeley.edu wrote:
>
> I am trying to write a program in Python that will edit .txt log
> files that contain regression output from R. Any thoughts or
> suggestions would be greatly appreciated.
>
>
> How's this:
>
> input_file = open(input_path)
> output_file = open(output_path, "w")
> for line in input_file:
>     if line.startswith("factor("):
>         open_paren = line.find("(")
>         close_paren = line.find(")")
>         variable = line[open_paren + 1 : close_paren]
>         output_file.write("*** Factors for %s ***\n" % variable)
>         prefix = line[ : close_paren + 1]
>         while line.startswith(prefix):
>             line = input_file.readline()
>     output_file.write(line)
> input_file.close()
> output_file.close()
>
>
> Thank you very much for your reply. This code works perfectly, except
> that the "line = input_file.readline()" part causes an error:
> "ValueError: Mixing iteration and read methods would lose data." I've
> been working on figuring out a workaround. Do you have any ideas?
>
You could try replacing:

line = input_file.readline()

with:

line = input_file.next()
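
Note that input_file.next() is a Python 2 file method. On Python 3 the built-in next(input_file) plays the same role, and passing a default such as next(input_file, "") avoids a StopIteration if the file ends inside a factor block. The sketch below is one possible Python 3 rewrite under those assumptions. The function name strip_factor_blocks and the sample input are made up, and the outer 'while' (instead of 'if') is an added change so that a block followed immediately by another block is also collapsed.

```python
import io

def strip_factor_blocks(input_file, output_file):
    """Copy input_file to output_file, collapsing each 'factor(xyz)' block."""
    for line in input_file:
        # A block may be followed directly by another one, hence 'while'.
        while line.startswith("factor(") and ")" in line:
            close_paren = line.find(")")
            variable = line[len("factor("):close_paren]
            output_file.write("*** Factors for %s ***\n" % variable)
            prefix = line[:close_paren + 1]
            while line.startswith(prefix):
                # At end of file the default "" is returned, ending both loops.
                line = next(input_file, "")
        output_file.write(line)

# Made-up sample input standing in for a real log file.
sample = io.StringIO("a\nfactor(state)AL 1\nfactor(state)AK 2\nfactor(yr)1999 3\nb\n")
result = io.StringIO()
strip_factor_blocks(sample, result)
print(result.getvalue())
```

With real files, the same function can be called as strip_factor_blocks(open(input_path), open(output_path, "w")), preferably inside a with statement so both files are closed.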