From: MRAB on 15 Jun 2010 22:06

187braintrust(a)berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files
> that contain regression output from R. Any thoughts or suggestions
> would be greatly appreciated.
>
> To get an idea of what I am trying to do, note that I include fixed
> effects in the R regressions, resulting in hundreds of extra lines per
> regression which I am not interested in right now. Basically, I want to
> save a shortened version of the .txt files in which the blocks of fixed
> effects coefficients are replaced by a line that says includes fixed
> effects for whatever variable it is.
>
> All the lines that are to be deleted start with the same six characters
> -- 'factor(xyz)' where xyz is the variable name -- so my idea is to have
> Python copy each line to a new file if the first six characters do not
> match 'factor('.
>
> That part I at least know how to approach. However, I am not sure how
> to approach adding the line that says, "includes fixed effects for xyz."
> The problem I am having is how to approach the following:
>
> 1. In the resulting file, I will be skipping blocks of lines, say
>    anywhere from 10 to 500 or so, and inserting one line -- i.e.,
>    whether it inserts the line needs to depend on whether it's the
>    first line or one of the remaining 499 lines.
>
> 2. the xyz variable name is different lengths depending on what
>    variable it is. For example, one block might be 'state' and another
>    block might be 'yr'. Maybe I can use the fact that the var name
>    starts after the first '(' and ends at the first ')' in the line? I
>    think I can use the re module for this?
>
> Any suggestions on any aspect of this, but especially the latter part,
> would be greatly appreciated. Thank you.

How's this:

    input_file = open(input_path)
    output_file = open(output_path, "w")
    for line in input_file:
        if line.startswith("factor("):
            open_paren = line.find("(")
            close_paren = line.find(")")
            variable = line[open_paren + 1 : close_paren]
            output_file.write("*** Factors for %s ***\n" % variable)
            prefix = line[ : close_paren + 1]
            while line.startswith(prefix):
                line = input_file.readline()
        output_file.write(line)
    input_file.close()
    output_file.close()

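The question also asked whether the re module could pull out the variable name. A minimal sketch of that alternative to the find() calls, assuming every fixed-effects line begins with 'factor(xyz)' as described in the question. The names FACTOR_RE and factor_variable are hypothetical and not part of the code in this thread.

```python
import re

# Assumed line shape: "factor(" at the very start of the line, then the
# variable name, then the first ")". This pattern is a guess at the R output.
FACTOR_RE = re.compile(r"factor\(([^)]*)\)")

def factor_variable(line):
    """Return the variable name of a 'factor(xyz)...' line, or None otherwise."""
    match = FACTOR_RE.match(line)  # match() anchors at the start of the line
    return match.group(1) if match else None
```

For this task line.find() is arguably just as clear; a regular expression mainly helps if the prefix later needs to tolerate leading whitespace or other variations.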
From: Terry Reedy on 16 Jun 2010 00:44

On 6/15/2010 9:28 PM, 187braintrust(a)berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files
> that contain regression output from R. Any thoughts or suggestions
> would be greatly appreciated.

I once wrote programs (in C, Python should be easier) to process BMDP
statistical output. I usually took a state machine approach, where each
state corresponded to a block of output. The state machine was hard
coded rather than table driven. For each block, read a line, analyze it
to determine whether it signals a transition, and process as appropriate
to the state.

When states followed in strict sequence A > B > C, etc., the code was
pretty easy, with a block of code, typically with a loop, for each
state. When loops were possible, then a master loop with 'if state is
A:' conditionals for each.

Good luck.

Terry Jan Reedy

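A rough sketch of how the state machine idea might look for this particular problem, assuming only two states are needed: copying ordinary lines, and skipping the remainder of a 'factor(xyz)' block. The function name shorten and the state names "copy" and "skip" are made up for illustration.

```python
def shorten(lines):
    """Yield lines, collapsing each run of 'factor(xyz)' lines to one marker line."""
    state = "copy"   # "copy": ordinary output; "skip": inside a factor block
    prefix = None    # e.g. "factor(state)" for the block currently being skipped
    for line in lines:
        if state == "skip" and line.startswith(prefix):
            continue  # same block as before: drop the line
        if line.startswith("factor(") and ")" in line:
            # Transition into a new block, from "copy" or from another block.
            close_paren = line.find(")")
            prefix = line[:close_paren + 1]
            state = "skip"
            yield "*** Factors for %s ***\n" % line[len("factor("):close_paren]
        else:
            # Any other line ends the block (if there was one) and is copied.
            state = "copy"
            yield line
```

Because the loop reads each line exactly once and never calls readline() itself, it works on any iterable of lines, for example output_file.writelines(shorten(input_file)).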
From: 187braintrust on 16 Jun 2010 15:29

MRAB <python <at> mrabarnett.plus.com> writes:
> input_file = open(input_path)
> output_file = open(output_path, "w")
> for line in input_file:
>     if line.startswith("factor("):
>         open_paren = line.find("(")
>         close_paren = line.find(")")
>         variable = line[open_paren + 1 : close_paren]
>         output_file.write("*** Factors for %s ***\n" % variable)
>         prefix = line[ : close_paren + 1]
>         while line.startswith(prefix):
>             line = input_file.readline()
>     output_file.write(line)
> input_file.close()
> output_file.close()

This code is very helpful. Thank you. I have been working with it, but
encounter an error that I thought I should mention:

    line = input_file.readline()
    ValueError: Mixing iteration and read methods would lose data

From: MRAB on 16 Jun 2010 19:35

187braintrust(a)berkeley.edu wrote:
> From: MRAB <python(a)mrabarnett.plus.com>
> To: python-list(a)python.org
> Date: Wed, 16 Jun 2010 03:06:58 +0100
> Subject: Re: Python editing .txt file
>
> > 187braintrust(a)berkeley.edu wrote:
> > > I am trying to write a program in Python that will edit .txt log
> > > files that contain regression output from R. Any thoughts or
> > > suggestions would be greatly appreciated.
> >
> > How's this:
> >
> >     input_file = open(input_path)
> >     output_file = open(output_path, "w")
> >     for line in input_file:
> >         if line.startswith("factor("):
> >             open_paren = line.find("(")
> >             close_paren = line.find(")")
> >             variable = line[open_paren + 1 : close_paren]
> >             output_file.write("*** Factors for %s ***\n" % variable)
> >             prefix = line[ : close_paren + 1]
> >             while line.startswith(prefix):
> >                 line = input_file.readline()
> >         output_file.write(line)
> >     input_file.close()
> >     output_file.close()
>
> Thank you very much for your reply. This code works perfectly, except
> that the "line = input_file.readline()" part causes an error:
> "ValueError: Mixing iteration and read methods would lose data." I've
> been working on figuring out a work around. Do you have any ideas?

You could try replacing:

    line = input_file.readline()

with:

    line = input_file.next()
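
A possible way to apply that suggestion is sketched below. It rests on some assumptions: the file method .next() exists only in Python 2, so the sketch uses the built-in next(), which also works in Python 3; a bare next() raises StopIteration when a factor block runs to the end of the file, so a default of "" is passed; and the loop is restructured so that a block immediately followed by another block is collapsed as well. The name collapse_factor_blocks and the sample text are hypothetical.

```python
import io

def collapse_factor_blocks(input_file, output_file):
    """Copy input_file to output_file, replacing each factor(xyz) block with one line."""
    # Every read goes through next(), so iteration and read methods are never mixed.
    line = next(input_file, "")          # "" signals end of file
    while line:
        if line.startswith("factor(") and ")" in line:
            close_paren = line.find(")")
            variable = line[len("factor("):close_paren]
            output_file.write("*** Factors for %s ***\n" % variable)
            prefix = line[:close_paren + 1]
            # Skip the rest of the block; the first non-matching line is
            # re-examined by the outer loop instead of being written blindly.
            while line.startswith(prefix):
                line = next(input_file, "")
        else:
            output_file.write(line)
            line = next(input_file, "")

# Small in-memory demonstration with made-up regression output.
sample = io.StringIO(
    "Coefficients:\n"
    "factor(state)AL 0.1\n"
    "factor(state)AK 0.2\n"
    "factor(yr)1999 0.3\n"
    "Residual standard error: 1.0\n"
)
result = io.StringIO()
collapse_factor_blocks(sample, result)
```

With real files, the same function can be called inside a with open(input_path) as input_file, open(output_path, "w") as output_file: block, which also closes both files if an error occurs.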