From: Nobody on 6 Jun 2010 15:55

On Sat, 05 Jun 2010 16:35:42 +0100, MRAB wrote:

>>> In plain language what I wish to do is:
>>>
>>> Remove all comma's
>>> Replace all @ with comma's

>> input_file = open("some_huge_file.txt", "r")
>> output_file = open("newfilename.txt", "w")
>> for line in input_file:

> I'd probably process it in larger chunks:
>
> CHUNK_SIZE = 1024 ** 2 # 1MB at a time
> input_file = open("some_huge_file.txt", "r")
> output_file = open("newfilename.txt", "w")
> while True:
>     chunk = input_file.read(CHUNK_SIZE)

This is fine for the exact problem at hand. The moment the problem evolves into replacing a sequence of two or more characters, processing line-by-line eliminates the problem where the chunk boundary occurs in the middle of the sequence.
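To make the chunk-boundary point concrete, here is a minimal sketch in Python. The helper names `replace_by_chunk` and `replace_by_line`, the sample text, and the deliberately tiny chunk size are all made up for illustration; nothing here comes from the posts above beyond the general idea.

```python
import io

def replace_by_chunk(src, dst, old, new, chunk_size):
    """Naive chunked replace: misses any match that straddles two chunks."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(old, new))

def replace_by_line(src, dst, old, new):
    """Line-by-line replace: safe as long as `old` contains no newline."""
    for line in src:
        dst.write(line.replace(old, new))

# Hypothetical sample data; chunk_size=3 forces a boundary inside the first "@@".
text = "ab@@cd\nef@@gh\n"

chunked = io.StringIO()
replace_by_chunk(io.StringIO(text), chunked, "@@", ",", chunk_size=3)

by_line = io.StringIO()
replace_by_line(io.StringIO(text), by_line, "@@", ",")

print(repr(chunked.getvalue()))  # 'ab@@cd\nef,gh\n' (first match was split and missed)
print(repr(by_line.getvalue()))  # 'ab,cd\nef,gh\n'
```

With single-character replacements, as in the original question, no match can be split, so the chunked version gives the same result as the line-by-line one.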
From: hiral on 9 Jun 2010 06:27

On Jun 6, 7:27 am, Steve <vvw...(a)googlemail.com> wrote:
> On 5 June, 08:53, Steve <vvw...(a)googlemail.com> wrote:
>
> > I am new to Python and am wanting to replace characters in a very
> > large text file.....6 GB
> > In plain language what I wish to do is:
>
> > Remove all comma's
> > Replace all @ with comma's
> > Save as a new file.
>
> > Any of you clever people know the best way to do this......idiot guide
> > please.
>
> > Thanks
>
> > Steve
>
> Many thanks for your suggestions.
>
> sed -i 's/Hello/hello/g' file
>
> Run twice on the CL..with the hello's changed for my needs did it in a
> few minutes ,
>
> Again thanks
>
> Steve

Hi Steve,

You can do...

sed "s/,//g" <your_file> | sed "s/@/,/g" > <new_file>

Thank you.
From: Tim Chase on 9 Jun 2010 07:00

On 06/09/2010 05:27 AM, hiral wrote:
> On Jun 6, 7:27 am, Steve<vvw...(a)googlemail.com> wrote:
>> On 5 June, 08:53, Steve<vvw...(a)googlemail.com> wrote:
>>> Remove all comma's
>>> Replace all @ with comma's
>>> Save as a new file.
>>
>> Many thanks for your suggestions.
>>
>> sed -i 's/Hello/hello/g' file
>>
>> Run twice on the CL..with the hello's changed for my needs did it in a
>> few minutes ,
>
> You can do...
>
> sed "s/,//g"<your_file> | sed "s/@/,/g"> <new_file>

No need to use 2 sed processes:

sed 's/,//g;y/@/,/' your_file > new_file

(you could use "s/@/,/g" as well, but the internal implementation of the transliterate "y" should be a lot faster)

-tkc
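For comparison, a rough Python counterpart of that single-pass sed command might look like the sketch below. It is not from any of the posts: the function name `convert` and its parameters are made up here, and it assumes the file uses an ASCII-compatible encoding such as ASCII or UTF-8, so the bytes for "," and "@" never occur inside another character. `bytes.translate` first drops the bytes listed in its second argument and then maps the remaining bytes through the table, which plays much the same role as sed's `y`.

```python
CHUNK_SIZE = 1024 ** 2  # 1 MB at a time, as suggested earlier in the thread

# Translation table mapping "@" to ","; all other bytes map to themselves.
TABLE = bytes.maketrans(b"@", b",")

def convert(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Delete every comma and turn every "@" into a comma, in one pass.

    Hypothetical helper. Assumes an ASCII-compatible encoding; because both
    replacements act on single bytes, a chunk boundary cannot split a match.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Commas are removed first, then the remaining bytes are mapped.
            dst.write(chunk.translate(TABLE, b","))
```

It would be called as `convert("some_huge_file.txt", "newfilename.txt")`, reusing the file names from earlier in the thread. Whether this is faster than sed on a 6 GB file has not been measured here.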