Best way to replace a set of strings in large files? [General Linux]


From: Rui Maciel on 21 Dec 2009 16:33

Ryan Chan wrote:

> Hello,
>
> Consider the case:
>
> You have 200 lines of mapping to replace, in a csv format, e.g.
>
> apple,orange
> boy,girl
> ...
>
> You have a 500MB file, you want to replace all 200 lines of mapping,
> what would be the most efficient way to do it?

You could try awk. It doesn't hurt.

Rui Maciel

From: John Hasler on 21 Dec 2009 16:59

Rui Maciel writes:
> You could try awk. It doesn't hurt.

I don't know... it's what my parrot says when something hurts.
--
John "Awk: bailing out near line 10" Hasler
jhasler(a)newsguy.com
Dancing Horse Hill
Elmwood, WI USA

From: Ryan Chan on 22 Dec 2009 11:35

On Dec 22, 3:04 am, unruh <un...(a)wormhole.physics.ubc.ca> wrote:

> Why run it multiple times? sed or even ed can run as many commands as
> you like in a single invocation.
>

It seems I found the answer, though I'm not sure it is exactly what you meant.

Do you mean using a sed script file, such as

1,/^END/{

s/1/a/g
s/2/b/g

}

and then run the replacement with

sed -f replace.sed source.txt
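
For a 200-line mapping, the script file probably does not need to be written by hand. Here is a minimal sketch of generating it from the CSV, assuming that neither column contains "/", "&", backslashes or regex metacharacters. The name map.csv and the sample data are made up for illustration; replace.sed and source.txt are the names used above.

```shell
# Scratch directory so the sketch does not touch real files.
tmp=$(mktemp -d)
printf 'apple,orange\nboy,girl\n' > "$tmp/map.csv"        # hypothetical mapping file
printf 'an apple for the boy\n'   > "$tmp/source.txt"     # stand-in for the 500MB file

# Turn every "old,new" row into an "s/old/new/g" command.
# Assumption: no "/", "&", backslash or regex metacharacter in either column.
sed 's|^\([^,]*\),\(.*\)$|s/\1/\2/g|' "$tmp/map.csv" > "$tmp/replace.sed"

# One sed invocation, one pass over the data; every command runs on each line.
result=$(sed -f "$tmp/replace.sed" "$tmp/source.txt")
printf '%s\n' "$result"   # prints: an orange for the girl
```

The commands are applied in file order to each line, so a later rule can rewrite the output of an earlier one (apple to orange, then orange to pear); whether that matters depends on the mapping.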

From: hongkonger on 25 Dec 2009 21:32

This is exactly what you want.

http://www.linuxask.com/questions/replace-multiple-strings-using-sed

On Dec 21, 12:07, pk <p....(a)pk.invalid> wrote:
> Ryan Chan wrote:
> > Consider the case:
>
> > You have 200 lines of mapping to replace, in a csv format, e.g.
>
> > apple,orange
> > boy,girl
> > ...
>
> > You have a 500MB file, you want to replace all 200 lines of mapping,
> > what would be the most efficient way to do it?
>
> Not sure about "most efficient", but with awk you can do all of that in a
> single pass (almost) over the data:
>
> awk -F, 'NR==FNR{a[$1]=$2;next}
> {for(i in a)gsub(i,a[i]); print}' mapfile datafile
>
> However, that has at least two problems, which may or may not be relevant
> for your scenario:
>
> 1) Does not know about "words", so if "pineapple" appears in the data, it
> will become "pineorange";
>
> 2) assumes that all the strings don't contain regex metacharacters, and that
> will likely produce wrong outcomes if one of the words to replace is, say
> "a.*b" or similar.
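
Problem 2 can probably be sidestepped by searching with index() instead of gsub(), since index() does a plain substring search. The following is only a sketch of that idea, not pk's code: the sample data and the variable names (map, old, line, out, pos) are made up for illustration, while mapfile and datafile are the names from the command above.

```shell
tmp=$(mktemp -d)
printf 'apple,orange\na.*b,X\n' > "$tmp/mapfile"     # second key contains regex metacharacters
printf 'apple a.*b axxb\n'      > "$tmp/datafile"

result=$(awk -F, '
NR == FNR {                       # first file: load the old,new pairs
    if ($1 != "") map[$1] = $2    # skip blank rows and empty search strings
    next
}
{
    line = $0
    for (old in map) {            # note: iteration order is unspecified in awk
        out = ""
        # index() is a plain substring search, so no regex interpretation
        while ((pos = index(line, old)) > 0) {
            out = out substr(line, 1, pos - 1) map[old]
            line = substr(line, pos + length(old))
        }
        line = out line
    }
    print line
}' "$tmp/mapfile" "$tmp/datafile")
printf '%s\n' "$result"   # prints: orange X axxb
```

This still matches substrings, so problem 1 ("pineapple" becoming "pineorange") remains. Because the loop order over the array is unspecified, mappings whose output is itself a key in the map may give order-dependent results.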
