From: Ryan Chan on 20 Dec 2009 11:01
Hello,

Consider the case:

You have 200 lines of mapping to replace, in a csv format, e.g.

apple,orange
boy,girl
....

You have a 500MB file, you want to replace all 200 lines of mapping,
what would be the most efficient way to do it?

Thanks.
From: The Natural Philosopher on 20 Dec 2009 11:10
Ryan Chan wrote:
> Hello,
>
> Consider the case:
>
> You have 200 lines of mapping to replace, in a csv format, e.g.
>
> apple,orange
> boy,girl
> ...
>
> You have a 500MB file, you want to replace all 200 lines of mapping,
> what would be the most efficient way to do it?
>
> Thanks.

Replace what with what?
From: pk on 20 Dec 2009 11:07
Ryan Chan wrote:
> Consider the case:
>
> You have 200 lines of mapping to replace, in a csv format, e.g.
>
> apple,orange
> boy,girl
> ...
>
> You have a 500MB file, you want to replace all 200 lines of mapping,
> what would be the most efficient way to do it?

Not sure about "most efficient", but with awk you can do all of that in a
single pass (almost) over the data:

    awk -F, 'NR==FNR{a[$1]=$2;next} {for(i in a)gsub(i,a[i]); print}' mapfile datafile

However, that has at least two problems, which may or may not be relevant
for your scenario:

1) It does not know about "words", so if "pineapple" appears in the data,
   it will become "pineorange";
2) It assumes that none of the strings contain regex metacharacters, and it
   will likely produce wrong results if one of the words to replace is,
   say, "a.*b" or similar.
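
For comparison, here is a minimal Python sketch of the same single-pass idea that tries to avoid the two problems listed above. It is not from the thread: the function names (load_mapping, build_replacer, replace_stream) are made up for illustration, and it assumes the map file is simple two-column CSV and that "word" means a run of letters, digits, and underscores. Each key is escaped so it is matched literally, and a match is rejected when it is directly preceded or followed by a word character.

```python
import csv
import io
import re


def load_mapping(lines):
    """Read 'old,new' pairs from an iterable of CSV lines (hypothetical helper)."""
    return {row[0]: row[1] for row in csv.reader(lines) if len(row) >= 2}


def build_replacer(mapping):
    """Return a function that applies every replacement to one line."""
    if not mapping:
        return lambda line: line
    # Longest keys first, so a longer key wins over a shorter one that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes each key literal, so "a.*b" only matches the text "a.*b".
    alternation = "|".join(re.escape(k) for k in keys)
    # The lookarounds reject matches embedded in a longer word ("pineapple").
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    return lambda line: pattern.sub(lambda m: mapping[m.group(0)], line)


def replace_stream(mapping, src, dst):
    """Copy src to dst line by line, so the whole file is never held in memory."""
    replace = build_replacer(mapping)
    for line in src:
        dst.write(replace(line))


if __name__ == "__main__":
    demo_map = load_mapping(["apple,orange", "boy,girl"])
    out = io.StringIO()
    replace_stream(demo_map, io.StringIO("apple pineapple boy\n"), out)
    print(out.getvalue(), end="")
```

With real files, src and dst would be the opened data file and output file. Because all keys are compiled into one pattern, each line is scanned once instead of 200 times, and replaced text is not rescanned, so a pair like apple,orange followed by orange,pear would not chain. Whether this is actually faster than the awk loop on a 500MB file would need measuring.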
From: John Hasler on 20 Dec 2009 11:21
man sed

--
John Hasler
jhasler(a)newsguy.com
Dancing Horse Hill
Elmwood, WI USA
From: Chris Davies on 20 Dec 2009 14:55
Ryan Chan <ryanchan404(a)gmail.com> wrote:
> You have 200 lines of mapping to replace, in a csv format, e.g.
> You have a 500MB file, you want to replace all 200 lines of mapping,
> what would be the most efficient way to do it?

Define "efficiency". For example, is this a one-off, where you want to make
the most efficient use of your people resources? Or is it going to run
multiple times per hour, so that it is worth having someone spend a
significant amount of time working out and implementing a scheme that runs
in as short a time as is realistically possible?

Chris