From: AyOut on 6 Nov 2009 15:29

On Nov 6, 2:09 pm, Ed Morton <mortons...(a)gmail.com> wrote:
> On Nov 6, 1:39 pm, AyOut <mort...(a)gmail.com> wrote:
> > On Nov 4, 10:11 pm, Ed Morton <mortons...(a)gmail.com> wrote:
> > > AyOut wrote:
> > > > I have a GC log file with entries like this one:
> > > >
> > > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
> > > >
> > > > I would like to parse this to output for easy plotting using gnuplot
> > > > and would like the following output:
> > > >
> > > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
> > >
> > > Assuming the input is all on one line:
> > >
> > > $ cat file
> > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
> > >
> > > $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
> > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
> > >
> > > Ed.
> >
> > That's a beautiful solution! Now, there's a change in the log file
> > output. The first field is now a date and time stamp
> >
> > 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]
> > 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]
> >
> > and applying this command
> >
> > cat ${gclogfile}|sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-Z]:*//'|sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//'|awk -F: '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'
>
> "beautiful solution" discarded apparently!
>
> > generates the following output:
> >
> > 15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
> > 15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760
> >
> > where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
> > way to have the output look like this:
> >
> > 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
> > 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
> >
> > Thanks!
>
> Why do you keep going back to pipelines of cat, sed, and awk? If
> you're going to use awk anyway, you don't need sed or cat.
>
> Try this:
>
> awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30); gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
>
> Ed.

Thanks, Ed!

Well, I'm by no means a shell expert.

Running your command on the file, I get the following output:

15:00:16, :, [GC, K->, K(, K),, secs]
From: Ed Morton on 6 Nov 2009 15:37

On Nov 6, 2:29 pm, AyOut <mort...(a)gmail.com> wrote:
> On Nov 6, 2:09 pm, Ed Morton <mortons...(a)gmail.com> wrote:
<snip>
> > Try this:
> >
> > awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30); gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
> >
> > Ed.
>
> Thanks, Ed!
>
> Well, I'm by no means a shell expert.
>
> Running your command on the file, I get the following output:
>
> 15:00:16, :, [GC, K->, K(, K),, secs]

Are you sure you copy/pasted my script instead of retyping it?
Are you sure your input file is the same as you posted?

Look:

$ cat file
2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]

$ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file
15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Please post exactly the same commands and their output so we can see
where something's going wrong.

Ed.
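For anyone following along, the awk command in the message above can be unpacked one statement at a time. The following is only an illustrative sketch: the function name `steps` and the `time:`/`rest:`/`joined:` labels are made up here, and it assumes an awk that understands POSIX character classes such as `[:digit:]`.

```shell
#!/bin/sh
# One of the log lines posted in the thread, on a single line.
line='2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]'

# steps (hypothetical helper): print the intermediate values of the one-liner.
steps() {
    printf '%s\n' "$line" | awk '{
        OFS = ", "
        t = substr($1, 12, 8)        # characters 12-19 of field 1 hold HH:MM:SS
        print "time:   " t
        $0 = substr($0, 30)          # skip the 29-character timestamp and its colon
        print "rest:   " $0
        gsub(/[^[:digit:].]/, " ")   # blank out everything except digits and dots
        $1 = $1                      # assigning a field makes awk rebuild $0 with OFS
        print "joined: " $0
    }'
}

steps
```

The last line printed is `joined: 0.405, 2112, 750, 7680, 0.0204170`, which is what ends up after the time stamp in the final output. Note that the fixed offsets (12, 8, 30) only hold while the timestamp keeps exactly this layout.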
From: Ed Morton on 6 Nov 2009 15:43

On Nov 6, 2:37 pm, Ed Morton <mortons...(a)gmail.com> wrote:
<snip>
> Are you sure you copy/pasted my script instead of retyping it?
> Are you sure your input file is the same as you posted?
<snip>
> Please post exactly the same commands and their output so we can see
> where something's going wrong.
>
> Ed.

Hint: check if you mistyped the gsub() as

gsub(/[[:digit:].]/," ")

instead of what I had:

gsub(/[^[:digit:].]/," ")

Note the "^".

Regards,

Ed.
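The effect of the missing "^" can be reproduced side by side. This is a small sketch, not part of the original posts: the function names `with_caret` and `without_caret` are invented for the comparison, and an awk with POSIX character class support is assumed.

```shell
#!/bin/sh
# One of the log lines posted in the thread, on a single line.
line='2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]'

# Negated bracket expression: replaces every character that is NOT a digit or a dot.
with_caret() {
    printf '%s\n' "$line" |
    awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}'
}

# Same command with the caret lost: replaces the digits and dots themselves.
without_caret() {
    printf '%s\n' "$line" |
    awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[[:digit:].]/," "); $1=$1; print t,$0}'
}

with_caret      # prints: 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
without_caret   # prints: 15:00:16, :, [GC, K->, K(, K),, secs]
```

Without the caret the numbers are the part that gets erased, so only the punctuation and letters survive, which matches the odd output reported earlier in the thread.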
From: AyOut on 6 Nov 2009 15:44

On Nov 6, 2:37 pm, Ed Morton <mortons...(a)gmail.com> wrote:
<snip>
> Are you sure you copy/pasted my script instead of retyping it?
> Are you sure your input file is the same as you posted?
<snip>

My bad! I lost the ^ in the copy/paste.

Thanks, Ed!
From: Ben Bacarisse on 6 Nov 2009 15:54
AyOut <morty3e(a)gmail.com> writes:

> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

If you can live without the spaces:

tr -sc '0-9.' ,

or (since gnuplot won't mind):

tr -sc '0-9.' ' '

<snip>
--
Ben.
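A quick way to try the tr approach on the sample entry is sketched below. The function names are invented for illustration, and the third variant (adding `\n` to the kept set) is an assumption of this sketch rather than something proposed in the thread: as written, `tr -sc '0-9.' ,` also turns the newline into a comma, so a multi-line log would come out as one long line.

```shell
#!/bin/sh
# The sample entry from the original question, on a single line.
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]'

# -c complements the set (everything except digits and dots),
# -s squeezes each run of replaced characters into one.
commas() { printf '%s\n' "$line" | tr -sc '0-9.' ,; }
spaces() { printf '%s\n' "$line" | tr -sc '0-9.' ' '; }

# Variant (assumption of this sketch): keep newlines so each entry stays on its own line.
keep_newlines() { printf '%s\n' "$line" | tr -sc '0-9.\n' ,; }

commas; echo      # echo supplies the newline that tr replaced
spaces; echo
keep_newlines
```

Each variant leaves a trailing separator after the last number (for example `...,0.04,0.02,`), because the text after the final value is also replaced.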