Parsing GC Log File [Shell]

From: AyOut on 4 Nov 2009 21:07

I have a GC log file with entries like this one:

2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
(502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
secs]

I would like to parse this to output for easy plotting using gnuplot
and would like the following output:

2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
0.04, 0.02

I have tried with a command like this:

awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:") printf("%s %s %s %s %s %s\n", $1,$2,$3,$4,$5,$6)}' gc_20091104_024256_psghlc301.log |
sed "s/[0-9][0-9]:.*GC \[PSYoungGen: /, /" |
sed "s/K.*->/, /" |
sed "s/K.*(/, /" |
sed "s/K)//"

but it jumps over several fields and gives me the following output:

2.7, 70850, 6800, 502464, 0.0165440

How can I make sed stop at the first match (of "K(") instead of running
on to the last one?

Thanks
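The behaviour asked about comes from sed regular expressions being greedy: `.*` matches the longest possible run, so `K.*(` extends to the last `(` on the line. POSIX sed has no non-greedy `.*?`; the usual workaround is a negated bracket expression such as `[^(]*`, which cannot cross the character you want to stop at. A minimal sketch on a fragment of the sample line (assumed to be on one line); the last command is an illustrative sed-only variant, not something posted in the thread:

```shell
# Fragment of the sample GC line.
s='6800K(152896K)] 70850K->6800K (502464K)'

# Greedy: ".*" runs to the LAST "(" and swallows 152896.
printf '%s\n' "$s" | sed 's/K.*(/, /'
# → 6800, 502464K)

# Negated class: "[^(]*" cannot cross a "(", so the match ends at the FIRST one.
printf '%s\n' "$s" | sed 's/K[^(]*(/, /'
# → 6800, 152896K)] 70850K->6800K (502464K)

# Same idea on the whole line: turn each run of non-digit, non-dot characters
# into ", ", then drop the trailing separator left over from " secs]".
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]'
printf '%s\n' "$line" | sed -e 's/[^0-9.][^0-9.]*/, /g' -e 's/, $//'
# → 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
```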

From: Ed Morton on 4 Nov 2009 23:11

AyOut wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Assuming the input is all on one line:

$ cat file
2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]

$ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

Ed.
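Ed's one-liner works in three steps: blank out every character that is not a digit or a dot, then assign `$1=$1` so awk re-splits on whitespace and rebuilds the record joined by `OFS`, then print via the always-true pattern `1`. A commented sketch of the same program, wrapped in a hypothetical shell function `gc_to_csv`; it uses `[^0-9.]` instead of `[^[:digit:].]` on the assumption that some older awks (for example older mawk releases) lack POSIX bracket classes:

```shell
# Hypothetical helper name; reads GC log lines on stdin, one entry per line.
gc_to_csv() {
  awk '{
    OFS = ", "               # separator used when awk rebuilds $0
    gsub(/[^0-9.]/, " ")     # blank out every char that is not a digit or a dot
    $1 = $1                  # touching a field makes awk rebuild $0 with OFS
  }
  1'                         # always-true pattern: print the rebuilt line
}

printf '%s\n' '2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]' | gc_to_csv
# → 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
```

Because the rebuild uses only the fields, the runs of blanks left by `gsub` and any leading or trailing blanks disappear on their own.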

From: Kaz Kylheku on 5 Nov 2009 00:39

On 2009-11-05, AyOut <morty3e(a)gmail.com> wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Kaz's txr utility to the rescue.

txr -c '@(collect)
@num: [GC [PSYoungGen: @{size1}K->@{size2}K(@{size3}K)] @{size4}K->@{size5}K (@{size6}K), @secs secs] [Times: user=@utime sys=@systime, real=@realtime secs]
@(end)
@(output)
@(repeat)
@num, @size1, @size2, @size3, @size4, @size5, @size6, @secs, @utime, @systime, @realtime
@(end)
@(end)
' logfile

www.nongnu.org/txr

From: AyOut on 6 Nov 2009 14:39

On Nov 4, 10:11 pm, Ed Morton <mortons...(a)gmail.com> wrote:
> AyOut wrote:
> > I have a GC log file with entries like this one:
>
> > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> > (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> > secs]
>
> > I would like to parse this to output for easy plotting using gnuplot
> > and would like the following output:
>
> > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> > 0.04, 0.02
>
> Assuming the input is all on one line:
>
> $ cat file
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>
> $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
>
> Ed.

That's a beautiful solution! Now, there's a change in the log file
output. The first field is now a date and time stamp

2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]

and applying this command

cat ${gclogfile} |
sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-Z]:*//' |
sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//' |
awk -F: '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'

generates the following output:

15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760

where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
way to have the output look like this:

15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Thanks!

From: Ed Morton on 6 Nov 2009 15:09

On Nov 6, 1:39 pm, AyOut <mort...(a)gmail.com> wrote:
> That's a beautiful solution! Now, there's a change in the log file
> output. The first field is now a date and time stamp
>
> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]
> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]
>
> and applying this command
>
> cat ${gclogfile} |
> sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-Z]:*//' |
> sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//' |
> awk -F: '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'

"beautiful solution" discarded apparently!

> generates the following output:
>
> 15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
> 15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760
>
> where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
> way to have the output look like this:
>
> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>
> Thanks!

Why do you keep going back to pipelines of cat, sed, and awk? If
you're going to use awk anyway, you don't need sed or cat.

Try this:

awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file

Ed.
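A commented sketch of that approach, wrapped in a hypothetical function `gc_ts_to_csv`. It assumes every line starts with a fixed-width `YYYY-MM-DDThh:mm:ss.mmm+/-hhmm:` stamp (29 characters), so characters 12 to 19 are `hh:mm:ss` and the GC data starts at character 30. Note that the `gsub` bracket expression must begin with `^` (blank what is *not* a digit or dot); `[^0-9.]` again stands in for `[^[:digit:].]` for older awks:

```shell
# Hypothetical helper name; reads timestamped GC log lines on stdin.
gc_ts_to_csv() {
  awk '{
    OFS = ", "
    t = substr($0, 12, 8)    # chars 12-19: hh:mm:ss from the ISO-8601 stamp
    $0 = substr($0, 30)      # drop the 29-char date/time/zone prefix; awk re-splits
    gsub(/[^0-9.]/, " ")     # keep only digits and dots (note the leading ^)
    $1 = $1                  # rebuild $0 joined by OFS
    print t, $0              # time first, then the remaining numbers
  }'
}

printf '%s\n' \
  '2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]' \
  '2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]' |
  gc_ts_to_csv
# → 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
# → 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
```

The fixed offsets keep the program short; if the stamp width could vary, locating the `T` and the following `.` with `index()` would be less brittle.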
