Parsing GC Log File [Shell]

From: stan on 7 Nov 2009 21:13

Ed Morton wrote:
<snip>

>> > > > > Assuming the input is all on one line:
>>
>> > > > > $ cat file
>> > > > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
>> > > > > 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>>
>> > > > > $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
>> > > > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

This looks like the original code.

<snip>

>> > > Try this:
>>
>> > > awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
>> > >         gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
>>
>> > >    Ed.

This looks like your amended code.

>>
>> > Thanks, Ed!
>>
>> > Well, I'm by no means a shell expert.
>>
>> > Running your command on the file, I get the following output:
>>
>> > 15:00:16, :, [GC, K->, K(, K),, secs]
>>
>> Are you sure you copy/pasted my script instead of retyping it?
>> Are you sure your input file is the same as you posted?
>>
>> Look:
>>
>> $ cat file
>> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K),
>> 0.0204170 secs]
>> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
>> 0.0043760 secs]
>>
>> $ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file
>> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
>> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>>
>> Please post exactly the same commands and their output so we can see
>> where something's going wrong.

I could be wrong, but I noticed what I thought was a missing "^" in
your amended code. It could be that I only saw a munged reply in the
thread and missed the original reply.

I actually stopped for a couple of minutes when I saw your amended code
because I couldn't figure out how it worked, and I typically learn
something nearly every time I read your code. Local events overcame my
studies and I never got to try it out. My point here is not to call
out others' errors; I thought my knowledge was leaking through a hole,
and your response actually put a finger in the hole! I wanted to say
thanks.

As I get older I can't distinguish between senior moments and actual
ignorance. The only bright side is that I can enjoy old movies. <g>

From: w_a_x_man on 8 Nov 2009 04:37

On Nov 6, 1:39 pm, AyOut <mort...(a)gmail.com> wrote:
> On Nov 4, 10:11 pm, Ed Morton <mortons...(a)gmail.com> wrote:
>
> > AyOut wrote:
> > > I have a GC log file with entries like this one:
>
> > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> > > (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> > > secs]
>
> > > I would like to parse this to output for easy plotting using gnuplot
> > > and would like the following output:
>
> > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> > > 0.04, 0.02
>
> > Assuming the input is all on one line:
>
> > $ cat file
> > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
> > 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>
> > $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
> > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
>
> > Ed.
>
> That's a beautiful solution! Now, there's a change in the log file
> output. The first field is now a date and time stamp
>
> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170
> secs]
> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
> 0.0043760 secs]
>
> and applying this command
>
> cat ${gclogfile}|sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-
> Z]:*//'|sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//'|awk -F:
> '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'
>
> generates the following output:
>
> 15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
> 15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760
>
> where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
> way to have the output look like this:
>
> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>
> Thanks!

ruby -ne'puts [$_[11,8], $_[30..-1].scan(/[\d.]+/)].join(", ")' file

=== output ===
15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
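
For comparison, here is a rough awk sketch of the same idea that drops the leading timestamp field instead of counting 30 columns. It assumes the timestamp is always the first space-delimited field and has the ISO form shown in the sample; gc_with_time is a made-up name, not something from the thread.

```shell
# gc_with_time: hypothetical helper; reads GC log lines from the named
# files (or stdin) and prints "HH:MM:SS, n1, n2, ..." for each line.
gc_with_time() {
  awk -v OFS=', ' '{
    t = substr($1, 12, 8)     # HH:MM:SS part of the timestamp in field 1
    sub(/^[^ ]+ /, "")        # remove the whole timestamp field from the record
    gsub(/[^0-9.]/, " ")      # blank out everything except digits and dots
    $1 = $1                   # force awk to rebuild the record using OFS
    print t, $0
  }' "$@"
}

printf '%s\n' \
  '2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]' \
  '2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]' |
  gc_with_time
```

On the two sample lines this should print the same output as shown above. It only works if each log entry sits on a single line.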

From: Ed Morton on 8 Nov 2009 07:31

stan wrote:
> Ed Morton wrote:
> <snip>
>
>>>>>>> Assuming the input is all on one line:
>>>>>>> $ cat file
>>>>>>> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
>>>>>>> 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>>>>>>> $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
>>>>>>> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
>
> This looks like the original code.
>
> <snip>
>
>>>>> Try this:
>>>>> awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
>>>>> gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
>>>>> Ed.
>
> This looks like your amended code.
>
>>>> Thanks, Ed!
>>>> Well, I'm by no means a shell expert.
>>>> Running your command on the file, I get the following output:
>>>> 15:00:16, :, [GC, K->, K(, K),, secs]
>>> Are you sure you copy/pasted my script instead of retyping it?
>>> Are you sure your input file is the same as you posted?
>>>
>>> Look:
>>>
>>> $ cat file
>>> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K),
>>> 0.0204170 secs]
>>> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
>>> 0.0043760 secs]
>>>
>>> $ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file
>>> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
>>> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>>>
>>> Please post exactly the same commands and their output so we can see
>>> where something's going wrong.
>
> I could be wrong, but I noticed what I thought was a missing "^" in
> your amended code.

You're right, looks like I did drop the "^" in one of my posts.

Ed.
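
To see what the dropped "^" does, here is a minimal sketch that runs both variants on one sample line from the thread, joined onto a single line. The function names with_caret and without_caret are made up for illustration, and [0-9] stands in for [:digit:] so the sketch also runs on older awks that lack POSIX character classes.

```shell
# Sample entry from the thread, joined onto one line.
line='2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]'

# With "^": every character that is NOT a digit or a dot becomes a space,
# so only the numbers survive as fields.
with_caret() {
  awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^0-9.]/," "); $1=$1; print t,$0}'
}

# Without "^": the digits and dots themselves are blanked out,
# so only the letters and punctuation survive.
without_caret() {
  awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30); gsub(/[0-9.]/," "); $1=$1; print t,$0}'
}

printf '%s\n' "$line" | with_caret
printf '%s\n' "$line" | without_caret
```

The first call should print "15:00:16, 0.405, 2112, 750, 7680, 0.0204170". The second should print "15:00:16, :, [GC, K->, K(, K),, secs]", which matches the output AyOut reported, so the missing "^" looks like the cause.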

From: Michael Paoli on 8 Nov 2009 14:10

On Nov 4, 6:07 pm, AyOut <morty3e(a)gmail.com> wrote:
> I have a GC log file with entries like this one:
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02
>
> I have tried with a command like this:
> awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s
....

sed -e 's/[^0-9.]/ /g;s/  */ /g;s/^ //;s/ $//;s/ /, /g'
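
Spelled out one step at a time, a sed pipeline of this shape might look like the sketch below. The wrapper name gc_numbers_csv is hypothetical. Note that the second expression needs two spaces before the "*" (one literal space, then zero or more further spaces); with a single space the pattern also matches the empty string, and sed would insert spaces between characters.

```shell
# gc_numbers_csv: hypothetical wrapper; reads log lines on stdin and
# prints the numbers on each line separated by ", ".
# Steps, in order:
#   1. turn every character that is not a digit or a dot into a space
#   2. squeeze each run of spaces down to a single space
#   3. drop a leading space, if any
#   4. drop a trailing space, if any
#   5. turn each remaining space into ", "
gc_numbers_csv() {
  sed -e 's/[^0-9.]/ /g' \
      -e 's/  */ /g' \
      -e 's/^ //' \
      -e 's/ $//' \
      -e 's/ /, /g'
}

printf '%s\n' '2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]' |
  gc_numbers_csv
```

Like the awk version, this assumes each log entry is on one line. It also makes no attempt to keep a timestamp such as 15:00:16 together.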

From: Rakesh Sharma on 10 Nov 2009 01:13

On Nov 5, 7:07 am, AyOut <mort...(a)gmail.com> wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02
>
> I have tried with a command like this:
>
> awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s
> %s %s %s %s\n", $1,$2,$3,$4,$5,$6)}' gc_20091104_024256_psghlc301.log
> | sed "s/[0-9][0-9]:.*GC \[PSYoungGen: /, /" | sed "s/K.*->/, /" | sed
> "s/K.*(/, /" | sed "s/K)//"
>
> but it jumps over several fields and gives me the following output:
>
> 2.7, 70850, 6800, 502464, 0.0165440
>
> How can I set sed to not look at the last match ( "K(" ), but trigger
> on the first match?
>
> Thanks

You could do this in one go:

perl -lne '$,=", ";print/\d+[.]?(?:\d+)?|[.]\d+/g' yourfile

perl -lpe '$"=", ";$_="@{[/\d+[.]?(?:\d+)?|[.]\d+/g]}"' yourfile

--Rakesh