From: Rahul on
I have two log files foo and bar which are updated continuously as user-
jobs start and end. The ultimate logfile can be constructed by a join on a
common keyword field in files foo and bar like so:

join -t ';' -j 1 /tmp/foo /tmp/bar > /tmp/composite_log

Is there a way to keep an always-up-to-date composite_log without having to
manually run the join command at intervals? I mean, I could cron the join
command and get a composite_log with a fairly good granularity but that
seems brute-force.

Is there a way to somehow "tie" the file composite_log to the files foo and
bar so that it will automatically update on any changes to foo / bar? I am
reminded of a "view" in database parlance. Is there a similar construct
for a "derived" file?


--
Rahul
From: Keith Keller on
On 2010-06-04, Rahul <nospam(a)nospam.invalid> wrote:
> Is there a way to somehow "tie" the file composite_log to the files foo and
> bar so that it will automatically update on any changes to foo / bar? I am
> reminded of a "view" in database parlance. Is there a similar construct
> for a "derived" file?

There is much begging of questions here, but let's get the easy question
out of the way: there's nothing comparable to an SQL view. There's a
symlink, but that's only a view on one file, not on combined files.
One could probably use some combination of FIFOs to approximate this
construct (but I'm not sure exactly what that'd look like).
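
For what it's worth, here is a minimal sketch of that FIFO idea
(hypothetical; the printf stands in for the real logging processes):

```shell
# Hypothetical sketch: writers send lines to one FIFO and a single
# reader appends them to composite_log.
tmp=$(mktemp -d)
mkfifo "$tmp/composite_fifo"
# The reader must be waiting on the FIFO before writers can proceed.
cat "$tmp/composite_fifo" >> "$tmp/composite_log" &
reader=$!
# A stand-in for the real logging processes:
printf '53110.euadmin; 06/03/10_22:57; 1275623865; rpnabar;\n' > "$tmp/composite_fifo"
wait "$reader"
```

The catch is that once the last writer closes the FIFO the reader sees
EOF and exits, so this still isn't a persistent "view" of anything.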

> I have two log files foo and bar which are updated continuously as user-
> jobs start and end. The ultimate logfile can be constructed by a join on a
> common keyword field in files foo and bar like so:
>
> join -t ';' -j 1 /tmp/foo /tmp/bar > /tmp/composite_log

Can the processes that log to /tmp/foo and /tmp/bar simply also log to
/tmp/composite_log? Or better still, use the logger program or
equivalent to log their output to syslog appropriately? That seems like
a cleaner solution than trying to combine the logfiles after the fact.

> Is there a way to keep an always-up-to-date composite_log without having to
> manually run the join command at intervals? I mean, I could cron the join
> command and get a composite_log with a fairly good granularity but that
> seems brute-force.

You could try to use logrotate for this. logrotate could rotate the foo
and bar logs (to foo.1 and bar.1), then you could use join to join foo.1
and bar.1 and concatenate that output to the end of composite_log. But
if the processes that write the foo and bar logs keep their logfiles
open while running you'd need to have them close and reopen their logs
(similarly to apachectl graceful; other programs use SIGHUP for this
purpose).
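
A hypothetical stanza for that (paths and schedule are made up;
sharedscripts runs postrotate once for both files, and join(1) wants
sorted input, hence the sorts):

```
/tmp/foo /tmp/bar {
    daily
    rotate 4
    missingok
    sharedscripts
    postrotate
        # join(1) needs input sorted on the join field
        sort -t ';' -k1,1 /tmp/foo.1 > /tmp/foo.1.sorted
        sort -t ';' -k1,1 /tmp/bar.1 > /tmp/bar.1.sorted
        join -t ';' -j 1 /tmp/foo.1.sorted /tmp/bar.1.sorted >> /tmp/composite_log
        rm -f /tmp/foo.1.sorted /tmp/bar.1.sorted
    endscript
}
```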

--keith

--
kkeller-usenet(a)wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

From: Rahul on
Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
news:bufmd7x7i.ln2(a)goaway.wombat.san-francisco.ca.us:

Thanks Keith!

> Can the processes that log to /tmp/foo and /tmp/bar simply also log to
> /tmp/composite_log?

I don't think so. At least not easily. Here are more details:

A line gets added to file foo whenever a job is submitted. The first
field is a unique job id. The other fields are more job specific details.

A line gets added to file bar whenever a job starts running. The first
field is again a unique job id. The other fields are time-job-started-
running etc.

The job id being unique, I can do a subsequent join. Now I guess I could
have the script writing to bar grep for the line with the same jobid in
foo and then update that specific line.

But to me this approach seemed less clean. Maybe you have some comments?


Example foo:

53110.euadmin; 06/03/10_22:57; 1275623865; rpnabar;
53107.euadmin; 06/03/10_22:57; 1275623823; stotz;

Example bar:
53110.euadmin; 06/03/10_23:27; low;

Example composite:
53110.euadmin; 06/03/10_22:57; 1275623865; rpnabar; 06/03/10_23:27; low;
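
For the record, running the sample data through join(1) needs one extra
step: join expects its inputs sorted on the join field, and the foo
lines above are not (53110 precedes 53107):

```shell
# Recreate the sample files from above.
cat > /tmp/foo <<'EOF'
53110.euadmin; 06/03/10_22:57; 1275623865; rpnabar;
53107.euadmin; 06/03/10_22:57; 1275623823; stotz;
EOF
cat > /tmp/bar <<'EOF'
53110.euadmin; 06/03/10_23:27; low;
EOF
# Sort on the join field first; join(1) assumes sorted input.
sort -t ';' -k1,1 /tmp/foo > /tmp/foo.sorted
sort -t ';' -k1,1 /tmp/bar > /tmp/bar.sorted
join -t ';' -j 1 /tmp/foo.sorted /tmp/bar.sorted > /tmp/composite_log
```

The unmatched 53107 line is dropped by default; join -a 1 would keep it.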




--
Rahul
From: Keith Keller on
On 2010-06-04, Rahul <nospam(a)nospam.invalid> wrote:
> Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
> news:bufmd7x7i.ln2(a)goaway.wombat.san-francisco.ca.us:
>
>> Can the processes that log to /tmp/foo and /tmp/bar simply also log to
>> /tmp/composite_log?
>
> I don't think so. At least not easily. Here are more details:
>
> A line gets added to file foo whenever a job is submitted. The first
> field is a unique job id. The other fields are more job specific details.

You say "a line gets added". By what? A shell script? Some other
scripting language? A binary program? If it's either of the first two,
adding your own logging logic should be fairly easy, and if it's a
binary (that I'm guessing you're not able to modify) even then it might
have options to control how output (in general, not just stdout) is
processed.

I still think it'd be helpful to know more about the processes that
create foo and bar. Without knowing, you risk a suboptimal solution.

> The job id being unique I can do a subsequent join. Now I guess I could
> have the script writing to bar grep for the line with the same jobid in
> foo and then update that specific line.

You could add logic to only update composite_log if the job is still
running. Then, your program to generate composite_log could look at the
last jobId and only process lines in foo and bar that a) are greater
than the last jobId from composite_log, and b) note that a job has
finished. I don't think that's very elegant though, and if your jobIds
are not strictly increasing this won't work at all.
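
A rough sketch of that bookkeeping (hypothetical data and paths; this
only shows the jobId comparison, assumes jobIds compare lexically in
submission order, and leaves out the finished-job check):

```shell
# Hypothetical incremental update; breaks if jobIds are not
# strictly increasing (lexically, here).
tmp=$(mktemp -d)
printf '53107.euadmin; stotz;\n53110.euadmin; rpnabar;\n' > "$tmp/foo"
printf '53110.euadmin; low;\n' > "$tmp/bar"
printf '53105.euadmin; older; done;\n' > "$tmp/composite_log"
# Last jobId already present in composite_log.
last=$(cut -d ';' -f1 "$tmp/composite_log" | sort | tail -n 1)
# Keep only newer lines, sort them for join(1), and append the result.
awk -F ';' -v last="$last" '$1 > last' "$tmp/foo" | sort -t ';' -k1,1 > "$tmp/foo.new"
awk -F ';' -v last="$last" '$1 > last' "$tmp/bar" | sort -t ';' -k1,1 > "$tmp/bar.new"
join -t ';' -j 1 "$tmp/foo.new" "$tmp/bar.new" >> "$tmp/composite_log"
```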

Another kind of stupid way to do it would be to create some sort of FIFO
that pipes to the logger program. (You want to avoid the scenario of
e.g.

tail -f /tmp/foo >> /tmp/composite_log &
tail -f /tmp/bar >> /tmp/composite_log &

because composite_log risks becoming a huge mess.)

--keith


--
kkeller-usenet(a)wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

From: Maxwell Lol on
Rahul <nospam(a)nospam.invalid> writes:

> I have two log files foo and bar which are updated continuously as user-
> jobs start and end. The ultimate logfile can be constructed by a join on a
> common keyword field in files foo and bar like so:
>
> join -t ';' -j 1 /tmp/foo /tmp/bar > /tmp/composite_log
>
> Is there a way to keep an always-up-to-date composite_log without having to
> manually run the join command at intervals?




I have not used it, but people who do a lot of event correlation in
logfiles like a package called Splunk. There are commercial and free
versions.


As I understand it, you can query its database and extract log data in
real time, effectively giving you composite logs.