From: Tom Lane on
Andrew Dunstan <andrew(a)dunslane.net> writes:
> The buildfarm is now going on six years old (time flies when you're
> having fun!) and the database is now rather large - around 76GB on disk.
> We'd like to reduce that quite a lot, especially by purging out the logs
> of old builds. And while the old data isn't publicly accessible, it has
> occasionally been used to run specialised queries to research particular
> issues. It's also arguably a useful historical resource that shouldn't
> be lightly abandoned.

As long as the historical data is kept somewhere, I agree that it
doesn't need to be readily available on-line. 10GB a year is not a lot
of data these days, so it seems like we ought to be able to archive it
indefinitely; but I can see that keeping it available on the web might
run into some money. (You could also argue that there's no need to
archive more than say five years back, but I think that's a different
discussion.)

> I'd like to get an idea of what the community regards as a reasonable
> amount of data to keep online and readily handy. Six months' worth? A
> year? Two years? Is it worth keeping logs of error stages longer than
> successful stages? If so, what should the periods be?

Six months is probably plenty, really, especially if that means we can
make the data more available than it is now. I'm not convinced that
"successful" builds should be purged more quickly, as there's often
reason to look for warnings, funny events in the postmaster log, etc.

> One of the things that I'd like to be able to do is FTS on the logs.

+1. +10 even. I think this'd be a quantum jump in the usefulness of
the log archives. I frequently wonder things like "what other machines
are showing this warning", and right now it's impractical to research
that.
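
To make the "what other machines are showing this warning" query
concrete, here is a minimal sketch of the lookup, purely for
illustration. It is not how the buildfarm stores or searches its logs:
the machine names, the sample log text, and the helpers build_index and
machines_matching are all made up. It builds a small inverted index in
memory and returns the machines whose logs contain every word of the
query (an AND match on words, not a phrase search).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_index(logs):
    """Map each token to the set of machines whose logs contain it.

    `logs` is a hypothetical dict of machine name -> log text.
    """
    index = defaultdict(set)
    for machine, text in logs.items():
        for token in tokenize(text):
            index[token].add(machine)
    return index


def machines_matching(index, query):
    """Return the sorted machines whose logs contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Made-up sample data, standing in for per-machine build logs.
logs = {
    "machine_a": "warning: unused variable 'foo'\nmake check passed",
    "machine_b": "make check passed",
    "machine_c": "WARNING: unused variable 'bar'",
}

index = build_index(logs)
print(machines_matching(index, "unused variable"))
```

With logs already held in a PostgreSQL database, the built-in text
search facilities would presumably be the more natural tool; the sketch
only shows the shape of the question being asked.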

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)