From: Magnus Hagander on
2010/3/4 Josh Berkus <josh(a)agliodbs.com>:
> All,
>
> Currently, the only way for admin scripts to get individual data items
> out of pg_controldata (such as the next XID or the catalog version) is
> via grep and regex. Given that people are going to be relying on some of
> this data for replication admin in the future, it seems past time to
> have a form of pg_controldata which either outputs machine-readable text
> (XML or JSON), or (my preference) takes options to output just the
> individual items, e.g.

Huh? It's fixed width; you don't need regexps for that. Just split the
string apart.

Taking options for single fields might have a better use case, of course :-)


> pg_controldata --catalog_version
>
> Even better would be the ability to get everything which is in
> pg_controldata currently as part of a system view in a running
> PostgreSQL; I can get most of them, but certainly not all.

+1 for having all the information available from inside the backend,
if that's technically possible (which I assume it should be)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Joshua Tolley on
On Thu, Mar 04, 2010 at 10:54:15PM +0100, Magnus Hagander wrote:
> 2010/3/4 Josh Berkus <josh(a)agliodbs.com>:
> > pg_controldata --catalog_version
> >
> > Even better would be the ability to get everything which is in
> > pg_controldata currently as part of a system view in a running
> > PostgreSQL; I can get most of them, but certainly not all.
>
> +1 for having all the information available from inside the backend,
> if that's technically possible (which I assume it should be)

I'd love to see pg_config's various bits of information in there as well. I
just haven't gotten around to writing it. But +1 from me, FWIW.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

From: Joe Conway on
On 03/04/2010 02:09 PM, Joshua Tolley wrote:
> On Thu, Mar 04, 2010 at 10:54:15PM +0100, Magnus Hagander wrote:
>> 2010/3/4 Josh Berkus <josh(a)agliodbs.com>:
>>> pg_controldata --catalog_version
>>>
>>> Even better would be the ability to get everything which is in
>>> pg_controldata currently as part of a system view in a running
>>> PostgreSQL; I can get most of them, but certainly not all.
>>
>> +1 for having all the information available from inside the backend,
>> if that's technically possible (which I assume it should be)
>
> I'd love to see pg_config's various bits of information in there as well. I
> just haven't gotten around to writing it. But +1 from me, FWIW.

I agree something like this would be useful -- maybe I'll try to come up
with some round tuits...

Joe


From: Greg Smith on
Magnus Hagander wrote:
> Huh? It's fixed width; you don't need regexps for that. Just split the
> string apart.
>
> Taking options for single fields might have a better use case, of course :-)
>

I do find it a bit hard to imagine that any program capable of shelling
out to call pg_controldata and doing something with the output would hit
a major hurdle parsing the format that's already there. Moving toward
single fields I could see as being better for some cases, but going all
the way to XML/JSON is a step backwards from the plain text format as
far as I'm concerned. Anything that can parse one of those complicated
formats should be able to trivially chew the existing one already.
Seriously: I have bash scripts that parse that thing.
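
A minimal sketch of that kind of parser, in Python rather than bash.
It assumes each output line is "label: value" and that pg_controldata
is on the PATH; the function names and the sample values below are
made up for illustration.

```python
import subprocess


def parse_controldata(text):
    """Turn pg_controldata's "label:   value" lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as timestamps
        # contain colons of their own.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def read_controldata(data_dir):
    """Shell out to pg_controldata (assumed to be on PATH) and parse it."""
    result = subprocess.run(
        ["pg_controldata", data_dir],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_controldata(result.stdout)


# Illustrative sample only; the values are invented.
SAMPLE = """\
pg_control version number:            903
Catalog version number:               201002161
Database cluster state:               in production
Time of latest checkpoint:            Thu Mar  4 22:54:15 2010
Latest checkpoint's NextXID:          0/1234
"""

if __name__ == "__main__":
    print(parse_controldata(SAMPLE)["Catalog version number"])
```

Keying on the label text is what makes this fragile once the output
is translated, so treat it as a sketch for untranslated output only.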

>> Even better would be the ability to get everything which is in
>> pg_controldata currently as part of a system view in a running
>> PostgreSQL; I can get most of them, but certainly not all.
>>
>
> +1 for having all the information available from inside the backend,
> if that's technically possible (which I assume it should be)
>

I revisit this every time I write yet another user-space parser and ask
myself why I haven't exposed it in the server yet. The primary answer
so far has always been "because you can't execute a query on the standby
while it's in recovery", which left half the stuff I wanted the data for
(e.g. standby lag monitoring like
http://www.kennygorman.com/wordpress/?p=249 ) unable to use that
interface anyway. Now that Hot Standby cracks that objection, it's
worth talking about for a minute.

pg_controldata itself just reads the file in directly and dumps the
data. There is a copy of it kept around all the time in shared memory
though (ControlFile in xlog.c), protected by an LWLock. At a high level
you can imagine a new function in xlog.c that acquires that lock, copies
the block into space the backend has allocated for it, releases the
lock, and then returns the whole structure. Then just wrap some number
of superuser-only UDFs around it (I'd guess nobody wants regular users
to be able to hold a lock on ControlFile) and you've exposed the results
to user-space.

Two questions before I'd volunteer to write that:

1) How do you handle the situation where the pg_controldata is invalid?
"Not read in yet" and "CRC is bad" are the two most obvious ones that
can happen. Return a null for every field, try and guess (the way
pg_resetxlog does), don't return a row of output at all, or throw an
error? Each of these has slightly different implications for how admin
code that will do something with these values will have to be structured.

2) While it's easy to say "I only want one or two of these values" and
expose a whole set of UDFs to grab them individually (perhaps wrapping
into a system view via that popular approach), I am concerned that
people are going to call any single-value versions provided one at a
time and get an inconsistent set. I think the only reasonable interface
to this would not return a single field, it would pop out all of them so
you got a matching set from the point in time the lock was held. And if
that's the case, I'm not sure what the most reasonable UI is. Just return
a whole row with a column for each field in the file, and then people
can select out just the ones they want? (That's probably the right
one) Produce the mess as a series of rows with (name,value) pairs? Put
them into an array?
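
To make the three shapes concrete, here is a rough Python sketch that
derives each of them from one snapshot, so the values always match.
The class, its fields, and the function names are hypothetical and only
illustrate the shapes, not any proposed server interface.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ControlSnapshot:
    # Hypothetical subset of fields, all taken at the same instant.
    catalog_version_no: int
    state: str
    next_xid: str


def as_row(snap):
    """One row with a column per field."""
    return asdict(snap)


def as_pairs(snap):
    """One (name, value) row per field, with values rendered as text."""
    return [(name, str(value)) for name, value in asdict(snap).items()]


def as_array(snap):
    """A single array of text values, in field order."""
    return [str(value) for value in asdict(snap).values()]
```

The single-row form keeps each field's type; the other two have to
flatten everything to text, which is part of why the row looks like
the right one.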

Having re-raised these concerns to myself, this is usually the point in
this exercise where I go "screw it, I'll just parse pg_controldata again"
and do that instead. This is happening so often lately that I think
Josh's suggestion that it's unworkable to keep going via that model
forever has merit. I find it hard to call this 9.0 territory though,
given the data is not actually difficult to grab, and that trail is
already well blazed; nothing about it is new in this version.

In short: I'd vote for a TODO item and would happily write it myself for
9.1 given reasonable agreement on the questions raised above, but -1 for
doing anything about it right now. Given both the existence of
completely reasonable workarounds and the existence of much more serious
blocker problems sitting on the roadmap to release, I can't get very
excited about this as the thing to worry about right now. Same reason I
ignored the idea when Joshua Tolley brought it up last month:
http://archives.postgresql.org/message-id/4b69caeb.9513f30a.731a.3427(a)mx.google.com

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us


From: "Greg Sabino Mullane" on

> I do find it a bit hard to imagine that any program capable of shelling
> out to call pg_controldata and doing something with the output would hit
> a major hurdle parsing the format that's already there.

+1

> 1) How do you handle the situation where the pg_controldata is invalid?

Throw an error

> 2) While it's easy to say "I only want one or two of these values" and
> expose a whole set of UDFs to grab them individually (perhaps wrapping
> into a system view via that popular approach), I am concerned that
> people are going to call any single-value versions provided one at a
> time and get an inconsistent set.

I'm not too concerned about this. This will be a fairly advanced interface,
and a warning in the docs should suffice. I think a good interface will
help however. I'd lean towards something like pg_settings.

What I *would* like to see is two tweaks to the output of pg_controldata.
First, having the "time of latest checkpoint" appear as an epoch (rather than
or in addition to a localized time string) would help quite a bit. Second, it
can be hard to build regex solutions when you don't know what language your
end user will be using. Not sure of the best solution for that one off the top of
my head, but there are some workarounds. For example, check_postgres.pl stores
all the languages' translations of "Time of latest checkpoint" to help it find that
information, but I'd sure like a more elegant solution. (One could count lines,
but that presumes the order and number of items will never change.)
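
One workaround not named above is to run pg_controldata with LC_ALL=C
so the labels come out untranslated. A hedged Python sketch follows;
the function names are made up, and it assumes the C-locale timestamp
looks like "Thu Mar  4 22:54:15 2010" and carries no time zone, so the
caller has to say which zone the server was in.

```python
import os
from datetime import datetime, timezone


def c_locale_env():
    """Copy of the environment that forces untranslated output."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return env


def checkpoint_epoch(value, tz=timezone.utc):
    """Convert a C-locale timestamp string to epoch seconds.

    The string has no zone information; tz is an assumption supplied
    by the caller (UTC by default here, purely for illustration).
    """
    parsed = datetime.strptime(value, "%a %b %d %H:%M:%S %Y")
    return int(parsed.replace(tzinfo=tz).timestamp())


# Intended use, with data_dir supplied by the caller:
#   subprocess.run(["pg_controldata", data_dir], env=c_locale_env(),
#                  check=True, capture_output=True, text=True)
```

This only helps scripts that control how pg_controldata is invoked; an
epoch field in the output itself would still be cleaner.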

--
Greg Sabino Mullane greg(a)turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201003050945
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8