[RFC] nodeToString format and exporting the SQL parser [PgSql]

Prev: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Next: pgindent bizarreness

From: Tom Lane on 2 Apr 2010 16:16

Michael Tharp <gxti(a)partiallystapled.com> writes:
> Due to popular demand on #postgresql (by which I mean David Fetter), I
> have been spending a little time making the internal SQL parser
> available to clients via a C-language SQL function. The function itself
> is extremely simple: just a wrapper around a call to raw_parser followed
> by nodeToString.

What exactly is the use-case for this? Generally speaking I'm against
exposing that data structure to clients, because there will inevitably
be griping when we change it (as we most certainly will). Your
complaints boil down to "this is hard to parse from the client side",
and that already tells me you're doing something that we will refuse
to support.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Markus Wanner on 3 Apr 2010 09:17

Hi,

Michael Tharp wrote:
> I have been spending a little time making the internal SQL parser
> available to clients via a C-language SQL function.

This sounds very much like one of the Cluster Features:
http://wiki.postgresql.org/wiki/ClusterFeatures#API_into_the_Parser_.2F_Parser_as_an_independent_module

Is this what you (or David) have in mind?

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: David Fetter on 4 Apr 2010 12:10

On Sat, Apr 03, 2010 at 03:17:30PM +0200, Markus Schiltknecht wrote:
> Hi,
>
> Michael Tharp wrote:
> >I have been spending a little time making the internal SQL parser
> >available to clients via a C-language SQL function.
>
> This sounds very much like one of the Cluster Features:
> http://wiki.postgresql.org/wiki/ClusterFeatures#API_into_the_Parser_.2F_Parser_as_an_independent_module
>
> Is this what you (or David) have in mind?

I'm not a fan of statement-based replication of any description. The
use cases I have in mind involve things like known-correct syntax
highlighting in text editors.

Cheers,
David.
--
David Fetter <david(a)fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter(a)gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Michael Tharp on 4 Apr 2010 16:46

On 04/02/2010 04:16 PM, Tom Lane wrote:
> Generally speaking I'm against
> exposing that data structure to clients, because there will inevitably
> be griping when we change it (as we most certainly will). Your
> complaints boil down to "this is hard to parse from the client side",
> and that already tells me you're doing something that we will refuse
> to support.

Sorry, I got the impression that the node format was intended to be
stable. If it's for internal use only then I don't suppose my
suggestions make much sense.

As for the structure of the parse nodes themselves, changes aren't a big
deal to me. What I'm doing is diffing entire files of DDL, so even if
the node format changes over time the changes in two files parsed using
the same code are still evident. In fact, the ambiguities in parsing
don't really affect this use case much because I don't need exact values
to detect changes.

Markus' link suggests using the parser for a load balancer, which would
require that this all be in library form anyway as the balancer cannot
afford a round trip to the server to parse the query, so maybe I should
focus my efforts there. Making it a separate library would also resolve
the compatibility issues as the library could keep its own structures
regardless of where Postgres goes.

Thanks for the input!

-- m. tharp

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: ioguix on 21 Apr 2010 13:40

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/04/2010 18:10, David Fetter wrote:
> On Sat, Apr 03, 2010 at 03:17:30PM +0200, Markus Schiltknecht wrote:
>> Hi,
>>
>> Michael Tharp wrote:
>>> I have been spending a little time making the internal SQL parser
>>> available to clients via a C-language SQL function.
>>
>> This sounds very much like one of the Cluster Features:
>> http://wiki.postgresql.org/wiki/ClusterFeatures#API_into_the_Parser_.2F_Parser_as_an_independent_module
>>
>> Is this what you (or David) have in mind?
>
> I'm not a fan of statement-based replication of any description. The
> use cases I have in mind involve things like known-correct syntax
> highlighting in text editors.

The point here is not to expose the internal data structure, but to
deliver a tokenized version of the given SQL script.

There's actually many different use cases for external projects :
- syntax highlighting
- rewrite query with proper indentation
- replication
- properly splitting queries from a script
- define type of the query (SELECT ? UPDATE/DELETE ? DDL ?)
- checking validity of a query before sending it
- ...

In addition of PgPool needs, I can see 3 or 4 direct use cases for
pgAdmin and phpPgAdmin.

So it seems to me having the parser code in a shared library would be
very useful for external C projects which can link to it. However it
would be useless for other non-C projects which can't use it directly
but are connected to a PostgreSQL backend anyway (phpPgAdmin as instance).

What about having a new SQL command like TOKENIZE ? it would kinda act
like EXPLAIN but giving a tokenized version of the given SQL script. As
EXPLAIN, it could speak XML, YAML, JSON, you name it...

Each token could have :
- a type ('identifier', 'string', 'sql command', 'sql keyword',
'variable'...)
- the start position in the string
- the value
- the line number
- ...

A simple example of a tokenizer is the php one:
http://fr.php.net/token_get_all

And here is a basic example which return pseudo rows here :

=> TOKENIZE $script$
SELECT 1;
UPDATE test SET "a"=2;
$script$;

type | pos | value | line
- -------------+-----+----------+------
SQL_COMMAND | 1 | 'SELECT' | 1
CONSTANT | 8 | '1' | 1
DELIMITER | 9 | ';' | 1
SQL_COMMAND | 11 | 'UPDATE' | 2
IDENTIFIER | 18 | 'test' | 2
SQL_KEYWORD | 23 | 'SET' | 2
IDENTIFIER | 27 | '"a"' | 2
OPERATOR | 30 | '=' | 2
CONSTANT | 31 | '1' | 2

>
> Cheers,
> David.

As a phpPgAdmin dev, I am thinking about this subject since a long time.
I am interested about trying to create such a patch after discussing it
and if you think it is doable.

- --
JGuillaume (ioguix) de Rorthais
http://www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvPOJMACgkQxWGfaAgowiLrUACfa7qMVr3oiOVS7JfhTa1S9EqY
pYkAn3Sj6cezC/EdWPu2+kzrgjaDygGE
=oY1c
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2 3
Prev: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Next: pgindent bizarreness