From: Tom Lane on 19 Jun 2010 15:03

Andrew Dunstan <andrew(a)dunslane.net> writes:
> Another thought: could we add a column to pg_type with a flag that's
> true if the oids are in sort order? Then the comparison routines could
> just look that up in the type cache and if it's true (as it often will
> be) just return the oid comparison.

Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison.  However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...

			regards, tom lane
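For illustration, a rough standalone C sketch of the flag-based short
circuit Andrew describes and the per-comparison cache lookup Tom is
weighing against a bare OID compare. The helper names and the Oid
typedef here are stand-ins, not the actual backend API:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;   /* stand-in for the backend's Oid type */

/* Hypothetical stand-ins (not real backend functions) for the
 * type-cache lookup and for the existing slow path that consults
 * the enum's declared sort order. */
extern bool enum_oids_are_sorted(Oid enumtypoid);
extern int  enum_cmp_slow_path(Oid arg1, Oid arg2, Oid enumtypoid);

/* One cache lookup per call, then (usually) nothing more than a
 * bare OID comparison -- the lookup itself is the cost at issue. */
int
enum_cmp_sketch(Oid arg1, Oid arg2, Oid enumtypoid)
{
    if (enum_oids_are_sorted(enumtypoid))
        return (arg1 < arg2) ? -1 : (arg1 > arg2) ? 1 : 0;

    return enum_cmp_slow_path(arg1, arg2, enumtypoid);
}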
From: Gurjeet Singh on 19 Jun 2010 16:14

On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> David E. Wheeler wrote:
>> On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:
>>
>>> I'd be perfectly happy to hear a reasonable alternative. Assuming we
>>> use some integer representation, given two labels represented by n
>>> and n+1, we can't add a label between them without rewriting the
>>> tables that use the type, whether it's my representation scheme or
>>> some other. Maybe we could have a FORCE option which would rewrite
>>> if necessary.
>>
>> People would likely always use it.
>>
>> Why not use a decimal number?
>
> You are just bumping up the storage cost. Part of the attraction of
> enums is their efficiency.

It would probably amount to the same thing as the decimal suggestion
above, but we could use a floating-point data type. It would allow
injection of a new label at any stage.

CREATE leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label3 -> 3.0

ALTER ... ADD Label4 AFTER Label2; leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label4 -> 2.5
    Label3 -> 3.0

ALTER ... ADD Label5 AFTER Label2; leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label5 -> 2.25
    Label4 -> 2.5
    Label3 -> 3.0

Since the floating-point implementation is architecture dependent, the
ALTER command should check that the injected value does not compare
equal to either of its neighbours (e.g. the injected 2.5 must not
compare equal to 2.0, and the injected 2.25 must not compare equal to
2.5); if it does, throw an error and ask the user to use the
rewrite-the-table version of the command.

And since the value is still 32 bits, with comparisons done by the
hardware, performance should be acceptably close to the current integer
comparisons, and much faster than the cache lookups etc. being proposed.

This is very similar to Andrew's original suggestion of splitting 32
bits into 16+16, but managed by the machine, so no complicated
comparison algorithms are needed on our part. And since it is all
transparent to the SQL interface, the dump-reload cycle, Slony
replication, etc. should not be affected either.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
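As an aside, a small standalone C program (not backend code) showing
both the midpoint insertion and the neighbour-equality check described
above, and how quickly single-precision gaps run out when labels keep
being added at the same spot:

#include <stdio.h>

/* Wedge a 32-bit float label between two neighbours; refuse the
 * insertion once the midpoint compares equal to a neighbour, which is
 * where the proposal falls back to a table rewrite. */
static int
insert_between(float lo, float hi, float *out)
{
    float mid = lo + (hi - lo) / 2.0f;

    if (mid == lo || mid == hi)     /* precision exhausted */
        return -1;

    *out = mid;
    return 0;
}

int
main(void)
{
    float lo = 2.0f;    /* "Label2" */
    float hi = 3.0f;    /* "Label3" */
    float mid;
    int   n = 0;

    /* Repeatedly add a new label just after Label2: each insertion
     * halves the remaining gap, so a float's 24-bit mantissa runs out
     * after a couple of dozen insertions. */
    while (insert_between(lo, hi, &mid) == 0)
    {
        hi = mid;
        n++;
    }
    printf("room for %d insertions between 2.0 and 3.0\n", n);
    return 0;
}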
From: Andrew Dunstan on 19 Jun 2010 16:55

Gurjeet Singh wrote:
> This is very similar to Andrew's original suggestion of splitting 32
> bits into 16+16, but managed by the machine, so no complicated
> comparison algorithms are needed on our part. And since it is all
> transparent to the SQL interface, the dump-reload cycle, Slony
> replication, etc. should not be affected either.

It would break the on-disk representation, though. That's not something
we want to do any more if it can possibly be avoided. We want to keep
pg_upgrade working.

cheers

andrew
From: Merlin Moncure on 19 Jun 2010 17:11

On Sat, Jun 19, 2010 at 4:55 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> Gurjeet Singh wrote:
>> This is very similar to Andrew's original suggestion of splitting 32
>> bits into 16+16, but managed by the machine, so no complicated
>> comparison algorithms are needed on our part. And since it is all
>> transparent to the SQL interface, the dump-reload cycle, Slony
>> replication, etc. should not be affected either.
>
> It would break the on-disk representation, though. That's not
> something we want to do any more if it can possibly be avoided. We
> want to keep pg_upgrade working.

I was partial to your original idea -- I thought it was quite clever,
actually. Enums are already on the performance side of a tradeoff, so I
think any improvement for them should be looked at through that lens.
16 bits is IMO enough to pick a reasonable bucket size that gives you
enough play to handle the vast majority of cases that are appropriate
for enums. Your workaround in the rare case you actually hit the
limitations (most of those would fall under the 'oops, I used the wrong
tool' category) seems perfectly OK to me.

merlin
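For what it's worth, a rough sketch of the kind of 16+16 bucketing being
referred to; the layout and constants here are illustrative guesses, not
Andrew's actual proposal. The point is that ordering stays a single
unsigned 32-bit compare while midpoint insertions use the spare low bits:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* High 16 bits carry a label's original ordinal; the low 16 bits are
 * spare room for labels added later. */
static uint32_t
initial_label(uint16_t ordinal)
{
    return (uint32_t) ordinal << 16;    /* low 16 bits left empty */
}

static int
insert_between(uint32_t lo, uint32_t hi, uint32_t *out)
{
    uint32_t mid = lo + (hi - lo) / 2;

    if (mid == lo)      /* bucket exhausted: the rare rewrite case */
        return -1;
    *out = mid;
    return 0;
}

int
main(void)
{
    uint32_t red  = initial_label(1);   /* 0x00010000 */
    uint32_t blue = initial_label(2);   /* 0x00020000 */
    uint32_t purple;

    if (insert_between(red, blue, &purple) == 0)
        printf("red=%#" PRIx32 " < purple=%#" PRIx32 " < blue=%#" PRIx32 "\n",
               red, purple, blue);
    return 0;
}

With this spacing, roughly sixteen successive midpoint insertions fit
between two adjacent original labels before the rewrite fallback is
needed, which is the "enough play" Merlin is describing.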
From: Simon Riggs on 19 Jun 2010 20:15
On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:
> Thoughts?

enum types exist as an optimisation-by-avoidance of referential
integrity. We're a relational database, so IMHO we should spend time
performance tuning RI.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services