extensible enum types [PgSql]

From: Andrew Dunstan on 18 Jun 2010 13:59

Robert Haas wrote:
> On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
>
>> You are just bumping up the storage cost. Part of the attraction of enums is
>> their efficiency.
>>
>
> What's efficient about them? Aren't we using 4 bytes to store a value
> that will nearly always fit in 2, if not 1?
>
>
This was debated when we implemented enums. As between 1, 2 and 4 there
is often not much to choose, as alignment padding makes it pretty much
the same. But any of them are more efficient than storing a numeric
value or the label itself.
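
To make the padding argument concrete, here is a minimal sketch of how alignment can erase the difference between a 1-, 2- and 4-byte enum. It is a simplified model only: the helper names and the (size, alignment) pairs are assumptions for illustration, and it ignores tuple headers, null bitmaps and variable-length columns.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_size(columns):
    """Return the padded size of a row given (size, alignment) pairs per column."""
    offset = 0
    for size, alignment in columns:
        # Pad up to the column's alignment boundary, then place the column.
        offset = align_up(offset, alignment) + size
    return offset


# Hypothetical layouts: an enum stored in 1, 2 or 4 bytes (aligned to its own
# width), followed by a 4-byte integer column aligned on a 4-byte boundary.
INT4 = (4, 4)
followed_by_int4 = {w: row_data_size([(w, w), INT4]) for w in (1, 2, 4)}

# Counter-case: the next column is a single byte, so no padding is inserted
# and the narrower enum really does save space.
BYTE = (1, 1)
followed_by_byte = {w: row_data_size([(w, w), BYTE]) for w in (1, 2, 4)}

print(followed_by_int4)
print(followed_by_byte)
```

In this model the three widths come out identical whenever the following column needs 4-byte alignment, and differ only when the neighbouring columns are themselves narrow, which matches the "often not much to choose" wording.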

Anyway, it might well be moot.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 18 Jun 2010 14:06

On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> This was debated when we implemented enums. As between 1,2 and 4 there is
> often not much to choose, as alignment padding makes it pretty much the
> same. But any of them are more efficient than storing a numeric value or the
> label itself.

I was assuming the alternative was an integer, rather than a
numeric... but yeah, a numeric or the label itself would definitely
be larger.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Greg Stark on 18 Jun 2010 14:28

On Fri, Jun 18, 2010 at 6:17 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
>
>
> Tom Lane wrote:
>>
>> Insert a sort order column into pg_enum, and rearrange the values in
>> that whenever the user wants to add a new value in a particular place.

+1 I was going to say exactly the same thing.

>> You give up cheap comparisons in exchange for flexibility. I think lots
>> of people would accept that tradeoff, especially if they could make it
>> per-datatype.
> Hmm. Yes, that could work. The assumption in my proposal was that existing
> values would not be reordered anyway.
>
> But I'm not happy about giving up cheap comparison. And how would it be per
> data-type? That part isn't clear to me. Would we mark a given enum type as
> having its oids in order? It would also be sensible to quantify how much
> more expensive comparisons would become. If the sort order data were kept in
> the syscache the extra cost might get very small.

I think you would need a syscache or something like it. My first
instinct was to load the whole enum value->sort order mapping into a
hash table the first time you're asked to compare two values in a
given type. Then your comparison operator amounts to "look
up/initialize hash table for this enum type, look up both sort orders
in hash table, return comparison". You might need something like a
syscache for the hash tables so that you don't keep the hash tables
around forever.

Using a syscache for the individual sort values would be slower to
initially load if you're sorting a list since you would be doing a lot
of retail lookups of individual values. But then perhaps it's a cheap
way to protect against very large enums, against which a hash table
per enum type would be fragile.

--
greg

From: Tom Lane on 18 Jun 2010 15:18

Andrew Dunstan <andrew(a)dunslane.net> writes:
> Tom Lane wrote:
>> Insert a sort order column into pg_enum, and rearrange the values in
>> that whenever the user wants to add a new value in a particular place.
>> You give up cheap comparisons in exchange for flexibility. I think lots
>> of people would accept that tradeoff, especially if they could make it
>> per-datatype.

> But I'm not happy about giving up cheap comparison.

I don't think it would be all that bad. We could teach typcache.c to
cache the ordering data for any type that's in active use. It'd
certainly be a lot more expensive than OID comparison, but perhaps not
worse than NUMERIC comparisons.
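
For readers following along, the sort-order-column idea quoted above might look roughly like this. It is a sketch under assumptions: the function name, the value ids and the dense 1..n renumbering are invented for illustration, and the thread does not settle how the rearrangement would actually be done.

```python
def add_enum_value(sort_orders, new_value_id, before=None):
    """Return a new {value id: sort order} mapping with new_value_id inserted.

    sort_orders maps existing value ids to their sort order. The stored value
    ids never change; only the sort orders are rearranged.
    """
    if new_value_id in sort_orders:
        raise ValueError("value already exists")
    # Existing values in their current sort order.
    ordered = sorted(sort_orders, key=sort_orders.get)
    # Append at the end by default, otherwise insert ahead of `before`.
    position = len(ordered) if before is None else ordered.index(before)
    ordered.insert(position, new_value_id)
    # Renumber densely from 1 so the order stays unambiguous.
    return {value_id: rank for rank, value_id in enumerate(ordered, start=1)}


# Hypothetical ids: 5001 = 'red', 5002 = 'blue', new 5003 = 'green'.
existing = {5001: 1, 5002: 2}
updated = add_enum_value(existing, 5003, before=5002)
print(updated)
```

Rows already on disk keep pointing at the same value ids, so nothing stored needs rewriting; the price is that comparison has to consult the mapping instead of comparing ids directly.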

> And how would it be per data-type?

Well, there'd be two kinds of enums, just as you were saying before.
I'm not sure how we'd expose that to users exactly, or whether there
could be provisions for switching a type's behavior after creation.

regards, tom lane

From: Joseph Adams on 18 Jun 2010 18:17

On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
>
>
> Robert Haas wrote:
>>
>> On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(a)dunslane.net>
>> wrote:
>>
>>>
>>> You are just bumping up the storage cost. Part of the attraction of enums is
>>> their efficiency.
>>>
>>
>> What's efficient about them? Aren't we using 4 bytes to store a value
>> that will nearly always fit in 2, if not 1?
>>
>>
>
> This was debated when we implemented enums. As between 1,2 and 4 there is
> often not much to choose, as alignment padding makes it pretty much the
> same. But any of them are more efficient than storing a numeric value or the
> label itself.
>
> Anyway, it might well be moot.
>
> cheers
>
> andrew

Something I don't understand in all this is: why can't the type of an
enum be determined statically rather than stored in every single
value? For instance, if we have:

CREATE TYPE number AS ENUM ('one', 'two', 'three');
CREATE TYPE color AS ENUM ('red', 'green', 'blue');

PostgreSQL won't allow a comparison between two different enum types, e.g.:

> SELECT 'one'::number = 'red'::color;
ERROR: operator does not exist: number = color

However, when we say:

SELECT 'one'::number = 'one'::number;

Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
enum input rather than rely on it being stored in the value (either
implicitly via OID or explicitly as a word half)?

Also, I can't seem to find the original debates from when enums were
implemented. Does anyone have a link to that thread in the archives?
Thanks.

Joey Adams
