From: Tom Lane on 19 Jun 2010 15:03

Andrew Dunstan <andrew(a)dunslane.net> writes:
> Another thought: could we add a column to pg_type with a flag that's
> true if the oids are in sort order? Then the comparison routines could
> just look that up in the type cache and if it's true (as it often will
> be) just return the oid comparison.

Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison.  However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...

			regards, tom lane
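For illustration, a rough standalone C sketch of the flag-based short
circuit Andrew describes and the per-comparison cache lookup Tom is
weighing against a bare OID compare. The helper names and the Oid
typedef here are stand-ins, not the actual backend API:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;   /* stand-in for the backend's Oid type */

/* Hypothetical stand-ins (not real backend functions) for the
 * type-cache lookup and for the existing slow path that consults
 * the enum's declared sort order. */
extern bool enum_oids_are_sorted(Oid enumtypoid);
extern int  enum_cmp_slow_path(Oid arg1, Oid arg2, Oid enumtypoid);

/* One cache lookup per call, then (usually) nothing more than a
 * bare OID comparison -- the lookup itself is the cost at issue. */
int
enum_cmp_sketch(Oid arg1, Oid arg2, Oid enumtypoid)
{
    if (enum_oids_are_sorted(enumtypoid))
        return (arg1 < arg2) ? -1 : (arg1 > arg2) ? 1 : 0;

    return enum_cmp_slow_path(arg1, arg2, enumtypoid);
}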
From: Gurjeet Singh on 19 Jun 2010 16:14

On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> David E. Wheeler wrote:
>> On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:
>>
>>> I'd be perfectly happy to hear a reasonable alternative. Assuming we
>>> use some integer representation, given two labels represented by n
>>> and n+1, we can't add a label between them without rewriting the
>>> tables that use the type, whether it's my representation scheme or
>>> some other. Maybe we could have a FORCE option which would rewrite
>>> if necessary.
>>
>> People would likely always use it.
>>
>> Why not use a decimal number?
>
> You are just bumping up the storage cost. Part of the attraction of
> enums is their efficiency.

It would probably amount to the same thing as the decimal suggestion
above, but we could use a floating-point data type. It would allow
injection of a new label at any stage.

CREATE leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label3 -> 3.0

ALTER ... ADD Label4 AFTER Label2; leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label4 -> 2.5
    Label3 -> 3.0

ALTER ... ADD Label5 AFTER Label2; leads to

    Label1 -> 1.0
    Label2 -> 2.0
    Label5 -> 2.25
    Label4 -> 2.5
    Label3 -> 3.0

Since the floating-point implementation is architecture dependent, the
ALTER command should check that the injected value does not compare
equal to either of its neighbours (e.g. the injected 2.5 must not
compare equal to 2.0, and the injected 2.25 must not compare equal to
2.5); if it does, throw an error and ask the user to use the
rewrite-the-table version of the command.

And since the value is still 32 bits, with comparisons done by the
hardware, performance should be acceptably close to the current integer
comparisons, and much faster than the cache lookups etc. being proposed.

This is very similar to Andrew's original suggestion of splitting 32
bits into 16+16, but managed by the machine, so no complicated
comparison algorithms are needed on our part. And since it is all
transparent to the SQL interface, the dump-reload cycle, Slony
replication, etc. should not be affected either.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
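As an aside, a small standalone C program (not backend code) showing
both the midpoint insertion and the neighbour-equality check described
above, and how quickly single-precision gaps run out when labels keep
being added at the same spot:

#include <stdio.h>

/* Wedge a 32-bit float label between two neighbours; refuse the
 * insertion once the midpoint compares equal to a neighbour, which is
 * where the proposal falls back to a table rewrite. */
static int
insert_between(float lo, float hi, float *out)
{
    float mid = lo + (hi - lo) / 2.0f;

    if (mid == lo || mid == hi)     /* precision exhausted */
        return -1;

    *out = mid;
    return 0;
}

int
main(void)
{
    float lo = 2.0f;    /* "Label2" */
    float hi = 3.0f;    /* "Label3" */
    float mid;
    int   n = 0;

    /* Repeatedly add a new label just after Label2: each insertion
     * halves the remaining gap, so a float's 24-bit mantissa runs out
     * after a couple of dozen insertions. */
    while (insert_between(lo, hi, &mid) == 0)
    {
        hi = mid;
        n++;
    }
    printf("room for %d insertions between 2.0 and 3.0\n", n);
    return 0;
}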
From: Andrew Dunstan on 19 Jun 2010 16:55

Gurjeet Singh wrote:
> This is very similar to Andrew's original suggestion of splitting 32
> bits into 16+16, but managed by the machine, so no complicated
> comparison algorithms are needed on our part. And since it is all
> transparent to the SQL interface, the dump-reload cycle, Slony
> replication, etc. should not be affected either.

It would break the on-disk representation, though. That's not something
we want to do any more if it can possibly be avoided. We want to keep
pg_upgrade working.

cheers

andrew
From: Merlin Moncure on 19 Jun 2010 17:11

On Sat, Jun 19, 2010 at 4:55 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> Gurjeet Singh wrote:
>> This is very similar to Andrew's original suggestion of splitting 32
>> bits into 16+16, but managed by the machine, so no complicated
>> comparison algorithms are needed on our part. And since it is all
>> transparent to the SQL interface, the dump-reload cycle, Slony
>> replication, etc. should not be affected either.
>
> It would break the on-disk representation, though. That's not
> something we want to do any more if it can possibly be avoided. We
> want to keep pg_upgrade working.

I was partial to your original idea -- I thought it was quite clever,
actually. Enums are already on the performance side of a tradeoff, so I
think any improvement for them should be looked at through that lens.
16 bits is IMO enough to pick a reasonable bucket size that gives you
enough play to handle the vast majority of cases that are appropriate
for enums. Your workaround in the rare case you actually hit the
limitations (most of those would fall under the 'oops, I used the wrong
tool' category) seems perfectly OK to me.

merlin
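For what it's worth, a rough sketch of the kind of 16+16 bucketing being
referred to; the layout and constants here are illustrative guesses, not
Andrew's actual proposal. The point is that ordering stays a single
unsigned 32-bit compare while midpoint insertions use the spare low bits:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* High 16 bits carry a label's original ordinal; the low 16 bits are
 * spare room for labels added later. */
static uint32_t
initial_label(uint16_t ordinal)
{
    return (uint32_t) ordinal << 16;    /* low 16 bits left empty */
}

static int
insert_between(uint32_t lo, uint32_t hi, uint32_t *out)
{
    uint32_t mid = lo + (hi - lo) / 2;

    if (mid == lo)      /* bucket exhausted: the rare rewrite case */
        return -1;
    *out = mid;
    return 0;
}

int
main(void)
{
    uint32_t red  = initial_label(1);   /* 0x00010000 */
    uint32_t blue = initial_label(2);   /* 0x00020000 */
    uint32_t purple;

    if (insert_between(red, blue, &purple) == 0)
        printf("red=%#" PRIx32 " < purple=%#" PRIx32 " < blue=%#" PRIx32 "\n",
               red, purple, blue);
    return 0;
}

With this spacing, roughly sixteen successive midpoint insertions fit
between two adjacent original labels before the rewrite fallback is
needed, which is the "enough play" Merlin is describing.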
From: Simon Riggs on 19 Jun 2010 20:15
On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:
> Thoughts?

enum types exist as an optimisation-by-avoidance of referential
integrity. We're a relational database, so IMHO we should spend time
performance tuning RI.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services