reducing NUMERIC size for 9.1 [PgSql]

Prev: SHOW TABLES
Next: [HACKERS] Listen/Notify in 9.0

From: Tom Lane on 30 Jul 2010 14:11

[ forgot to answer this part ]

Robert Haas <robertmhaas(a)gmail.com> writes:
> Another question here is whether we should just fix this in CVS HEAD,
> or whether there's any reason to back-patch it.

Agreed, fixing it in HEAD seems sufficient.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 29 Jul 2010 14:35

On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Yeah, you would need an additional layer of struct to represent the
> numeric with a length word in front of it. I think this is not
> necessarily bad because it would perhaps open the door to working
> directly with short-varlena-header values, which is never going to
> be possible with this:
>
>> typedef struct NumericData
>> {
>> int32 varlen;
>> int16 n_header;
>> union { ...
>
> OTOH alignment considerations may make that idea hopeless anyway.

My understanding of our alignment rules for on-disk storage is still a
bit fuzzy, but as I understand it we don't align varlenas. So
presumably if we get a pointer directly into a disk block, the first
byte might happen to be not aligned, which would make the rest of the
structure aligned; and from playing around with the system, it looks
like if we get a value from anywhere else it's typically using the
4-byte varlena header. So it seems like it might be possible to write
code that aligns the data only if needed and otherwise skips a
palloc-and-copy cycle. I'm not totally sure that would be a win, but
it could be. Actually, I had a thought that it might be even more of
a win if you added a flag to the NumericVar representation indicating
whether the digit array was palloc'd or from the original tuple. Then
you might be able to avoid TWO palloc-and-copy cycles, although at the
price of a fairly significant code restructuring.

Which is a long-winded way of saying - it's probably not hopeless.

>> Why n_data as char[1] instead of NumericDigit, you ask?
>
> Yes, we'd have to export NumericDigit if we wanted to declare these
> structs "properly" in numeric.h. I wonder if that decision should
> be revisited. I'd lean to making the whole struct local to numeric.c
> though. Is there anyplace else that really ought to see it?

Probably not. btree_gist is using it, but that's it, at least as far
as our tree is concerned. Attached please find a patch to make the
numeric representation private and add a convenience function
numeric_is_nan() for the benefit of btree_gist. If this looks sane,
I'll go ahead and commit it, which will simplify review of the main
patch once I rebase it over these changes.

>>> I hadn't actually looked. I think though that it's a mistake to break
>>> compatibility on both dscale and weight when you only need to break one.
>>> Also, weight is *certainly* uninteresting for NaNs since it's not even
>>> meaningful unless there are digits. dscale could conceivably be worth
>>> something.
>
>> I don't think I'm breaking compatibility on anything. Can you clarify
>> what part of the code you're referring to here? I'm sort of lost.
>
> On-disk is what I'm thinking about. Right now, a NaN's first word is
> all dscale except the sign bits. You're proposing to change that
> but I think it's unnecessary to do so.

*Where* am I proposing this? The new versions of NUMERIC_WEIGHT() and
NUMERIC_DSCALE() determine where to look for the bits in question
using NUMERIC_IS_SHORT(), which just tests NUMERIC_FLAGBITS(n) ==
NUMERIC_SHORT. There's nothing in there about the NaN case at all.
Even if there were, it's irrelevant because those bits are never
examined and, as far as I can tell, will always be zero barring a
cosmic ray hit. But even if they WERE examined, I don't see where I'm
changing the interpretation of them; in fact, I think I'm very
explicitly NOT doing that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Robert Haas on 30 Jul 2010 21:55

On Fri, Jul 30, 2010 at 2:08 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> .... �Maybe something like this,
>> obviously with a suitable comment which I haven't written yet:
>
>> � � numeric_digits = (precision + 6) / 4;
>> � � return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;
>
> This is OK for the base-10K case, but there's still code in there
> for the base-10 and base-100 cases. �Can you express this logic in
> terms of DEC_DIGITS and sizeof(NumericDigit) ? �I think you might
> find it was actually clearer that way, cf Polya.

It appears to work out to:

numeric_digits = (precision + 2 * (DEC_DIGITS - 1)) / DEC_DIGITS
return (numeric_digits * sizeof(NumericDigits)) + NUMERIC_HDRSZ;

The smallest value for precision which requires 2 numeric_digits is
always 2; and the required number of numeric_digits increases by 1
each time the number of base-10 digits increases by DEC_DIGITS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 30 Jul 2010 14:08

Robert Haas <robertmhaas(a)gmail.com> writes:
> .... Maybe something like this,
> obviously with a suitable comment which I haven't written yet:

> numeric_digits = (precision + 6) / 4;
> return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;

This is OK for the base-10K case, but there's still code in there
for the base-10 and base-100 cases. Can you express this logic in
terms of DEC_DIGITS and sizeof(NumericDigit) ? I think you might
find it was actually clearer that way, cf Polya.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 3 Aug 2010 19:52

On Fri, Jul 30, 2010 at 9:55 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Fri, Jul 30, 2010 at 2:08 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas(a)gmail.com> writes:
>>> .... Maybe something like this,
>>> obviously with a suitable comment which I haven't written yet:
>>
>>> numeric_digits = (precision + 6) / 4;
>>> return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;
>>
>> This is OK for the base-10K case, but there's still code in there
>> for the base-10 and base-100 cases. Can you express this logic in
>> terms of DEC_DIGITS and sizeof(NumericDigit) ? I think you might
>> find it was actually clearer that way, cf Polya.
>
> It appears to work out to:
>
> numeric_digits = (precision + 2 * (DEC_DIGITS - 1)) / DEC_DIGITS
> return (numeric_digits * sizeof(NumericDigits)) + NUMERIC_HDRSZ;
>
> The smallest value for precision which requires 2 numeric_digits is
> always 2; and the required number of numeric_digits increases by 1
> each time the number of base-10 digits increases by DEC_DIGITS.

And here is a patch implementing that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

First | Prev |
Pages: 1 2 3 4 5 6
Prev: SHOW TABLES
Next: [HACKERS] Listen/Notify in 9.0