reducing NUMERIC size for 9.1 [PgSql]

Prev: SHOW TABLES
Next: [HACKERS] Listen/Notify in 9.0

From: Tom Lane on 16 Jul 2010 14:39

Robert Haas <robertmhaas(a)gmail.com> writes:
> I'm not entirely happy with the way I handled the variable-length
> struct, although I don't think it's horrible, either. I'm willing to
> rework it if someone has a better idea.

I don't like the way you did that either (specifically, not the kluge
in NUMERIC_DIGITS()). It would probably work better if you declared
two different structs, or a union of same, to represent the two layout
cases.

A couple of other thoughts:

n_sign_dscale is now pretty inappropriately named, probably better to
change the field name. This will also help to catch anything that's
not using the macros. (Renaming the n_weight field, or at least burying
it in an extra level of struct, would be helpful for the same reason.)

It seems like you've handled the NAN case a bit awkwardly. Since the
weight is uninteresting for a NAN, it's okay to not store the weight
field, so I think what you should do is consider that the dscale field
is still full-width, ie the format of the first word remains old-style
not new-style. I don't remember whether dscale is meaningful for a NAN,
but if it is, your approach is constraining what is possible to store,
and is also breaking compatibility with old databases.

Also, I wonder whether you can do anything with depending on the actual
bit values of the flag bits --- specifically, it's short header format
iff first bit is set. The NUMERIC_HEADER_SIZE macro in particular could
be made more efficient with that.

The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
awkward; I wonder if there's a better way. One solution might be to
offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
than try to sign-extend per se.

Please do NOT commit this:

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
! errmsg("value overflows numeric format %x w=%d s=%u",
! result->n_sign_dscale,
! NUMERIC_WEIGHT(result), NUMERIC_DSCALE(result))));

or at least hide it in "#ifdef DEBUG_NUMERIC" or some such.

Other than that the code changes look pretty clean, I'm mostly just
dissatisfied with the access macros.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 20 Jul 2010 11:28

On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> I'm not entirely happy with the way I handled the variable-length
>> struct, although I don't think it's horrible, either. I'm willing to
>> rework it if someone has a better idea.
>
> I don't like the way you did that either (specifically, not the kluge
> in NUMERIC_DIGITS()). �It would probably work better if you declared
> two different structs, or a union of same, to represent the two layout
> cases.
>
> A couple of other thoughts:
>
> n_sign_dscale is now pretty inappropriately named, probably better to
> change the field name. �This will also help to catch anything that's
> not using the macros. �(Renaming the n_weight field, or at least burying
> it in an extra level of struct, would be helpful for the same reason.)

I'm not sure what you have in mind here. If we create a union of two
structs, we'll still have to pick one of them to use to check the high
bits of the first word, so I'm not sure we'll be adding all that much
in terms of clarity. One possibility would be to name the fields
something like n_header1 and n_header2, or even just n_header[], but
I'm not sure if that's any better. If it is I'm happy to do it.

> It seems like you've handled the NAN case a bit awkwardly. �Since the
> weight is uninteresting for a NAN, it's okay to not store the weight
> field, so I think what you should do is consider that the dscale field
> is still full-width, ie the format of the first word remains old-style
> not new-style. �I don't remember whether dscale is meaningful for a NAN,
> but if it is, your approach is constraining what is possible to store,
> and is also breaking compatibility with old databases.

There is only one NaN value. Neither weight or dscale is meaningful.
I think if the high two bits of the first word are 11 we never examine
anything else - do you see somewhere that we're doing otherwise?

> Also, I wonder whether you can do anything with depending on the actual
> bit values of the flag bits --- specifically, it's short header format
> iff first bit is set. �The NUMERIC_HEADER_SIZE macro in particular could
> be made more efficient with that.

Right, OK.

> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
> awkward; I wonder if there's a better way. �One solution might be to
> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
> than try to sign-extend per se.

Hmm... so, if the weight is X we store the value
X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer? That's kind of a
funny representation - I *think* it works out to sign extension with
the high bit flipped. I guess we could do it that way, but it might
make it harder/more confusing to do bit arithmetic with the weight
sign bit later on.

> Please do NOT commit this:
>
> � � � � � � � � � � � � � � � �(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> ! � � � � � � � � � � � � � � � �errmsg("value overflows numeric format %x w=%d s=%u",
> ! � � � � � � � � � � � � � � � � � � � result->n_sign_dscale,
> ! � � � � � � � � � � � � � � � � � � � NUMERIC_WEIGHT(result), NUMERIC_DSCALE(result))));
>
> or at least hide it in "#ifdef DEBUG_NUMERIC" or some such.

Woopsie. That's debugging leftovers, sorry.

> Other than that the code changes look pretty clean, I'm mostly just
> dissatisfied with the access macros.

Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 28 Jul 2010 15:00

[ gradually catching up on email ]

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I don't like the way you did that either (specifically, not the kluge
>> in NUMERIC_DIGITS()). �It would probably work better if you declared
>> two different structs, or a union of same, to represent the two layout
>> cases.
>>
>> n_sign_dscale is now pretty inappropriately named, probably better to
>> change the field name. �This will also help to catch anything that's
>> not using the macros. �(Renaming the n_weight field, or at least burying
>> it in an extra level of struct, would be helpful for the same reason.)

> I'm not sure what you have in mind here. If we create a union of two
> structs, we'll still have to pick one of them to use to check the high
> bits of the first word, so I'm not sure we'll be adding all that much
> in terms of clarity.

No, you can do something like this:

typedef struct numeric_short
{
uint16 word1;
NumericDigit digits[1];
} numeric_short;

typedef struct numeric_long
{
uint16 word1;
int16 weight;
NumericDigit digits[1];
} numeric_long;

typedef union numeric
{
uint16 word1;
numeric_short short;
numeric_long long;
} numeric;

and then access word1 either directly or (after having identified which
format it is) via one of the sub-structs. If you really wanted to get
pedantic you could have a third sub-struct representing the format for
NaNs, but since those are just going to be word1 it may not be worth the
trouble.

>> It seems like you've handled the NAN case a bit awkwardly. �Since the
>> weight is uninteresting for a NAN, it's okay to not store the weight
>> field, so I think what you should do is consider that the dscale field
>> is still full-width, ie the format of the first word remains old-style
>> not new-style. �I don't remember whether dscale is meaningful for a NAN,
>> but if it is, your approach is constraining what is possible to store,
>> and is also breaking compatibility with old databases.

> There is only one NaN value. Neither weight or dscale is meaningful.
> I think if the high two bits of the first word are 11 we never examine
> anything else - do you see somewhere that we're doing otherwise?

I hadn't actually looked. I think though that it's a mistake to break
compatibility on both dscale and weight when you only need to break one.
Also, weight is *certainly* uninteresting for NaNs since it's not even
meaningful unless there are digits. dscale could conceivably be worth
something.

>> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
>> awkward; I wonder if there's a better way. �One solution might be to
>> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
>> than try to sign-extend per se.

> Hmm... so, if the weight is X we store the value
> X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer? That's kind of a
> funny representation - I *think* it works out to sign extension with
> the high bit flipped. I guess we could do it that way, but it might
> make it harder/more confusing to do bit arithmetic with the weight
> sign bit later on.

Yeah, it was just an idea. It seems like there should be an easier way
to extract the sign-extended value, though.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 29 Jul 2010 17:03

Robert Haas <robertmhaas(a)gmail.com> writes:
> OK. I think you're misinterpreting the point of that comment, which
> may mean that it needs some clarification. By "the two byte header
> format is also used", I think I really meant "the header (and in fact
> the entire value) is just 2 bytes". Really, the low order bits have
> neither the old interpretation nor the new interpretation: they don't
> have any interpretation at all - they're completely meaningless.
> That's what the part after the word "but" was intended to clarify.
> Every routine in numeric.c checks for NUMERIC_IS_NAN() and gives it
> some special handling before doing anything else, so NUMERIC_WEIGHT()
> and NUMERIC_DSCALE() are never called in that case.

I would suggest the comment ought to read something like

NaN values also use a two-byte header (in fact, the
whole value is always only two bytes). The low order bits of
the header word are available to store dscale, though dscale
is not currently used with NaNs.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 29 Jul 2010 16:37

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> On-disk is what I'm thinking about. �Right now, a NaN's first word is
>> all dscale except the sign bits. �You're proposing to change that
>> but I think it's unnecessary to do so.

> *Where* am I proposing this?

Um, your patch has the comment

! * If the high bits of n_scale_dscale are NUMERIC_NAN, the two-byte header
! * format is also used, but the low bits of n_scale_dscale are discarded in
! * this case.

but now that I look a bit more closely, I don't think that's what the
code is doing. You've got the NUMERIC_DSCALE and NUMERIC_WEIGHT access
macros testing specifically for NUMERIC_IS_SHORT, not for high-bit-set
which I think is what I was assuming they'd do. So actually that code
is good as is: a NAN still has the old header format. It's just the
comment that's wrong.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: SHOW TABLES
Next: [HACKERS] Listen/Notify in 9.0