From: Tom Lane on 4 Aug 2010 16:07

Robert Haas <robertmhaas(a)gmail.com> writes:
> I have a couple ideas for further work on the numeric code that I want
> to get feedback on.

> 1. Cramming it down some more. I propose that we introduce a third
> format with a one-byte header: 1 bit for sign, 3 bits for dynamic
> scale, and 4 bits for weight (the first of which is a sign bit). This
> might seem crazy,

Yes, it does. In the first place it isn't going to work conveniently
because NumericDigit requires int16 alignment. In the second, shaving
just one byte doesn't seem like enough win to be worth the trouble.
I don't believe your "billion rows" argument because you aren't
factoring in the result of row-level alignment padding --- most of the
time you're not going to win anything.

> We don't need any special
> marker to indicate that the 1-byte format is in use, because we can
> deduce it from the length of the varlena (after excluding the header):
> even = 2b or 4b header, odd = 1b header. There can't be any
> odd-length numerics already on disk, so there shouldn't be any
> compatibility break for pg_upgrade to worry about.

Really? Not sure this is true, because numerics can be toast-compressed.
It hardly ever happens, but to do this that's not good enough.

> 2. Don't untoast/don't copy.

This would be good, but I'm not sure how to do it. The main problem
again is NumericDigit alignment. Only about half the time is the digit
array going to be aligned the way you need, so that puts a real crimp
in the possible win. (In fact, if we assume the previous field is more
than byte aligned and the toast header is one byte, then the digit array
is *never* properly aligned on disk :-()

One possibility is to have an additional toasting rule that forces
odd-byte-alignment of a field's one-byte header. But it's a bit hard to
argue that numeric deserves the additional overhead that that would put
into all the core tuple forming/deforming logic.

> 3. 64-bit arithmetic. Right now, mul_var() and div_var() use int for
> arithmetic, but haven't we given up on supporting platforms without
> long long? I'm not sure I'm motivated enough to write the patch
> myself, but it seems like 64-bit arithmetic would give us a lot more
> room to postpone carries.

I don't think this would win unless we went to 32-bit NumericDigit,
which is a problem from the on-disk-compatibility standpoint, not to
mention making the alignment issues even worse. Postponing carries is
good, but we have enough headroom for that already --- I really doubt
that making the array elements wider would save anything noticeable
unless you increase NBASE.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
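
[Editorial sketch, not part of the original message.] The "headroom" for
postponing carries can be put in rough numbers. The sketch below assumes
NBASE is 10000 (the thread does not state the value) and only computes how
many worst-case digit products fit in an accumulator before a carry pass is
forced. The function name max_postponed_products is hypothetical, and this
is not how mul_var() itself schedules carries.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: base-10000 digits, each small enough for an int16. */
#define NBASE 10000

/*
 * How many digit-by-digit products can be summed into an accumulator
 * whose largest representable value is accum_max before overflow is
 * possible. The largest single product is (NBASE - 1) * (NBASE - 1).
 */
static int64_t
max_postponed_products(int64_t accum_max)
{
    int64_t max_product = (int64_t) (NBASE - 1) * (NBASE - 1);

    assert(accum_max > 0);
    return accum_max / max_product;
}
```

Under that assumption, max_postponed_products(INT32_MAX) is 21, while the
same call with INT64_MAX gives a value in the tens of billions. Whether the
extra room saves measurable time is the question the thread leaves open.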
From: Robert Haas on 4 Aug 2010 19:16

On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> I have a couple ideas for further work on the numeric code that I want
>> to get feedback on.
>
>> 1. Cramming it down some more. I propose that we introduce a third
>> format with a one-byte header: 1 bit for sign, 3 bits for dynamic
>> scale, and 4 bits for weight (the first of which is a sign bit). This
>> might seem crazy,
>
> Yes, it does. In the first place it isn't going to work conveniently
> because NumericDigit requires int16 alignment.

It is definitely not convenient. I'm not disputing that.

> In the second, shaving
> just one byte doesn't seem like enough win to be worth the trouble.
> I don't believe your "billion rows" argument because you aren't
> factoring in the result of row-level alignment padding --- most of the
> time you're not going to win anything.

Row-level alignment padding is a problem, and on very short rows, or
rows where numeric is the only varlena, you may see no benefit. But if
there are multiple text or numeric columns packed up next to each
other, things are more promising.

>> We don't need any special
>> marker to indicate that the 1-byte format is in use, because we can
>> deduce it from the length of the varlena (after excluding the header):
>> even = 2b or 4b header, odd = 1b header. There can't be any
>> odd-length numerics already on disk, so there shouldn't be any
>> compatibility break for pg_upgrade to worry about.
>
> Really? Not sure this is true, because numerics can be toast-compressed.
> It hardly ever happens, but to do this that's not good enough.

I was thinking of it like PG_GETARG_TEXT_PP() and similar - we would
detoast compressed and external datums, but leave packed ones as-is.
At that point you should have an accurate length count, and can decide
what to do.

>> 2. Don't untoast/don't copy.
>
> This would be good, but I'm not sure how to do it. The main problem
> again is NumericDigit alignment. Only about half the time is the digit
> array going to be aligned the way you need, so that puts a real crimp
> in the possible win. (In fact, if we assume the previous field is more
> than byte aligned and the toast header is one byte, then the digit array
> is *never* properly aligned on disk :-()

This is another reason why I think a 1-byte numeric header would be
good to have.

> One possibility is to have an additional toasting rule that forces
> odd-byte-alignment of a field's one-byte header. But it's a bit hard to
> argue that numeric deserves the additional overhead that that would put
> into all the core tuple forming/deforming logic.

Yeah, plus we'd be adding more alignment padding for an extremely
tenuous performance gain. The benchmarks I did this morning seem to
indicate that the extra palloc/pfree/memcpy overhead is only barely
more than zero, so it only makes sense if we can get it without
suffering other penalties.

>> 3. 64-bit arithmetic. Right now, mul_var() and div_var() use int for
>> arithmetic, but haven't we given up on supporting platforms without
>> long long? I'm not sure I'm motivated enough to write the patch
>> myself, but it seems like 64-bit arithmetic would give us a lot more
>> room to postpone carries.
>
> I don't think this would win unless we went to 32-bit NumericDigit,
> which is a problem from the on-disk-compatibility standpoint,

This would increase the average size of a Numeric value considerably,
so it would be a very BAD thing IMO.

> not to
> mention making the alignment issues even worse. Postponing carries is
> good, but we have enough headroom for that already --- I really doubt
> that making the array elements wider would save anything noticeable
> unless you increase NBASE.

I dunno, it was just a thought, based on some quick benchmarking that
indicated some possible hotspots in that area. But I didn't test it
carefully enough to be sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
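
[Editorial sketch, not part of the original message.] The length-parity
rule discussed above ("even = 2b or 4b header, odd = 1b header") might look
roughly like this once the datum has been decompressed and its payload
length, excluding the varlena header, is known. All names here are
hypothetical, and the sketch assumes each NumericDigit occupies two bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the numeric header by payload length. */
typedef enum
{
    NUMERIC_HDR_1B,
    NUMERIC_HDR_2B_OR_4B
} NumericHeaderKind;

/*
 * payload_len is the datum length after excluding the varlena header and
 * after any toast decompression. Digits are assumed to be 2 bytes each,
 * so only a 1-byte numeric header can make the payload length odd.
 */
static NumericHeaderKind
classify_by_payload_length(size_t payload_len)
{
    assert(payload_len > 0);
    return (payload_len % 2 == 1) ? NUMERIC_HDR_1B : NUMERIC_HDR_2B_OR_4B;
}

/* Digit count for the 1-byte-header case: everything after the header. */
static size_t
ndigits_for_1b_header(size_t payload_len)
{
    assert(payload_len % 2 == 1);
    return (payload_len - 1) / 2;
}
```

The check is only meaningful on an uncompressed length, which is the
objection raised about toast-compressed numerics: the compressed size says
nothing about the parity of the original payload.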
From: Tom Lane on 4 Aug 2010 19:27

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> This would be good, but I'm not sure how to do it. The main problem
>> again is NumericDigit alignment. Only about half the time is the digit
>> array going to be aligned the way you need, so that puts a real crimp
>> in the possible win. (In fact, if we assume the previous field is more
>> than byte aligned and the toast header is one byte, then the digit array
>> is *never* properly aligned on disk :-(

> This is another reason why I think a 1-byte numeric header would be
> good to have.

Hmm. That's a good point --- 1-byte toast header plus 1-byte numeric
header would leave you correctly aligned, anytime the previous field
didn't end on an odd byte boundary. So maybe the combination of both
things would have enough synergy to be worth the trouble. Still,
it seems like it'd be quite messy to deal with 1-byte header followed
by NumericDigits without any padding ... there'd be no way to declare
that as a C struct, for sure. Have you got a plan for what this would
actually look like in code?

Also, maybe this idea should supersede the one with two-byte numeric
header. I'm not sure it's worth having three variants, and we are
not at all committed to the two-byte version yet.

>> I don't think this would win unless we went to 32-bit NumericDigit,
>> which is a problem from the on-disk-compatibility standpoint,

> This would increase the average size of a Numeric value considerably,
> so it would be a very BAD thing IMO.

Oh, I certainly wasn't advocating for doing that ;-)

regards, tom lane
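
[Editorial sketch, not part of the original message.] The alignment
argument above is offset arithmetic, which can be checked with a small
helper. It assumes the tuple data area starts at an int16-aligned address,
so that offset parity equals address parity. The function name and
parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * field_start:     byte offset where the numeric field begins, i.e. where
 *                  the previous field ended (no padding is inserted before
 *                  a one-byte varlena header)
 * varlena_hdr_len: size of the toast/varlena header in bytes
 * numeric_hdr_len: size of the numeric header in bytes
 *
 * Returns true when the digit array would start on a 2-byte boundary.
 */
static bool
digits_are_int16_aligned(size_t field_start,
                         size_t varlena_hdr_len,
                         size_t numeric_hdr_len)
{
    size_t digits_offset = field_start + varlena_hdr_len + numeric_hdr_len;

    return digits_offset % 2 == 0;
}
```

With an even field_start and a one-byte varlena header, a 2-byte numeric
header puts the digits at an odd offset (the "never properly aligned"
case), while a 1-byte numeric header puts them at an even one. An odd
field_start flips both results.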
From: Robert Haas on 4 Aug 2010 21:27

On Wed, Aug 4, 2010 at 7:27 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> This would be good, but I'm not sure how to do it. The main problem
>>> again is NumericDigit alignment. Only about half the time is the digit
>>> array going to be aligned the way you need, so that puts a real crimp
>>> in the possible win. (In fact, if we assume the previous field is more
>>> than byte aligned and the toast header is one byte, then the digit array
>>> is *never* properly aligned on disk :-(
>
>> This is another reason why I think a 1-byte numeric header would be
>> good to have.
>
> Hmm. That's a good point --- 1-byte toast header plus 1-byte numeric
> header would leave you correctly aligned, anytime the previous field
> didn't end on an odd byte boundary. So maybe the combination of both
> things would have enough synergy to be worth the trouble. Still,
> it seems like it'd be quite messy to deal with 1-byte header followed
> by NumericDigits without any padding ... there'd be no way to declare
> that as a C struct, for sure. Have you got a plan for what this would
> actually look like in code?

No. I was hoping you'd have a brilliant idea. Generally, I think we'd
need to treat a "Numeric" as essentially a void * and probably lose the
special cases that try to operate directly on the packed format. That
would allow us to confine the knowledge of the multiple header formats
to the pack/unpack functions (set_var_from_num and make_result).

> Also, maybe this idea should supersede the one with two-byte numeric
> header. I'm not sure it's worth having three variants, and we are
> not at all committed to the two-byte version yet.

It's a thought, but let's not get ahead of ourselves. The code for the
two-byte header is done, tested, reviewed, and committed, whereas the
code for the one-byte header is vaporware and full of difficulties.
Furthermore, let's not kid ourselves: a broad range of useful values
can be represented using a one-byte header, but to need a four-byte
header instead of a two-byte header you need to be doing something
fairly ridiculous. Even if the one-byte header thing gets implemented,
I don't think it makes sense to give back 2 bytes on all the fine
things that can be represented with a two-byte header for some tenuous
code complexity benefit.

>>> I don't think this would win unless we went to 32-bit NumericDigit,
>>> which is a problem from the on-disk-compatibility standpoint,
>
>> This would increase the average size of a Numeric value considerably,
>> so it would be a very BAD thing IMO.
>
> Oh, I certainly wasn't advocating for doing that ;-)

Oh, good. :-) Making this smaller is too much work to think about
doing *anything* that might make it bigger.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
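
[Editorial sketch, not part of the original message.] One way the one-byte
header could be confined to pack/unpack helpers, as suggested above, is to
treat the datum as raw bytes and never overlay a struct on it. The thread
gives only the field widths (1 sign bit, 3 dscale bits, 4 weight bits with
a sign bit). The bit positions, the two's-complement weight encoding, and
every name below are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unpacked form of the proposed one-byte header. */
typedef struct
{
    bool negative;   /* 1 bit */
    int  dscale;     /* 3 bits: 0..7 */
    int  weight;     /* 4 bits, signed: -8..7 */
} TinyNumericHeader;

/* True when the value's header fields fit the one-byte format. */
static bool
tiny_header_fits(TinyNumericHeader h)
{
    return h.dscale >= 0 && h.dscale <= 7 && h.weight >= -8 && h.weight <= 7;
}

/* Assumed layout: bit 7 = sign, bits 6..4 = dscale, bits 3..0 = weight. */
static uint8_t
pack_tiny_header(TinyNumericHeader h)
{
    assert(tiny_header_fits(h));
    return (uint8_t) ((h.negative ? 0x80u : 0u) |
                      ((unsigned) h.dscale << 4) |
                      ((unsigned) h.weight & 0x0Fu));
}

static TinyNumericHeader
unpack_tiny_header(uint8_t b)
{
    TinyNumericHeader h;
    int w = b & 0x0F;

    h.negative = (b & 0x80) != 0;
    h.dscale = (b >> 4) & 0x07;
    h.weight = (w & 0x08) ? w - 16 : w;   /* sign-extend the 4-bit field */
    return h;
}

/*
 * Copy ndigits 16-bit digits that follow the header byte. memcpy is used
 * because packed + 1 may not be int16-aligned.
 */
static void
copy_digits_after_header(int16_t *dst, const uint8_t *packed, size_t ndigits)
{
    memcpy(dst, packed + 1, ndigits * sizeof(int16_t));
}
```

Copying the digits out sidesteps the "no way to declare that as a C
struct" problem, at the cost of the copy that the "don't untoast/don't
copy" idea was meant to avoid.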