From: "Kevin Grittner" on 17 Jul 2010 11:39

Andrew Geery <andrew.geery(a)gmail.com> wrote:

> The HYPOT macro executed 100 million times in 11 seconds and the
> phypot function executed the same number of times in 22 seconds.

Or, to put that another way, the new function adds 110 nanoseconds
to each hypotenuse calculation.

> With both -O2 and -O3, the HYPOT macro executed in 8 seconds and
> the phypot in 18.

Or, with these optimizations, it adds 100 nanoseconds to each
hypotenuse calculation. I don't know about others, but that seems a
reasonable price to pay to eliminate all possible underflows, to
eliminate all overflows except where the result doesn't fit in the
result data type, and to make NaN and Inf processing more
standards-compliant.

> I found that the difference in the two calculations was always
> less than 0.000001. However, about a third of the calculations
> differed at one more magnitude of precision (that is, there were
> differences in the calculations that were greater than 0.0000001).

That's harder for me to evaluate in terms of whether it's
acceptable. It *is* an *approximate* data type, and the differences
are all less than 0.0001%; however, that seems like a pretty weak
guarantee if it's making the answer less accurate. If we could take
some of these cases with relatively large differences and see which
of the calculations is more accurate, that might help make a
decision. I'm not sure which technique would tend to be more
accurate. Since they're algebraically equivalent, and what we're
using now pushes toward underflow and overflow more readily, it
seems possible that the differences will generally reflect a greater
accuracy in the patch's technique.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 17 Jul 2010 15:19

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Andrew Geery <andrew.geery(a)gmail.com> wrote:
>> I found that the difference in the two calculations was always
>> less than 0.000001. However, about a third of the calculations
>> differed at one more magnitude of precision (that is, there were
>> differences in the calculations that were greater than 0.0000001).

> That's harder for me to evaluate in terms of whether it's
> acceptable. It *is* an *approximate* data type, and the differences
> are all less than 0.0001%; however, that seems like a pretty weak
> guarantee if it's making the answer less accurate. If we could take
> some of these cases with relatively large differences and see which
> of the calculations is more accurate, that might help make a
> decision. I'm not sure which technique would tend to be more
> accurate. Since they're algebraically equivalent, and what we're
> using now pushes toward underflow and overflow more readily, it
> seems possible that the differences will generally reflect a greater
> accuracy in the patch's technique.

Hm ... it's been twenty years since I did any serious numerical
analysis hacking, but ... offhand this looks to me like it's about as
accurate as the straightforward way, not really better or worse.

Ignoring overflow/underflow, the basic knock on the naive expression
is that if x is much bigger than y, you lose most or all of the
significant digits in y when you add their squares. For instance, if
x is 1e8 * y, then y*y fails to affect the sum at all (given typical
float8 arithmetic), and you'll get back sqrt(x*x) even though y
should have been able to affect the result at the 8th place or so.
In the patch's calculation, y/x is computed accurately, but then
we'll lose the same precision when we form 1 + yx*yx --- the result
will be just 1 if y is lots smaller than x.
If we were feeling tense about this, we could look for an alternate
way of calculating sqrt(1 + yx*yx) that doesn't lose so much
accuracy. In principle I think that's doable, since this expression
is related to ln(1+x), which can be calculated accurately even for
very small x. I'm not convinced that it's worth troubling over,
though, seeing that no real attention has been paid to numerical
stability anywhere else in the geometric functions.

I think the patch is good in principle; what could be improved about
it is:

1. It should just redefine HYPOT(x,y) as pg_hypot(x,y), rather than
touching all the call sites --- not to mention possibly breaking
third-party code depending on the HYPOT macro. (That possibility
also leads me to think that the function shouldn't be static, but
should be exported in geo_decls.h.)

2. The comments in the new function leave something to be desired,
eg the discussion of the zero case is as clear as mud IMO.

BTW, the function comment claims that SUS requires a NAN result for
hypot(NAN,INF), but I don't believe it. If it did say that, it would
be contrary to ISO C:

    hypot(x, y) returns +INF if x is infinite, even if y is a NaN.

Anyway, if you read SUS' HUGE_VAL as meaning INF, that clause comes
before the one about NAN, so I think it's saying the same thing as
the other standards.

			regards, tom lane
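[Tom's first point could look roughly like the following in
geo_decls.h. This is a sketch inferred from the discussion, with
names assumed from the thread, not taken from a committed patch.]

```c
/* In geo_decls.h: export the function instead of keeping it static */
extern double pg_hypot(double x, double y);

/*
 * Redefine the existing macro in place, so neither the call sites
 * nor third-party code using HYPOT needs any change.
 */
#define HYPOT(A, B) pg_hypot(A, B)
```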
From: Dean Rasheed on 18 Jul 2010 04:31

On 17 July 2010 20:19, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> ... For instance, if x is 1e8 * y, then y*y
> fails to affect the sum at all (given typical float8 arithmetic), and
> you'll get back sqrt(x*x) even though y should have been able to affect
> the result at the 8th place or so. In the patch's calculation, y/x is
> computed accurately but then we'll lose the same precision when we form
> 1 + yx*yx --- the result will be just 1 if y is lots smaller than x.

No. If x is 1e8 * y, then y will only affect the result in the 16th
place. You can see this if you do a simple series expansion:

    sqrt(1 + yx^2) = 1 + 1/2 yx^2 + O(yx^4)

> If we were feeling tense about this, we could look for an alternate way
> of calculating sqrt(1 + yx*yx) that doesn't lose so much accuracy.
> In principle I think that's doable since this expression is related to
> ln(1+x) which can be calculated accurately even for very small x.

This algorithm is about as accurate as it could possibly be. The
point with ln(1+x) is that for small x:

    ln(1+x) = x + O(x^2)

so you would lose precision if x were much smaller than 1. This is
not the case with sqrt(1+x).

For most cases, the new algorithm is no more accurate than the old
one. The exception is when *both* x and y are very small. In this
case, it protects against incorrect underflows to 0.

Regards,
Dean
From: Tom Lane on 18 Jul 2010 10:50

Dean Rasheed <dean.a.rasheed(a)gmail.com> writes:
> No. If x is 1e8 * y, then y will only affect the result in the 16th
> place. You can see this if you do a simple series expansion:
> sqrt(1 + yx^2) = 1 + 1/2 yx^2 + O(yx^4)

Sigh, I went looking for that expansion yesterday and didn't find it.
Should've tried harder. I was relying on a gut feeling that it would
behave approximately like ln(1+x).

> For most cases, the new algorithm is no more accurate than the old
> one. The exception is when *both* x and y are very small. In this
> case, it protects against incorrect underflows to 0.

Yeah, I think you're right.

			regards, tom lane
From: "Kevin Grittner" on 23 Jul 2010 16:01
Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> I think the patch is good in principle

Since everyone seems to agree this is a good patch that needed minor
tweaks, and we haven't heard from the author, I just went ahead and
made the changes. Andrew, could you take another look and see whether
you agree I've covered it all before it gets marked ready for a
committer?

-Kevin