From: Johannes Baagoe on 14 Apr 2010 16:40 In the process of refining my PNRGs, I stumbled on a problem: the lack of good hash functions. One does not need them that often in javascript, since in most of their uses in C or other lower-level languages, Objects provide the required association between arbitrary strings and whatever one wants to look up, or count, or whatever. But sometimes, they are useful, in my case, to seed the PRNGs. I first thought I could use well-established methods, like the Fowler–Noll–Vo family of hashes, which seemed to meet the needs. But that does not work: 32-bit FNV supposes that multiplications of two 32-bit integers will retain the 32 lowest bits of the result. This is not the case with ECMAScript Numbers, which are always "double-precision 64-bit binary format IEEE 754 values". That implies 1. that in the case of overflow, the *lower* bits get discarded, and 2. that one never retains more than 53 significant bits - 54 if you count the sign. (For the same reason, one cannot implement the classic LCG PRNG in a naive way. Marsiglia's "easy to remember" multiplier 69069 is all right, since the result of its multiplication by a 32-bit integer will always fit in 53 bits. But 1103515245, which is widely used, leads to overflow, which ruins the modulo 2^32 assumption.) As a further, minor complication, FNV and the like still assume that characters are bytes. ECMAScript String.prototype.charCodeAt returns "a Number (a nonnegative integer less than 2^16)". I have therefore attempted to write a general purpose hash function for ECMAScript. It is called with `hash(data, n)`, where data is whatever you want to hash, and the optional n the intended number of buckets. (If you do not provide n, the result is a Number in [0, 1[, just like Math.random(), but of course with a deterministic relationship to the data, and with 53 bits precision.) The results should be well distributed: changing just one bit in the data should usually result in the change of about half the 32 first bits of the result, and somewhat less in the 21 lowest. The theory will be explained on my Random Musings pages Really Soon Now. In the meantime, you are welcome to try to figure it out, and to prove me wrong when I claim that it distributes *any* data over *any* number of buckets in a suitably pseudo-random way :) (I haven't tested it very extensively yet, so you might succeed.) Here is the code: function hash(data, n) { var norm = Math.pow(2, -32); var a = 2095533; var s = 0, c = 1; var t, t0 = 0; data = data.toString(); for (var i = 0; i < data.length; i++) { s -= data.charCodeAt(i) * 65537 * norm; if (s < 0) { s += 1; } t = a * s + c * norm; t0 = s = t - (c = t | 0); t = a * s + c * norm; s = t - (c = t | 0); } if (n) { return Math.floor(n * (s + t0 * norm)); } else { return s + t0 * norm; } } -- Johannes
From: Scott Sauyet on 14 Apr 2010 17:13 Johannes Baagoe wrote: > In the process of refining my PNRGs, I stumbled on a problem: the > lack of good hash functions. [ ... ] > > I first thought I could use well-established methods, like the > FowlerNollVo family of hashes, which seemed to meet the needs. > > But that does not work [ ... ] Have you had an experience of unexpected collisions with FNV? -- Scott
From: Johannes Baagoe on 14 Apr 2010 17:34 Scott Sauyet : > Have you had an experience of unexpected collisions with FNV? No, what I experienced was totally different outputs from the javascript and the C versions of my PRNGs. Which, retrospectively, is hardly surprising - things happen very differently in the two languages when you multiply a 32-bit non negative integer by 16777619 ! That being said, it should not be too difficult to construct a horrible test case for a naive port of 32-bit FNV to javascript. It is not among my priorities just now, but I may give it a try some day if nobody else does it before. NB: FNV works just fine on languages that have proper 32-bit integers, and it could, obviously, be ported properly to javascript by the means of a multi-precision multiplication routine. But one cannot simply cut and paste, and change all the "Fnv32_t"s to "var"s... Which is essentially what I did, and I would be surprised if I were the only one ever to do so. -- Johannes
From: nick on 14 Apr 2010 20:11 On Apr 14, 4:40 pm, Johannes Baagoe <baa...(a)baagoe.com> wrote: > [...] > It is called with `hash(data, n)`, where data is > whatever you want to hash, and the optional n the intended number > of buckets. > [...] Thanks Johannes! This looks incredibly useful; I was just looking for something like this that doesn't require a thousand lines of code and a crazy lookup table. I'll definitely be making use of this. A question: I thought it would be cool to see this thing generate string hashes. Here's what I came up with... please let me know if I'm inadvertently breaking the rest of your code with these additions? I don't fully understand your algorithm but I think this should be ok. function hash(data, n, asString) { var norm = Math.pow(2, -32); var a = 2095533; var s = 0, c = 1; var t, t0 = 0; var r; data = data.toString(); for (var i = 0; i < data.length; i++) { s -= data.charCodeAt(i) * 65537 * norm; if (s < 0) { s += 1; } t = a * s + c * norm; t0 = s = t - (c = t | 0); t = a * s + c * norm; s = t - (c = t | 0); } if (n) { r = Math.floor(n * (s + t0 * norm)); } else { r = s + t0 * norm; } // return as string if (asString) { if (n) r = 1 / (r + 1.123); r = r.toString(36).substr(2); } return r; } Thanks again! -- Nick
From: Johannes Baagoe on 15 Apr 2010 03:00 nick : > A question: I thought it would be cool to see this thing generate string > hashes. Here's what I came up with... please let me know if I'm > inadvertently breaking the rest of your code with these additions? I > don't fully understand your algorithm but I think this should be ok. > function hash(data, n, asString) { [...] > // return as string > if (asString) { > if (n) r = 1 / (r + 1.123); > r = r.toString(36).substr(2); > } > return r; > } It doesn't break anything, obviously, but just out of curiosity: why do you need that ? -- Johannes
|
Next
|
Last
Pages: 1 2 3 Prev: Need help understanding javascript syntax Next: My (Your) Library: canCall |