From: Johannes Baagoe on 31 May 2010 17:53 Sometime ago, I proposed a hash function for javascript, with the warning that it was not extensively tested, and that you might succeed in proving me wrong in the claim that "it distributes *any* data over *any* number of buckets in a suitably pseudo-random way". It doesn't, I'm sorry to say. A minimal sanity test is to hash all the words in a largish dictionary, and count collisions. Take for example the /usr/share/dict/british-english-insane wordlist (it is the `wbritish-insane` package on Ubuntu). Version 6.3 has 638 286 words. For 32-bit hash values, one would expect in the neighbourhood of 47 collisions, say, plus or minus a dozen or so. (The formula is here: http://en.wikipedia.org/wiki/Birthday_problem#Collision_counting) Anything above 70 is definitely too large. The function I proposed fails miserably - it has 1079 collisions. Don't use it. How do I know? Because I wrote a simple function that counts collisions in any kind of disk file, for whatever hash function one supplies : function collisions(file, hash) { var seen = []; var nCollisions = 0; var words = FileIO(file, 'r', 'utf-8'); while (! words.eof()) { var word = words.getLine(); var h = hash(word); if (seen[h]) { nCollisions++; } else { seen[h] = true; } } return nCollisions; } That is very straightforward, *except* for the FileIO function. (I gave it a capital initial, since it returns an Object. It is not a proper javascript constructor, though, since it doesn't return `this` and indeed doesn't use prototypes at all. I never do any more, I have yet to be convinced that the entire concept is not a misguided attempt to force round pegs into square holes. Convince me if you can.) That is where node.js comes in. FileIO is defined like this, in its own FileIO.js file : var fs = require('fs'); exports.FileIO = function(fileName, mode, encoding) { var fd = fs.openSync(fileName, mode); var offset = 0; function read() { return fs.readSync(fd, 1024, offset, encoding); } return { eof: function() { return read()[1] <= 0; }, getLine: function() { var line = read()[0].split('\n')[0]; offset += line.length + 1; return line; }, rewind: function() { offset = 0; } }; } It is far from perfect: no error checking, synchronous operation only, bare minimum functionality, wasteful calls of `read`, etc. But it gets the job done. Suggestions for improvement are welcome, as are other comments. -- Johannes
|
Pages: 1 Prev: i hate java script Next: FAQ Topic - What is a native object? (2010-06-01) |