From: Peter J. Holzer on 26 Jun 2010 15:02 On 2010-06-26 18:05, David Combs <dkcombs(a)panix.com> wrote: > In article <87y6exwo6o.fsf(a)lifelogs.com>, > Ted Zlatanov <tzz(a)lifelogs.com> wrote: >>On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote: >> >>UG> he said he wanted 1k random numbers out of a large range so a hash would >>UG> be fine for that. >> >>You're right, I wasn't paying enough attention. > > What, you're saying a div and a mod (let's assume 64 bits, if integet) to > find the desired bit, versus dealing each time with a hash bucket to choose > an ensuing chain to hunt down through? > > So you make the bucket-array longer, so the chains are short -- get it > long enough so only very, very few chains are longer than one (most > being zero) in length, and you're almost emulating the bit-vector > scheme, but taking up FAR FAR more space. No, for the use case given (1k random numbers out of a large range) a hash takes *less* space. The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes approximately 12.5 MB. 1000 hash entries take (depending on pointer size, malloc implementation, etc.) something like 60 to 150 kB. So that's about 1/100th of the space. I'd still think a bit vector should be faster. hp
From: Tad McClellan on 26 Jun 2010 18:26 David Combs <dkcombs(a)panix.com> wrote: > I see it You see what "it"? > differently. Differently from what? > He's got every right to get pissed off from time to time. Who "he"? Please follow the standard practice of quoting some context in followups. -- Tad McClellan email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/" The above message is a Usenet post. I don't recall having given anyone permission to use it on a Web site.
From: Ted Zlatanov on 29 Jun 2010 10:37 On Sat, 26 Jun 2010 21:02:19 +0200 "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote: PJH> On 2010-06-26 18:05, David Combs <dkcombs(a)panix.com> wrote: >> In article <87y6exwo6o.fsf(a)lifelogs.com>, >> Ted Zlatanov <tzz(a)lifelogs.com> wrote: >>> On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote: >>> UG> he said he wanted 1k random numbers out of a large range so a hash would UG> be fine for that. >>> >>> You're right, I wasn't paying enough attention. >> >> What, you're saying a div and a mod (let's assume 64 bits, if integet) to >> find the desired bit, versus dealing each time with a hash bucket to choose >> an ensuing chain to hunt down through? >> >> So you make the bucket-array longer, so the chains are short -- get it >> long enough so only very, very few chains are longer than one (most >> being zero) in length, and you're almost emulating the bit-vector >> scheme, but taking up FAR FAR more space. PJH> No, for the use case given (1k random numbers out of a large range) a PJH> hash takes *less* space. PJH> The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes PJH> approximately 12.5 MB. 1000 hash entries take (depending on pointer PJH> size, malloc implementation, etc.) something like 60 to 150 kB. So PJH> that's about 1/100th of the space. I'd still think a bit vector should PJH> be faster. Right. I somehow missed the 1K sample requirement and thought this was for any set of random numbers. Certainly a bit vector has better O(n) performance but the requirement is very far from hitting any O(n) bounds. Also as Peter pointed out memory usage is really high. Finally, Perl's hashes are very well optimized so I'd rely on them over Bit::Vector for performance with small sets. If I expected *clusters* of numbers in a large space, like say Unicode codepoints, I would use inversion lists (http://search.cpan.org/~teodor/Algorithm-InversionList-0.03/). If the domain was 32-bit or smaller ints and I expected at least 10% to be randomly picked, I would use Bit::Vector. For almost all other reasonable use cases, though, hashes would work better. Ted
First
|
Prev
|
Pages: 1 2 3 4 5 Prev: Troubles using Net::Pcap and Net::PcapUtils Next: How to take two input streams? |