Prev: Learn about proxy sites and how to use them to open blocked sites unlimited downloads from RapidShare and megaupload and increase the speed of the Internet with new sites for free
Next: It's a fact not lost on the opportunity to see yourself
From: Ersek, Laszlo on 30 Mar 2010 05:05 On Mon, 29 Mar 2010, M�ns Rullg�rd wrote: > David Schwartz <davids(a)webmaster.com> writes: > >> On Mar 27, 7:21�am, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote: >> >>> That may slow things down too much, if I want true random I >>> an get a hardware random number generator dongle. For my >>> purposes I only need the numbers to be very widely >>> dispersed. I could generated my own pseudo random numbers >>> based on a lookup table. >> >> On Linux, you can open /dev/urandom and read an unlimited supply of >> random bits. So long as your distribution sets things up properly, >> they should pass most statistical tests for randomness and should be >> cryptographically strong. > > The problem here isn't so much the true randomness of the sequence as > the distribution and the avoiding of accesses turning into a short > loop. (The local nntp server I used to use seems to have a problem with forwarding my messages outwards. At least I was unable to find my messages with google, and they appear absent also from news.eternal-september.org. I went kind of crazy seeing this question discussed in multiple newsgroups and nobody reflecting on my pertinent(?) message -- reposted below. I'm switching to a different news client and the aforementioned news server, in the hope that I'll be able to repost Sattolo's name to all relevant newsgroups. Excuse the massive cross-posting. I set Followup-To to c.u.p., because that's where I saw the topic first.) lacos X-News: ludens comp.unix.programmer:167386 From: lacos(a)ludens.elte.hu (Ersek, Laszlo) Subject: Re: Can extra processing threads help in this case? Date: 24 Mar 2010 02:49:32 +0100 Message-ID: <RfITPGAL3QUd(a)ludens> In article <pe9k77-gjk.ln1(a)wilbur.25thandClement.com>, William Ahern <william(a)wilbur.25thandClement.com> writes: > Peter Olcott <NoSpam(a)ocr4screen.com> wrote: >> "Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in >> message news:ho5tof$lon$1(a)news.eternal-september.org... > <snip> >> > But if there's another CPU/core/strand/pipeline, it's possible that one >> > processor's stall time could be put to productive use by another if >> > there were multiple execution threads. > <snip> >> is there any possible way that this app is not memory bound that you can >> provide a specific concrete example of? > > Your question was answered. > > You're hung up on your numbers and preconceived ideas. Your application > could be BOTH memory bound AND able to benefit from multiple CPUs. But it's > nearly impossible to guess without knowing at least the algorithm; more > specifically, the code. This thread has inspired me to write some code. I wrote it last night in a sleep deprivation induced delirium. (I almost modified the upper limit of LOG2_NEL from 31 to 32 before posting, but then decided to post the version with 31 so that it works where "size_t" is a 32-bit unsigned integer type with conversion rank at least that of "int".) I was fairly sure about knowing what it measures. Now I'm fairly sure I was stupid. So please tell me, I ask you kindly, what does the following code measure? Thank you. First some test results (inserting whitespace generously): $ time -p ./jump_around 1 starting threads 14.5209 s/thread real 16.43 user 16.37 sys 0.02 $ time -p ./jump_around 2 starting threads 15.571 s/thread real 17.64 user 32.95 sys 0.05 $ time -p ./jump_around 4 starting threads 15.587 s/thread real 33.25 user 64.16 sys 0.08 CPU (from <http://mysite.verizon.net/pchardwarelinks/current_cpus.htm>): Athlon 64 X2 6000+ MMX 3DNow! SSE SSE2 SSE3 (Windsor) (128-bit on-Die unbuffered DDR2 PC6400 mem controller, AMD-V) (dual core) 940 pins, 3000MHz (200x15), (64-bit dual-pumped bus), 1.4v 2x 64KB data (2-way) 2x 64KB instruction (2-way) 2x 1MB on-Die unified L2 (16-way exclusive) Here cometh the code. Please excuse the bugs, both the blatant and the insidious. Thanks, lacos /* $Id: jump_around.c,v 1.4 2010/03/23 00:02:22 lacos Exp $ This program creates a single-cycle permutation of [0, 2**LOG2_NEL - 1] in a static array with Sattolo's algorithm. Then it starts the specified number of threads. Each sub-thread will traverse the permutation NROUND times. Each thread starts at a different position in the loop (at offsets increasing simply one by one in the buffer). Since the shuffle is driven by a uniform distribution PRNG, the sub-threads should divide the loop more or less evenly among themselves. The program measures and prints the user CPU time consumed by a single sub-thread on average. Waiting for memory should be included in this value. Compile and link with eg. $ gcc -ansi -pedantic -Wall -Wextra -o jump_around jump_around.c -l pthread uint32_t and uint64_t are used only when their sizes are considered significant for some reason. */ #define _XOPEN_SOURCE 500 #include <stdlib.h> /* size_t, strtol(), EXIT_FAILURE, jrand48() */ #include <inttypes.h> /* uint32_t, uint64_t */ #include <assert.h> /* assert() */ #include <stdio.h> /* fprintf(), fflush() */ #include <sys/resource.h> /* getrusage() */ #include <pthread.h> /* pthread_create(), pthread_join() */ /* Number of rounds a single sub-thread traverses the cycle. */ #define NROUND 10u /* Highest number of sub-threads allowed on the command line. */ #define MAXTHR 4L /* Base two logarithm of the number of elements in the array (that is, log2 of the cycle length). Make sure it is in [log2(MAXTHR), 31]. */ #define LOG2_NEL 24 /* Number of elements; IOW, cycle length. */ #define NEL ((size_t)1 << LOG2_NEL) /* 28 == LOG2_NEL will result in a 1GB array. */ static uint32_t buf[NEL]; /* "*arg" points to a size_t object holding the start offset in the buffer. */ static void * meander(void *arg) { unsigned round; for (round = 0u; round < NROUND; ++round) { size_t pos, cnt; pos = *(const size_t *)arg; for (cnt = 0u; cnt < NEL; ++cnt) { pos = buf[pos]; } assert(pos == *(const size_t *)arg); } return 0; } int main(int argc, char **argv) { long nthr; /* Get number of sub-threads. */ { char *endptr; if (2 != argc || (nthr = strtol(argv[1], &endptr, 10)) < 1L || nthr > MAXTHR || '\0' != *endptr) { (void)fprintf(stderr, "pass number of threads (1-%ld)\n", MAXTHR); return EXIT_FAILURE; } } /* Initialize array with Sattolo's algorithm. */ { size_t cnt; short unsigned xsubi[3] = { 1u, 2u, 3u }; /* Identity permutation. */ for (cnt = 0u; cnt < NEL; ++cnt) { buf[cnt] = cnt; } /* Swaps. */ for (cnt = 0u; cnt < NEL - 1u; ++cnt) { uint32_t save; size_t rnd; save = buf[cnt]; /* Choose a pseudo-random number in [0, NEL - 2u - cnt] with uniform distribution. (uint32_t)jrand48(xsubi) covers [0, 2**32 - 1]. See also the constraint on LOG2_NEL. */ rnd = (uint64_t)(uint32_t)jrand48(xsubi) * ((NEL - 2u - cnt) + 1u) >> 32; buf[cnt] = buf[cnt + 1u + rnd]; buf[cnt + 1u + rnd] = save; } } (void)fprintf(stderr, "starting threads\n"); { int ret; struct rusage start, stop; ret = getrusage(RUSAGE_SELF, &start); assert(0 == ret); { long widx; size_t startpos[MAXTHR]; pthread_t worker[MAXTHR]; for (widx = 0L; widx < nthr; ++widx) { startpos[widx] = widx; ret = pthread_create(worker + widx, 0, &meander, startpos + widx); assert(0 == ret); } for (widx = 0L; widx < nthr; ++widx) { ret = pthread_join(worker[widx], 0); assert(0 == ret); } } ret = getrusage(RUSAGE_SELF, &stop); assert(0 == ret); (void)fprintf(stdout, "%Lg s/thread\n", (((long double)stop.ru_utime.tv_usec - start.ru_utime.tv_usec) / 1000000.0L + stop.ru_utime.tv_sec - start.ru_utime.tv_sec) / nthr); if (EOF == fflush(stdout)) { return EXIT_FAILURE; } } return EXIT_SUCCESS; } |