From: Moi on 5 Apr 2010 18:54

On Mon, 05 Apr 2010 22:26:50 +0000, Nicolas George wrote:

> Moi wrote in message
> <455de$4bba44e4$5350c024$9115(a)cache110.multikabel.net>:
>> I've once done a similar thing for a disk-exerciser: grab each disk
>> block exactly once,
>
> Grab each block exactly once: you said it. Peter's method does not do
> that, it probably only accesses a very small fraction of the blocks.

This is not that important. You only need to reference one byte to
cause the whole slot to be drawn in. Subsequent references to addresses
in the same slot come virtually for free. Ergo: the best way to measure
is by grabbing (and writing) small chunks of memory. (You could do the
same for disk caching by referencing/dirtying only one byte per page.)

>> but in pseudo-random order, and sufficiently distant to avoid
>> read-ahead to be effective.
>
> Such things make sense for disks because disks require mechanical
> movements and thus are much more efficient for contiguous reads. That
> is why filesystem drivers try to avoid fragmentation and implement
> readahead.

My example was not about disks, but about avoiding short cycles (loops)
in permutations.

> And for the same reason, software and file formats try to make reads
> as sequential as possible (for example interleaving audio and video
> data in multimedia container formats). Which in turn makes readahead
> useful even for flash-based devices.
>
> RAM does not have mechanical parts, and no bonus for sequential reads.
> Furthermore, the read patterns are typically much more random: access
> a pointer, then another, read until a 0 byte, access another pointer,
> etc. The RAM cache system does not try to do simple readahead for
> that; instead, it follows the flow of the program and analyzes and
> predicts the memory accesses.
>
> Therefore, if the algorithm does not achieve cache locality at the
> scale of a cache line (typically 32 to 128 octets), then the distance
> between items is irrelevant.

That is all correct. Apart from sizes and parameters, the only
differences between memory and disk wrt caching are:

1) disks (sometimes) favor sequential reads and memory does not.
2) for memory, all cache misses are created equal.
3) for memory, the mapping to/from cache slots is mostly fixed, causing
   multiple stretches of addresses to compete for the same cache slot.

HTH,
AvK
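[A minimal sketch of Moi's point that one reference per slot is enough
to pull the whole line in. This is not code from the thread; the
64-byte line size, the 64 MB buffer, and all names are assumptions for
illustration.]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE  64                      /* assumed cache line size */
#define BUFSZ (64u * 1024 * 1024)     /* 64 MB, larger than any cache */

static double touch(volatile char *buf, size_t stride)
{
    clock_t t0 = clock();
    for (size_t i = 0; i < BUFSZ; i += stride)
        buf[i]++;                     /* reference (and dirty) one byte */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    char *buf = malloc(BUFSZ);
    if (buf == NULL)
        return 1;
    /* The second loop makes 1/64th of the references but still pulls
     * in every line, so it pays nearly the same cost as the first. */
    printf("every byte touched: %.3f s\n", touch(buf, 1));
    printf("one byte per line:  %.3f s\n", touch(buf, LINE));
    free(buf);
    return 0;
}

On typical hardware the two timings come out close, because the cost is
dominated by line fills rather than by the number of references, which
is why grabbing small, well-spread chunks is the right way to exercise
the cache.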
From: Scott Lurndal on 5 Apr 2010 20:46

Moi <root(a)invalid.address.org> writes:

>That is all correct. Apart from sizes and parameters, the only
>differences between memory and disk wrt caching are:
>
>1) disks (sometimes) favor sequential reads and memory does not.

Memory (sometimes) favors sequential (or strided) accesses. Modern
memory controllers will prefetch cache lines if strided accesses are
detected.

>2) for memory, all cache misses are created equal.

Definitely incorrect. A cache miss that hits in the TLB is one thing; a
cache miss that misses the TLB but hits the page table cache is
another; and a cache miss that misses both the TLB and the page table
cache is yet a third. Each of these has different latency
characteristics.

A worst-case TLB fill requires 22 (yes, twenty-two!) memory accesses to
resolve (when using SVM and nested page tables, with 4k mappings at all
levels). Without virtualization, a TLB fill can still cost 3 or 4
memory accesses (one for each level of the page table hierarchy),
whereas a simple cache line fill is a single memory access (possibly
two, to handle the eviction).

>3) for memory, the mapping to/from cache slots is mostly fixed,
>   causing multiple stretches of addresses to compete for the same
>   cache slot.

Actually, an address competes for one of N ways at a particular cache
slot; typical caches are organized with 4 or 8 ways.

scott
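[Scott's last point about N ways can be made concrete. A hypothetical
sketch assuming a 32 KB, 8-way L1 data cache; none of these numbers
come from the thread, and clock()-based timing is coarse. Addresses a
multiple of (cache size / ways) apart index the same set, so walking
more than 8 of them repeatedly forces evictions.]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SET_STRIDE (32 * 1024 / 8)   /* assumed 32 KB, 8-way L1 */
#define ROUNDS     2000000L

static double ns_per_access(volatile char *buf, int nlines)
{
    clock_t t0 = clock();
    for (long r = 0; r < ROUNDS; r++)
        for (int i = 0; i < nlines; i++)
            buf[(size_t)i * SET_STRIDE]++;   /* same set every time */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs * 1e9 / ((double)ROUNDS * nlines);
}

int main(void)
{
    char *buf = malloc((size_t)32 * SET_STRIDE);
    if (buf == NULL)
        return 1;
    /* 8 conflicting lines fit in an 8-way set; 16 thrash it. */
    printf("8 lines:  %.2f ns/access\n", ns_per_access(buf, 8));
    printf("16 lines: %.2f ns/access\n", ns_per_access(buf, 16));
    free(buf);
    return 0;
}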
From: Ian Collins on 5 Apr 2010 21:07

On 04/06/10 09:27 AM, Peter Olcott wrote:

I know you are using an awful client, but please fix your quoting!

> I was going to do the thread priority thing very simply. Only one
> thread can run at a time, and the purpose of the multiple threads was
> to make fast context switching using thread-local data.

How do you intend doing that? It looks like an attempt at premature
micro-optimisation.

> As soon as a higher priority job arrives, activity in the current
> lower priority thread is suspended.

You should heed David's sage advice! This may well work, but using
multiple thread priorities more often ends in tears.

--
Ian Collins
From: Peter Olcott on 5 Apr 2010 21:49

"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:81vfqpFa9rU3(a)mid.individual.net...

> On 04/06/10 09:27 AM, Peter Olcott wrote:
>
> I know you are using an awful client, but please fix your quoting!
>
>> I was going to do the thread priority thing very simply. Only one
>> thread can run at a time, and the purpose of the multiple threads
>> was to make fast context switching using thread-local data.
>
> How do you intend doing that? It looks like an attempt at premature
> micro-optimisation.
>
>> As soon as a higher priority job arrives, activity in the current
>> lower priority thread is suspended.
>
> You should heed David's sage advice! This may well work, but using
> multiple thread priorities more often ends in tears.
>
> --
> Ian Collins

I have to have some way of implementing multiple priorities. I am
aiming for a 500 ms response time for very small transactions from
paying customers. I already know that there are many things out of my
control that could easily screw this up. One thing that is not out of
my control is how I handle differing priorities.

If one of these high-priority jobs takes at most 100 ms of processing
time (I am allowing 400 ms for HTTP transport), then a 3.5 minute job
must be immediately put on hold. The complex part that David is talking
about is having multiple concurrent threads with differing priorities.
I can't imagine how priority inversion could occur, but then actually
working with this stuff is new to me.

What I am proposing is much simpler: multiple threads where the higher
priority one halts the next lower priority one, and there is never a
time when more than one of these threads is running concurrently with
the others. I don't see any pitfalls with this, but then I have never
done this before.

David's proposal will not work because the 3.5 minute thread would have
to finish before the 50 ms thread could begin. This would provide an
intolerably long response time for the 50 ms thread.
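[A minimal pthreads sketch of the scheme Peter describes, with at most
one of the two threads runnable at a time. All names, the chunk
granularity, and the job bodies are hypothetical; real code would also
need job queues, shutdown, and error handling.]

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  resume = PTHREAD_COND_INITIALIZER;
static int hi_pending = 0;       /* set while a high-priority job runs */

/* Called by the low-priority worker at convenient chunk boundaries. */
static void maybe_yield_to_high(void)
{
    pthread_mutex_lock(&lock);
    while (hi_pending)           /* park until the high job finishes */
        pthread_cond_wait(&resume, &lock);
    pthread_mutex_unlock(&lock);
}

static void *low_priority_job(void *arg)
{
    (void)arg;
    for (int chunk = 0; chunk < 1000; chunk++) {
        maybe_yield_to_high();
        /* ... one small slice of the 3.5 minute job goes here ... */
    }
    return NULL;
}

static void run_high_priority_job(void)
{
    pthread_mutex_lock(&lock);
    hi_pending = 1;              /* halt the low worker at its next check */
    pthread_mutex_unlock(&lock);

    /* ... the ~100 ms high-priority work goes here ... */

    pthread_mutex_lock(&lock);
    hi_pending = 0;
    pthread_cond_broadcast(&resume);  /* let the low worker continue */
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, low_priority_job, NULL);
    run_high_priority_job();     /* simulate one arriving high job */
    pthread_join(t, NULL);
    return 0;
}

Note that this uses a flag and a condition variable rather than
scheduler priorities, so the "suspension" is cooperative: the low job
pauses only at chunk boundaries, and the chunk size bounds the extra
latency seen by the high-priority job.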
From: David Schwartz on 5 Apr 2010 22:59

On Apr 5, 6:49 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:

> If one of these high-priority jobs takes at most 100 ms of processing
> time (I am allowing 400 ms for HTTP transport), then a 3.5 minute job
> must be immediately put on hold. The complex part that David is
> talking about is having multiple concurrent threads with differing
> priorities. I can't imagine how priority inversion could occur, but
> then actually working with this stuff is new to me.

Like this:

1) Low-priority job is running.
2) High-priority job starts, pre-empts the low-priority job.
3) High-priority job needs a lock the low-priority job was holding when
   it was pre-empted.
4) High-priority job cannot run now until the low-priority job gets the
   CPU, which it won't do for quite some time because higher-priority
   tasks are running.

The point is, creating threads with low priority can actually cause
high-priority threads to act as if their priority had been lowered.
This can result in worse performance, even for the high-priority tasks,
than if you didn't mess with thread priorities at all.

> David's proposal will not work because the 3.5 minute thread would
> have to finish before the 50 ms thread could begin. This would
> provide an intolerably long response time for the 50 ms thread.

Only if your design is such that the 3.5 minute thread has to finish
before the 50 ms thread begins. Forget about which threads run -- make
sure threads are only doing the work *you* want done (or, at worst,
don't do work that's harmful to the forward progress you need to make).
Then it won't matter much what the scheduler does.

DS
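[For completeness: if real-time priorities are used anyway, POSIX
offers priority-inheritance mutexes that close the window in steps 3)
and 4) above. A hedged sketch; availability depends on the platform
defining _POSIX_THREAD_PRIO_INHERIT, and nothing here implies Peter's
design needs this.]

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t shared_lock;

/* Returns 0 on success, an errno value otherwise. */
static int init_pi_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
#ifdef _POSIX_THREAD_PRIO_INHERIT
    /* The lock holder is temporarily boosted to the priority of the
     * highest-priority waiter, so a pre-empted low-priority holder
     * cannot stall a high-priority thread indefinitely. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
#endif
    if (rc == 0)
        rc = pthread_mutex_init(&shared_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

int main(void)
{
    return init_pi_mutex();
}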