From: David Schwartz on 9 May 2010 22:15

On May 9, 6:12 pm, phil-news-nos...(a)ipal.net wrote:

> How is it that a scheduler has anything to do with this?

The scheduler is specifically designed to allocate timeslices to ready-to-run threads such that the number of context switches is low enough that they don't impact performance. This design will work quite well unless you do something stupid. For example, if you use a process-per-connection design and need to do a tiny bit of work for each of 150 connections, you will need about 150 context switches. But that's because you did something stupid. So long as you don't do something stupid like that, the cost of context switches is lost in the noise because the scheduler will not make very many of them.

> If you have
> different tasks under different thread or different processes, then the
> scheduler is forced to make a context switch when something different
> has to be done, and that's in a different thread/process.

There is no such thing as "tasks under different thread". Threads share all memory, all file descriptors, everything. There is no way something can be "stuck in the wrong thread" unless you specifically design things that way. Assuming a sane designer, he would only allow tasks to get "stuck to a thread" where that didn't harm performance. And, of course, you can always shoot yourself in the foot. However, if the process that holds a file descriptor is not running, no forward progress can be made without a context switch.

> The win for
> threads is the context switch to another thread within the same process
> is cheaper than a context switch between processes.

No. That is not why threads are a win. That is, as I've tried to explain, a common misconception. It's like saying jet planes are a win over bicycles because you don't have to pedal them. Threads are a win over processes because it makes no difference which thread runs.
The process makes forward progress so long as any ready-to-run thread gets the CPU. That is, in a properly designed multi-threaded application, the amount of work done before a context switch is needed will be much higher.

DS
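The design David is arguing for can be sketched as a small worker pool where any free thread services any connection's pending work, so forward progress only requires that *some* ready thread get the CPU. This is an illustrative Python sketch, not code from the thread; the function names and the doubling stand-in for "real work" are invented:

```python
import queue
import threading

def run_pool(tasks, n_workers=4):
    """Process every task with whichever worker thread is free.

    No task is bound to a particular thread, so the scheduler never
    has to switch to one specific thread to make forward progress.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:           # sentinel: this worker shuts down
                work.task_done()
                return
            with lock:
                results.append(item * 2)   # stand-in for real per-connection work
            work.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for task in tasks:
        work.put(task)
    for _ in workers:                  # one sentinel per worker
        work.put(None)
    work.join()
    for t in workers:
        t.join()
    return sorted(results)

# 150 "connections" serviced by 4 threads: a handful of runnable threads,
# not 150 forced context switches as in a process-per-connection design.
print(len(run_pool(range(150))))
```

Contrast this with process-per-connection, where doing a tiny bit of work for each of 150 connections forces roughly 150 switches, because each piece of work is stuck to its own process.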
From: Scott Lurndal on 10 May 2010 10:29

phil-news-nospam(a)ipal.net writes:
>On 09 May 2010 23:08:21 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:
>
>| [*] Up to 22 memory references when using nested page tables, depending on
>| processor page directory cache hit rate; this can be reduced to 11 if the
>| nested page table uses 1GB page sizes (vice 4 or less without using SVM).
>
>Is the page table also stored in cache, even if also in the TLB?

It depends on: 1) how recently it was used, and 2) the cache eviction behavior. Generally, I wouldn't count on any of the PTE entries being present in the processor cache. You might find one or more of the intermediate entries (PML4, PDP, PD) in a shared L3 cache of sufficient size, but I wouldn't count on it. Both AMD and Intel have special PML4/PDP/PD caches in the processor to help make TLB fills a bit more efficient. The PML4 entry will likely be cached (since there are only two entries in use in the PML4, one for the lower 512GB and one for the upper 512GB) and the PML4 is the first reference on a table walk.

Consider a page table mapping 1TB of memory with 4k pages. This requires two gigabytes of memory just for the page tables. Consider then that each process must have its own page table, and you'll see that the processor cache has little benefit for TLB fills.

scott
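Scott's arithmetic checks out: mapping 1TB with 4KB pages takes 2^28 leaf PTEs, and at 8 bytes per entry (as on x86-64) that is 2GB of bottom-level page tables alone, before counting the much smaller upper levels. A quick sketch (the helper name and parameters are mine, not from the post):

```python
def pte_bytes(mapped_bytes, page_size=4096, pte_size=8):
    """Bytes of leaf page-table entries needed to map `mapped_bytes`.

    Assumes 8-byte PTEs as on x86-64, and ignores the upper levels
    (PML4/PDP/PD), which add well under 1% on top of this.
    """
    pages = mapped_bytes // page_size
    return pages * pte_size

TB = 1 << 40
GiB = 1 << 30
print(pte_bytes(TB) // GiB)   # -> 2, matching the two-gigabyte figure above
```

Since every process carries its own copy of these tables, the working set of page-table data quickly dwarfs any processor cache, which is Scott's point about caches being of little help for TLB fills.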
From: Scott Lurndal on 10 May 2010 10:32

David Schwartz <davids(a)webmaster.com> writes:
>On May 9, 4:08 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
>
>> Threads are a performance win because they don't need to flush the TLB's
>> on context switches between threads in the same process.
>
>Nope. That's like saying that cars are faster than bicycles because
>they don't have pedals. While it's true that threads are a performance
>win and it's true that context switches between threads of the same
>process are faster than context switches between threads from
>different processes, the latter does not cause the former.
>
>> A thread context switch is enormously less
>> expensive than a process context switch. The larger the page size,
>> the better.
>
>It doesn't matter. In any sensible threaded application, there will be
>so few context switches that making them faster will be lost in the
>noise.

I've never seen a thread that doesn't require a context switch, aside from the user-level M-N threads in the old SVR4.2MP threads library, and even that was a context switch, just done in the library rather than the kernel. If you degenerate your system to a single thread per core and have only one process (i.e. a real-time embedded system), then there won't be many context switches between threads. However, in real-world threaded applications there _are_ context switches, and there are _many_ context switches, and a thread context switch is more efficient than a process context switch.

scott
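The "many context switches" behavior Scott describes is easy to provoke deliberately: a pipe ping-pong between two threads forces a switch for every message, since each side blocks until the other runs. A rough Python sketch, not from the thread; note that in CPython the GIL handoff is included in the measurement, so this is only an upper bound on the raw thread-switch cost, and a real measurement would need CPU pinning, warm-up, and many runs:

```python
import os
import time
import threading

def pingpong_threads(rounds=10_000):
    """Force roughly 2*rounds context switches between two threads via a
    pipe ping-pong; returns seconds per round trip."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()

    def echo():
        # Peer thread: wait for a byte, send one back.
        for _ in range(rounds):
            os.read(r1, 1)
            os.write(w2, b"x")

    t = threading.Thread(target=echo)
    t.start()
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(w1, b"x")     # wake the peer...
        os.read(r2, 1)         # ...then block until it replies
    elapsed = time.perf_counter() - start
    t.join()
    for fd in (r1, w1, r2, w2):
        os.close(fd)
    return elapsed / rounds

print(f"{pingpong_threads(1000) * 1e6:.1f} us per round trip")
```

Running the same ping-pong between two forked processes instead of two threads is the natural companion experiment for comparing thread-switch and process-switch cost.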
From: Scott Lurndal on 10 May 2010 10:35

phil-news-nospam(a)ipal.net writes:
>On 09 May 2010 23:15:08 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:
>| David Schwartz <davids(a)webmaster.com> writes:
>|>On May 9, 12:37 am, Golden California Girls <gldncag...(a)aol.com.mil>
>|>wrote:
>|>
>|>> That depends upon what you call a context switch. Somehow I think to
>|>> switch threads you have to somehow save and restore a few registers, the
>|>> Program Counter for sure, unless you have more cores than threads. The
>|>> more registers that have to be exchanged the longer the switching time.
>|>
>|>Compared to blowing out the code and data caches, the time it takes to
>|>save and restore a few registers is meaningless.
>|>
>|>DS
>|
>| It's not the caches, so much, as it is the TLB's. The caches (at least
>| on physically indexed architectures like Intel/AMD's) are not flushed on a
>| context switch; either a thread context switch or process context switch
>| may or may not result in a subsequent cache miss - that depends on many
>| factors. A thread switch is less likely to see a subsequent cache miss,
>| however.
>
>However, once the context switch to a new VM does take place, the cache that
>pointed to the previous process is useless (except for shared parts, since
>this is a physical/real address caching architecture).

Indeed. The shared parts are key. Context switches between VMs are a special case, and AMD has some help for the TLBs in this case by associating an ASID with the VM. However, this is orthogonal to the point I made above about thread switches within a process being more efficient than thread switches between processes.

scott
From: Rainer Weikusat on 10 May 2010 10:40
David Schwartz <davids(a)webmaster.com> writes:
> On May 9, 4:08 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

[...]

>> A thread context switch is enormously less
>> expensive than a process context switch. The larger the page size,
>> the better.
>
> It doesn't matter. In any sensible threaded application, there will be
> so few context switches that making them faster will be lost in the
> noise.

Dedicating threads to particular subtasks of the overall job is also a sensible way to design 'a threaded application', just one which is geared towards simplicity of the implementation rather than maximum performance. Because a thread context switch is cheaper than a process context switch, such simple designs are useful for a wider range of tasks when using threads instead of processes.
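Rainer's "thread per subtask" style can be sketched as a pipeline where each stage owns one dedicated thread and stages hand work to each other through queues. A minimal Python illustration; the stage names and toy transforms are invented, and the point is the structure (simple, but it switches threads on every hand-off), not the arithmetic:

```python
import queue
import threading

def pipeline(items):
    """Two-stage pipeline, one thread per stage: simple to reason
    about, but every item crosses two thread boundaries."""
    q1, q2 = queue.Queue(), queue.Queue()
    out = []

    def parse():
        # Stage 1: dedicated to "parsing" (here, just adding 1).
        for x in items:
            q1.put(x + 1)
        q1.put(None)               # end-of-stream sentinel

    def transform():
        # Stage 2: dedicated to "transforming" (here, scaling by 10).
        while (x := q1.get()) is not None:
            q2.put(x * 10)
        q2.put(None)

    threads = [threading.Thread(target=parse),
               threading.Thread(target=transform)]
    for t in threads:
        t.start()
    while (x := q2.get()) is not None:
        out.append(x)
    for t in threads:
        t.join()
    return out

print(pipeline([1, 2, 3]))   # -> [20, 30, 40]
```

Built on processes, the same structure would pay a full process context switch (TLB flush included on non-ASID hardware) per hand-off, which is Rainer's point about cheap thread switches widening the range of tasks where this simple design remains acceptable.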