From: Scott Lurndal on 9 May 2010 19:15

David Schwartz <davids(a)webmaster.com> writes:
>On May 9, 12:37 am, Golden California Girls <gldncag...(a)aol.com.mil>
>wrote:
>
>> That depends upon what you call a context switch. Somehow I think to
>> switch threads you have to somehow save and restore a few registers, the
>> Program Counter for sure, unless you have more cores than threads. The
>> more registers that have to be exchanged the longer the switching time.
>
>Compared to blowing out the code and data caches, the time it takes to
>save and restore a few registers is meaningless.
>
>DS

It's not the caches, so much, as it is the TLBs. The caches (at least
on physically indexed architectures like Intel/AMD's) are not flushed on a
context switch; either a thread context switch or process context switch
may or may not result in a subsequent cache miss - that depends on many
factors. A thread switch is less likely to see a subsequent cache miss,
however.

A cache miss is a single memory reference. A TLB miss is 2 (1GB pages) to
4 (4KB pages) to 22 (4KB pages with a 4KB nested page table in a virtual
machine) memory references. And _if_ the scheduler needs to do a global TLB
shootdown on a process switch, that requires bringing all processors in the
system to a barrier during the switch (blech). Fortunately, global shootdowns
are only necessary when the page table itself is changed (like by mmap, brk,
etc).

Cache miss rates are best controlled by the scheduler trying not to move
threads from one core to another.

scott
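That last point can also be enforced from user space by pinning workers to
cores, rather than hoping the scheduler cooperates. A minimal sketch, assuming
Linux and glibc's pthread_setaffinity_np(); the worker body and thread count
are placeholders:

/* Pin each worker thread to its own core so the scheduler never migrates
 * it, keeping that core's caches warm.  Sketch, assuming Linux/glibc. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NWORKERS 4                      /* placeholder worker count */

static void *worker(void *arg)          /* placeholder per-core work loop */
{
    long core = (long)arg;
    printf("worker running, pinned to core %ld\n", core);
    /* ... real work goes here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];

    for (long i = 0; i < NWORKERS; i++) {
        pthread_create(&tid[i], NULL, worker, (void *)i);

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);          /* allow core i only */
        int rc = pthread_setaffinity_np(tid[i], sizeof(set), &set);
        if (rc != 0)
            fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
    }
    for (long i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

sched_setaffinity() does the same thing for a whole process; that is
effectively what taskset(1) wraps.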
From: phil-news-nospam on 9 May 2010 21:12

On Sat, 8 May 2010 16:18:42 -0700 (PDT) David Schwartz <davids(a)webmaster.com> wrote:

| On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:
|
|> Threads often are a performance win ... NOT because they allow faster
|> sharing between tasks (not always needed) ... BUT just because context
|> switches between threads of the same virtual memory space are faster.
|
| This is a common misconception. Threads are rarely, if ever, a
| performance win because they make context switches faster. Threads are
| primarily a performance win because they minimize the need for context
| switches. A threaded web server can do a little bit of work for each
| of a thousand clients without even a single context switch.

And I've done the same thing in just one process, too. But it makes for
more complicated code. It defeats using forms of abstraction that let
programmers focus less on the mechanics and more on the objective.

| If you are having so many context switches that the cost of a context
| switch shows up on your performance radar, you are doing something
| horribly wrong. Schedulers are specifically designed to ensure that
| context switches are infrequent, and you would have to be putting some
| disastrous pressures on them for that design to fail to do its job.

How is it that a scheduler has anything to do with this? If you have
different tasks under different threads or different processes, then the
scheduler is forced to make a context switch when something different has
to be done, and that's in a different thread/process. The win for threads
is that a context switch to another thread within the same process is
cheaper than a context switch between processes.
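For comparison, the one-process approach usually takes the shape of an event
loop. A minimal sketch, assuming Linux epoll; the port, backlog, and buffer
size are arbitrary and error handling is abbreviated:

/* One process, no threads: serve many clients from a single epoll loop.
 * Sketch, assuming Linux; this just echoes each client's data back. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                /* arbitrary port */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    int efd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(efd, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(efd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lfd) {                    /* new connection */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(efd, EPOLL_CTL_ADD, cfd, &cev);
            } else {                            /* a little work per client */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; }
                write(fd, buf, (size_t)r);      /* echo it back */
            }
        }
    }
}

Each ready descriptor gets "a little bit of work" with no context switch at
all, but the state of each client's conversation has to be tracked by hand
rather than living on a per-thread stack - which is the extra complexity
being referred to above.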
From: phil-news-nospam on 9 May 2010 21:17

On 09 May 2010 23:15:08 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:

| David Schwartz <davids(a)webmaster.com> writes:
|>On May 9, 12:37 am, Golden California Girls <gldncag...(a)aol.com.mil>
|>wrote:
|>
|>> That depends upon what you call a context switch. Somehow I think to
|>> switch threads you have to somehow save and restore a few registers, the
|>> Program Counter for sure, unless you have more cores than threads. The
|>> more registers that have to be exchanged the longer the switching time.
|>
|>Compared to blowing out the code and data caches, the time it takes to
|>save and restore a few registers is meaningless.
|>
|>DS
|
| It's not the caches, so much, as it is the TLBs. The caches (at least
| on physically indexed architectures like Intel/AMD's) are not flushed on a
| context switch; either a thread context switch or process context switch
| may or may not result in a subsequent cache miss - that depends on many
| factors. A thread switch is less likely to see a subsequent cache miss,
| however.

However, once the context switch to a new VM does take place, the cache
lines that belong to the previous process are useless (except for shared
parts, since this is a physical/real address caching architecture).

| A cache miss is a single memory reference. A TLB miss is 2 (1GB pages) to
| 4 (4KB pages) to 22 (4KB pages with a 4KB nested page table in a virtual
| machine) memory references. And _if_ the scheduler needs to do a global TLB
| shootdown on a process switch, that requires bringing all processors in the
| system to a barrier during the switch (blech). Fortunately, global shootdowns
| are only necessary when the page table itself is changed (like by mmap, brk,
| etc).

TLB is certainly the biggie. But the cache still is a factor.

| Cache miss rates are best controlled by the scheduler trying not to move
| threads from one core to another.

Or the whole process between CPUs.
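How much a single switch actually costs on a given box is easy to measure
with a pipe ping-pong, which forces a switch on every hop. A rough sketch,
assuming Linux; the iteration count is arbitrary, and swapping the fork()
for pthread_create() gives the same-address-space number for comparison:

/* Rough ping-pong benchmark: two tasks pass a byte back and forth through
 * pipes, so every hop forces a context switch.  Sketch, assuming Linux. */
#include <stdio.h>
#include <unistd.h>
#include <time.h>

#define ROUNDS 100000                   /* arbitrary iteration count */

int main(void)
{
    int a[2], b[2];
    char c = 'x';

    if (pipe(a) < 0 || pipe(b) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                  /* child: echo the token back */
        for (int i = 0; i < ROUNDS; i++) {
            if (read(a[0], &c, 1) != 1) break;
            if (write(b[1], &c, 1) != 1) break;
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {  /* parent: send, wait for echo */
        write(a[1], &c, 1);
        read(b[0], &c, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.0f ns per round trip (two switches)\n", ns / ROUNDS);
    return 0;
}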
From: phil-news-nospam on 9 May 2010 21:18

On 09 May 2010 23:08:21 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:

| [*] Up to 22 memory references when using nested page tables, depending on
| processor page directory cache hit rate; this can be reduced to 11 if the
| nested page table uses 1GB page sizes (vice 4 or less without using SVM).

Is the page table also stored in cache, even if also in the TLB?
From: David Schwartz on 9 May 2010 22:06
On May 9, 4:08 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> Threads are a performance win because they don't need to flush the TLB's
> on context switches between threads in the same process.

Nope. That's like saying that cars are faster than bicycles because they
don't have pedals. While it's true that threads are a performance win and
it's true that context switches between threads of the same process are
faster than context switches between threads from different processes, the
latter does not cause the former.

> A thread context switch is enormously less
> expensive than a process context switch. The larger the page size,
> the better.

It doesn't matter. In any sensible threaded application, there will be so
few context switches that making them faster will be lost in the noise.

DS
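Whether context switches are even showing up "on your performance radar" can
be checked from the kernel's own counters before reaching for any tuning. A
minimal sketch, assuming Linux/POSIX getrusage(); run_workload() is a
placeholder for the real work:

/* Report how many voluntary/involuntary context switches a workload
 * actually incurred.  Sketch, assuming Linux. */
#include <sys/resource.h>
#include <stdio.h>

static void run_workload(void)          /* placeholder for the real work */
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;
}

int main(void)
{
    struct rusage before, after;

    getrusage(RUSAGE_SELF, &before);
    run_workload();
    getrusage(RUSAGE_SELF, &after);

    printf("voluntary switches:   %ld\n", after.ru_nvcsw  - before.ru_nvcsw);
    printf("involuntary switches: %ld\n", after.ru_nivcsw - before.ru_nivcsw);
    return 0;
}

ru_nvcsw counts switches the task asked for by blocking, ru_nivcsw the ones
the scheduler imposed; if both stay small relative to the work done, making
each switch faster buys essentially nothing.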