From: David Schwartz on 10 May 2010 22:04

On May 10, 6:21 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> Simple. There are generally more threads than there are processing units,
> and all the threads want to accomplish something.

That won't cause context switches except as each thread finishes out its
timeslice. The scheduler is carefully designed to give each thread a large
enough timeslice that the cost of these context switches is lost in the
noise.

> I assume your system is performing I/O? kswapd (which handles writing from
> the file cache to the device) will run, resulting in a context switch.
> SoftIRQ handling may require a context switch.

Sure, but these will generally run as timeslices run out. When a system task
pre-empts something, that's unavoidable, and application architecture can't
do anything about it. What I'm talking about are context switches forced by
application architecture. They are very, very ungood.

> The reason for all these threads _is_ that thread context switches are
> cheap.

That doesn't make sense. For all the reasons you specified, the context
switches won't be between threads in the same process anyway. The only
exception is when a thread completes its timeslice, in which case the cost is
lost in the noise.

DS
From: David Schwartz on 10 May 2010 22:06

On May 10, 6:28 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> > Compared to the ability to avoid context switches entirely, the

> Which _cannot be done_ on any reasonable modern general purpose
> unix-like operating system.

You misunderstand. If you reduce the number of context switches needed to do
a particular task by one, you have eliminated one context switch entirely.
That is most definitely possible.

> > relative cost difference of process versus thread context switches is
> > lost in the noise in realistic scenarios. Of course, things that only

> Have you ever measured this? I have, several times, on various
> architectures from mainframes to MPP boxes to hypervisors. The cost
> difference is measurable and hardly in the noise.

Sure, because you deliberately set out to measure that one thing. That's
like saying that brake friction is a significant factor in the fuel economy
of a 747 because a 747 operates very badly while the brakes are creating
friction.

> > make things better are good, and this is certainly a small benefit to
> > threads. But it isn't a game changer. On the other hand, the ability
> > to reduce the number of context switches by an order of magnitude
> > (because you never have a thread running that can't access the memory
> > or file descriptor needed to make forward progress) *is* a game

> So you never need to fault a page in? Context switch.
> So you never need to read or write the file descriptor? Context switch.

Again, all of those things happen, but they are lost in the noise compared
to a process-based architecture that forces a context switch every time the
"wrong process" is running.

DS
From: David Schwartz on 10 May 2010 22:11

On May 10, 6:37 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> I see no advantage to this model for any application. Why do you
> think a multiple process (hence multiple address space) model is
> superior to a multi-threaded process?

Every time a threaded process allocates memory or allocates a file
descriptor, synchronization is required with the other threads in the
process. This can be totally eliminated by using a process pool approach.

It's also convenient for fault isolation. When a thread fails, the process
context is lost. By being able to isolate what shared state a process can
get to, you can recover from a fault with fewer problems and less loss.

> If you're worried about
> the cost of poll on a large pool of file descriptors, then you've
> posed your problem poorly and should rethink your solution.

No, that's not the issue at all.

> > it helps to allocate a large chunk of address space before 'fork'ing
> > processes off so that you can dereference pointers in shared memory
> > without needing to manually alias them.

> c'est what?

If multiple processes map shared memory or shared files, the pointers to
that memory will typically be different. That means that this memory cannot
contain pointers that you can dereference the normal way. However, if you
use a 64-bit operating system, allocate a huge chunk of address space before
you 'fork' off the processes, and use a special inter-process allocator to
manage that space, you can pass pointers between processes and dereference
them with no special effort at all.

Something strange is going on here. You're a smart guy and the things I'm
saying are very simple, yet nothing I'm saying seems to be getting through.
Perhaps we have wildly different assumptions, or somehow you are thinking
I'm saying something totally different from what I'm actually saying.
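[Editor's note: the "allocate before 'fork'" idea above can be sketched
concretely. The following is a minimal illustration, not anyone's actual
server code, written in Python, whose anonymous mmap corresponds to a
MAP_SHARED|MAP_ANONYMOUS mapping; a real design would layer a C allocator
over the region. Because the mapping is created before the fork, both
processes inherit it at the same virtual address, so a "pointer" (modeled
here as a byte offset) stored inside the region means the same thing on both
sides.]

```python
import mmap
import os
import struct

# Create a shared anonymous mapping *before* forking. The child inherits
# the mapping at the same virtual address, so references stored inside the
# region are valid in both processes.
shm = mmap.mmap(-1, mmap.PAGESIZE)  # MAP_SHARED | MAP_ANONYMOUS

pid = os.fork()
if pid == 0:
    # Child: store a value at offset 64, and a "pointer" to it at offset 0.
    struct.pack_into("i", shm, 64, 1234)
    struct.pack_into("Q", shm, 0, 64)
    os._exit(0)

os.waitpid(pid, 0)
# Parent: follow the pointer the child left behind.
offset = struct.unpack_from("Q", shm, 0)[0]
value = struct.unpack_from("i", shm, offset)[0]
assert value == 1234
print("parent read", value, "via offset", offset)
```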
What I'm saying is that at least for some applications, a "process pool"
server coded very analogously to "thread pool" servers might be able to
provide benefits that thread pools alone cannot. Each process in the pool
can use threads, of course, if that makes sense. The benefit is primarily
that sharing can be precisely controlled, and secondarily that false sharing
is minimized.

DS
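[Editor's note: a minimal sketch of such a process pool, hedged as one
possible layout rather than the design being argued for. One pipe pair per
worker is assumed for task and result delivery; each forked worker runs in
its own address space, so its heap allocations never synchronize with the
other workers'.]

```python
import os

def worker(task_r: int, result_w: int) -> None:
    # Pooled worker: reads one task, writes one result, exits.
    # Runs in a private address space, so nothing here contends with
    # the allocators of the other pool members.
    data = os.read(task_r, 64)
    total = sum(int(tok) for tok in data.split())
    os.write(result_w, str(total).encode())
    os._exit(0)

tasks = [b"1 2 3", b"10 20 30"]
pool = []
for task in tasks:
    task_r, task_w = os.pipe()
    result_r, result_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(task_w)
        os.close(result_r)
        worker(task_r, result_w)
    # Parent keeps only its own ends of the pipes.
    os.close(task_r)
    os.close(result_w)
    os.write(task_w, task)
    os.close(task_w)
    pool.append((pid, result_r))

results = []
for pid, result_r in pool:
    results.append(int(os.read(result_r, 64)))
    os.waitpid(pid, 0)

assert results == [6, 60]
print("results:", results)
```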
From: Scott Lurndal on 11 May 2010 12:56

David Schwartz <davids(a)webmaster.com> writes:

>On May 10, 6:37 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
>
>> I see no advantage to this model for any application. Why do you
>> think a multiple process (hence multiple address space) model is
>> superior to a multi-threaded process?
>
>Every time a threaded process allocates memory or allocates a file
>descriptor, synchronization is required with the other threads in the

Every time memory is allocated or a file descriptor is allocated by any
thread in the system (whether in a single- or multiple-threaded process),
synchronization is required. There is no difference between a
single-threaded process and a multiple-threaded process from the point of
view of the kernel allocators.

>process. This can be totally eliminated by using a process pool
>approach. It's also convenient for fault isolation. When a thread
>fails, the process context is lost. By being able to isolate what
>shared state a process can get to, you can recover from a fault with
>fewer problems and less loss.

A process failure in this context will generally be due to a software bug,
and might cascade to all the remaining processes. If each process is running
on a different system (such as tier 1 systems), then a hardware failure will
not affect the remaining systems (unless it was due to power, cooling or
network access).

>> If you're worried about
>> the cost of poll on a large pool of file descriptors, then you've
>> posed your problem poorly and should rethink your solution.
>
>No, that's not the issue at all.

>> >it helps to allocate a large chunk of address space before 'fork'ing
>> >processes off so that you can dereference pointers in shared memory
>> >without needing to manually alias them.
>
>> c'est what?
>
>If multiple processes map shared memory or shared files, the pointers
>to that memory will typically be different.
>That means that this memory cannot contain pointers that you can
>dereference the normal way. However, if you use a 64-bit operating system,
>allocate a huge chunk of address space before you 'fork' off the processes,
>and use a special inter-process allocator to manage that space, you can
>pass pointers between processes and dereference them with no special effort
>at all.

Then you've lost the isolation that you trumpeted above as a benefit.

>Something strange is going on here. You're a smart guy and the things
>I'm saying are very simple, yet nothing I'm saying seems to be getting
>through. Perhaps we have wildly different assumptions or somehow you
>are thinking I'm saying something totally different from what I'm
>actually saying.

I guess that's the case. I'm an operating system guy, and that colors my
thinking. I write multithreaded applications just like I do operating
systems, using the same synchronization techniques etc. The first hardware I
used professionally had instructions for mutexes and condition variables, a
rather clever microkernel system.

>What I'm saying is that at least for some applications, a "process
>pool" server that was very analogously coded to "thread pool" servers
>might be able to provide benefits that thread pools alone cannot. Each
>process in the pool can use threads, of course, if that makes sense.
>The benefit is primarily that sharing can be precisely controlled and
>secondarily that false sharing is minimized.

That's the way most web servers work today, particularly with apache. But
they don't share memory. Oracle does, using system 5 SHM, but they do some
pretty grody tricks to allow the oracle code to access the shared memory
directly, without any base-relative accesses (at link time, they link a stub
assembler file based at the SHM load address with the rest of the database
code, such that the database code can directly reference variables in the
fixed portion of the shared global area).
The remainder is allocated dynamically, but all the processes that share the
SGA map it at the same address.

scott
From: David Schwartz on 11 May 2010 14:52
On May 11, 9:56 am, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> >Every time a threaded process allocates memory or allocates a file
> >descriptor, synchronization is required with the other threads in the

> Every time memory is allocated or a file descriptor
> is allocated by any thread in the system (whether a single or multiple
> threaded process), synchronization is required. There is no difference
> between a single threaded process and a multiple threaded process from
> the point of view of the kernel allocators.

That is not true. In the case of file descriptors, the file descriptor
table is shared by all the threads in a process. Allocating a new file
descriptor requires manipulating this shared resource. If two threads in
the same process both call 'socket', the kernel must somehow assign one a
lower descriptor number than the other. This requires an ordering that is
not required if two threads from different processes each allocate a file
descriptor.

The same is true of memory. When a thread allocates memory, some address
space must be taken from the process' pool. If a mapping needs to be made
or modified, the mapping will be to the shared address space. This will
require some synchronization, with the possibility of a conflict.

The same is true when a thread is created. Threads are process resources,
and if two threads in the same process each try to create a thread, there
is likely to be more synchronization overhead than if threads in different
processes each created an additional thread.

> A process failure in this context will generally be due to a
> software bug, and might cascade to all the remaining processes.

How would that happen? The only mechanism one process can use to influence
another process is with the consent of both processes. One can precisely
contain what a peer process can access in ways one cannot similarly
constrain a peer thread.

> >If multiple processes map shared memory or shared files, the pointers
> >to that memory will typically be different.
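[Editor's note: the descriptor-ordering point is directly observable. POSIX
requires open() and socket() to return the lowest-numbered descriptor
available in the process, which is exactly why the per-process table must be
manipulated under synchronization. A small demonstration, assuming only that
/dev/null exists as on any POSIX system:]

```python
import os

# POSIX guarantees each new descriptor is the lowest number available,
# so allocation inherently orders operations on the shared per-process
# descriptor table.
a = os.open("/dev/null", os.O_RDONLY)
b = os.open("/dev/null", os.O_RDONLY)
assert b > a               # allocated in order, from the same table

os.close(a)                # free the lower slot...
c = os.open("/dev/null", os.O_RDONLY)
assert c == a              # ...and the next allocation must reuse it

os.close(b)
os.close(c)
print("lowest-available rule held for descriptors", a, b, c)
```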
> >That means that this memory cannot contain pointers that you can
> >dereference the normal way. However, if you use a 64-bit operating
> >system, allocate a huge chunk of address space before you 'fork' off
> >the processes, and use a special inter-process allocator to manage
> >that space, you can pass pointers between processes and dereference
> >them with no special effort at all.

> Then you've lost the isolation that you trumpeted above as a benefit.

Not at all. You can precisely control what parts of that address space are
accessible by what processes at what time.

> >What I'm saying is that at least for some applications, a "process
> >pool" server that was very analogously coded to "thread pool" servers
> >might be able to provide benefits that thread pools alone cannot. Each
> >process in the pool can use threads, of course, if that makes sense.
> >The benefit is primarily that sharing can be precisely controlled and
> >secondarily that false sharing is minimized.

> That's the way most web servers work today, particularly with apache.
>
> But they don't share memory.

Right, and they typically don't hand off file descriptors.

> Oracle does, using system 5 SHM, but they do some pretty grody tricks
> to allow the oracle code to access the shared memory directly, without
> any base-relative accesses (at link time, they link a stub assembler file
> based at the SHM load address with the rest of the database code such
> that the database code can directly reference variables in the fixed
> portion of the shared global area. The remainder is allocated
> dynamically, but all the processes that share the SGA map it at the
> same address.)

Right, and that's not needed any more with 64-bit operating systems.

DS
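[Editor's note: handing off descriptors between pooled processes is
possible, though, via SCM_RIGHTS ancillary data over a Unix-domain socket.
A minimal sketch follows; Python 3.9+'s socket.send_fds/recv_fds are thin
wrappers over the underlying sendmsg/recvmsg calls, and the pipe here is
just a stand-in for whatever descriptor a real pool would distribute.]

```python
import os
import socket

# A Unix-domain socketpair is the channel over which descriptors travel.
parent_sock, child_sock = socket.socketpair()
pipe_r, pipe_w = os.pipe()

pid = os.fork()
if pid == 0:
    parent_sock.close()
    os.close(pipe_w)   # child will get the write end back via SCM_RIGHTS
    # recv_fds wraps recvmsg(); the kernel installs the passed descriptor
    # into *this* process's descriptor table under a fresh number.
    _msg, fds, _flags, _addr = socket.recv_fds(child_sock, 16, 1)
    os.write(fds[0], b"hello")
    os._exit(0)

child_sock.close()
socket.send_fds(parent_sock, [b"x"], [pipe_w])  # sendmsg() + SCM_RIGHTS
os.close(pipe_w)
os.waitpid(pid, 0)
reply = os.read(pipe_r, 16)
assert reply == b"hello"
print("received over handed-off descriptor:", reply)
```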