From: Chris Friesen on 10 May 2010 11:20

On 05/08/2010 12:52 PM, David Schwartz wrote:
> On May 8, 6:22 am, phil-news-nos...(a)ipal.net wrote:
>
>> excessive emphasis on threads compared to processes
>
> Process-pool designs are not really realistic yet. Nobody's done the
> work needed to make them useful.
>
> I keep hoping somebody will, since I think that's a phenomenal design
> approach. You would need to allocate lots of memory address space
> before you fork off the child processes (64-bit OSes make this easy),
> and have a special "shared allocator" to allocate shared memory. You'd
> need a library that made it easy to register file descriptors as
> shared and hand them from process to process. You'd also need a "work
> pool" implementation that only accepted references to shared resources
> to identify a work item.

What's wrong with allocating memory space after forking, using the
normal shared memory allocators or mmap()ing a file in a common
filesystem?

I assume the library would simply be there to hide the complexity of
passing file descriptors using unix sockets?

> Ideally, a process could register what it was messing with. So if it
> crashed/failed, the system would know what was potentially corrupt.

The system as a whole likely doesn't care...only the other related
processes that might want to touch the same resource.

Chris
From: David Schwartz on 10 May 2010 11:33

On May 10, 7:32 am, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
> However, in real-world threaded applications there _are_ context switches,
> and there are _many_ context switches, and a thread context switch is
> more efficient than a process context switch.

Why would there be many context switches? All the reasons processes
need to switch contexts don't apply to threads. For example:

1) The running process doesn't have access to the memory object that
needs to be manipulated. Doesn't apply: threads all share VM.

2) The running process doesn't have access to the file descriptor that
needs work. Doesn't apply: threads share all file descriptors.

And so on. Processes require lots of context switches because the wrong
one is often running. With a threaded design, there can only be a
"wrong thread" if one is intentionally created, and one would not
create one where it impacts performance (unless one is an idiot).

DS
From: David Schwartz on 10 May 2010 11:37

On May 10, 7:40 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
> Dedicating threads to particular subtasks of something which is
> supposed to be done is also a sensible way to design 'a threaded
> application', just one which is rather geared towards simplicity of
> the implementation than maximum performance.

You can always trade off performance for something else. The point is
that you *have* the performance in the first place, and where that
performance comes from.

> Because a thread context
> switch is cheaper than a process context switch, such simple designs
> are useful for a wider range of tasks when using threads instead of
> processes.

Compared to the ability to avoid context switches entirely, the
relative cost difference of process versus thread context switches is
lost in the noise in realistic scenarios. Of course, things that only
make things better are good, and this is certainly a small benefit of
threads. But it isn't a game changer.

On the other hand, the ability to reduce the number of context switches
by an order of magnitude (because you never have a thread running that
can't access the memory or file descriptor needed to make forward
progress) *is* a game changer.

DS
From: Chris Friesen on 10 May 2010 11:37

On 05/09/2010 08:15 PM, David Schwartz wrote:
> Threads are a win over processes because it makes no difference which
> thread runs. The process makes forward progress so long as any ready-
> to-run thread gets the CPU. That is, in a properly designed multi-
> threaded application, the amount of work done before a context switch
> will be needed will be much higher.

It seems like you are limiting a "properly designed multithreaded
application" to those that use a "pool of worker threads" model.
"Properly designed" doesn't always mean "as fast as possible at
runtime".

It also seems like, by definition, these "properly designed" apps must
rarely do anything that would block, because that causes a context
switch. Does that mean that they need to use async I/O, or that such
apps rarely have to wait for data from a disk?

Likewise, it seems like they must rarely execute system calls, because
the cost of a thread-to-thread context switch isn't much more than that
of a syscall.

Finally, the worker-pool design doesn't rule out processes instead of
threads. In general it is possible to set up a multi-process model to
mimic a multi-thread model, using shared memory and passing around file
descriptors. Once the setup is completed, the primary runtime
performance difference between the two models is that a context switch
between processes must flush the TLB while one between threads does
not.

Chris
From: David Schwartz on 10 May 2010 11:39
On May 10, 8:20 am, Chris Friesen <cbf...(a)mail.usask.ca> wrote:
> What's wrong with allocating memory space after forking using the normal
> shared memory allocators or mmap()ing a file in a common filesystem?

You can't follow pointers inside the shared memory without special
guard code.

> I assume the library would be simply to hide the complexity of passing
> file descriptors using unix sockets?

Not just that. It would also hide the complexity of allocating and
managing shared memory, assigning tasks to processes, creating and
destroying cooperating processes, and so on. Basically, it would
provide an interface that was very much like threads, except that
non-shared memory could be managed as easily as shared memory.

> > Ideally, a process could register what it was messing with. So if it
> > crashed/failed, the system would know what was potentially corrupt.

> The system as a whole likely doesn't care...only the other related
> processes that might want to touch the same resource.

By "the system", I mean the system of cooperating processes. One of the
big advantages would be that such a system could run riskier code: a
process crash wouldn't take down the cooperating processes. But you
wouldn't have all of the disadvantages of a "process per connection"
approach.

DS