From: phil-news-nospam on 8 May 2010 16:11

On Sat, 8 May 2010 11:52:48 -0700 (PDT) David Schwartz
<davids(a)webmaster.com> wrote:

| On May 8, 6:22 am, phil-news-nos...(a)ipal.net wrote:
|
|> excessive emphasis on threads compared to processes
|
| Process-pool designs are not really realistic yet. Nobody's done the
| work needed to make them useful.
|
| I keep hoping somebody will, since I think that's a phenomenal design
| approach. You would need to allocate lots of memory address space
| before you fork off the child processes (64-bit OSes make this easy),
| and have a special "shared allocator" to allocate shared memory. You'd
| need a library that made it easy to register file descriptors as
| shared and hand them from process to process. You'd also need a "work
| pool" implementation that only accepted references to shared resources
| to identify a work item.

I've seen few application purposes that would need to hand file
descriptors between tasks ... so few I can't even think of one at the
moment, although I know I thought about one many years back. (I'm
using "task" as a generic term for a unit of work that could be a
thread or a process.) The exception is the simplistic case of a master
listener for a daemon that hands off to workers for each arriving
connection (probably best done as part of the creation of that task
... which is usually what we see happening).
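A minimal sketch of that descriptor hand-off, using the SCM_RIGHTS
mechanism that UNIX-domain sockets already provide for passing
descriptors between processes; the helper names are illustrative and
error handling is abbreviated:

    /* Pass a file descriptor between processes over a UNIX-domain
     * socket, e.g. from a master listener to a worker. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send fd to the peer at the other end of socket 'chan'. */
    static int send_fd(int chan, int fd)
    {
        struct msghdr msg = {0};
        struct iovec iov;
        char byte = 'F';                  /* must carry at least one byte */
        union {                           /* aligned control buffer, see cmsg(3) */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct cmsghdr *cmsg;

        iov.iov_base = &byte;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;     /* kernel duplicates the fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
    }

    /* Receive a descriptor from 'chan'; returns the new fd or -1. */
    static int recv_fd(int chan)
    {
        struct msghdr msg = {0};
        struct iovec iov;
        char byte;
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct cmsghdr *cmsg;
        int fd = -1;

        iov.iov_base = &byte;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        if (recvmsg(chan, &msg, 0) <= 0)
            return -1;
        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg != NULL && cmsg->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
    }

A master would socketpair(AF_UNIX, SOCK_STREAM, 0, sv) before forking,
then send_fd() each accepted connection to whichever worker is idle.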
The 64-bit VM does give us space to do some memory structuring to deal
with issues like having memory that is shared, and having private
memory that still needs to be distinguished (e.g. two tasks that
cannot see each other's stack, but where we still want the addresses
to be different for some reason). Well, it will until we waste too
much of it and eventually exhaust it.

| Ideally, a process could register what it was messing with. So if it
| crashed/failed, the system would know what was potentially corrupt.

I'd prefer that registration be a part of getting it. That is, when
the task gets the resource, it is already registered. Think of
descriptors and processes: kill the process and the descriptors close
and go away (aside from a few glitches in the design, such as stuck
devices ... but that's another whole rant for another day). Sharing
the resources would have to be considered, too.

|> What is really needed is a whole NEW threading concept where individual
|> threads can have private-to-that-thread resources, like file descriptors
|> (but done without giving up the ability to choose to share them). Then
|> you can spread the descriptors and other resources out in ways that allow
|> them to be managed better.
|
| I'm not sure how that would be any better. Currently, if you want a
| file descriptor to only be accessed by one thread, just only access
| it from that one thread.

But it's still silly to have to deal with the issues of a descriptor
space that exceeds some large value just because all the descriptors
are visible together in at least some descriptor space. Think of
40,000 HTTP tasks working at once. Each needs at least a couple of
descriptors. Why do they all need to share the same descriptor space,
even if there is some need to share the same virtual memory space
(which may or may not be a good thing)?

Threads often are a performance win ... NOT because they allow faster
sharing between tasks (not always needed) ... BUT just because context
switches between threads of the same virtual memory space are faster.
You won't need to switch the segment structure. You won't need to
flush the VM translation cache. You won't even need to discard the
memory cache in many cases (depending on the architecture). But
threads are also often a risk ... they are not padded cells, for
example. And then there is the issue of having enough separate stacks,
file descriptor spaces, etc.

A lot of this is an issue of architectural design (and not just
architecture of the CPU) ... architecture of how process and thread
contexts are organized and how information flows to where it needs to
go. If it's a web server that just delivers static files (for example,
all those button images and such), then it is mostly very simple. But
if it needs to keep state for each user, or share information between
distinct users in real time, especially faster than storing it in a
database would allow, then the architecture of the server/service
needs to account for this. That design needs to consider the effects
of threads, processes, shared resources (which ones are needed and
which are not), and even distinct hardware. For example, users
accessing a web-based mail system might best be redirected to the same
machine each time during a login session, allowing their state cache
to be kept in one place, even while thousands of machines and tens of
millions of total processes or threads are running to service them.

There's really no general-purpose solution. There won't be until we
get to a level where the "loose fit" of "one size fits all" won't
matter (this will require machines that would impress us much as
today's machines would impress people from the 1980s).

--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN  | http://linuxhomepage.com/     http://ham.org/       |
| (first name) at ipal.net | http://phil.ipal.org/  http://ka9wgn.ham.org/  |
-----------------------------------------------------------------------------
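A minimal sketch of the "allocate lots of memory address space before
you fork" design David describes above, assuming Linux/BSD mmap
semantics; the bump allocator and every name here are illustrative,
not an existing library:

    /* Map one big shared, anonymous region before fork(); every child
     * inherits the same mapping, so pointers into it stay valid in all
     * of them.  A trivial bump allocator stands in for the "shared
     * allocator" (a real one would need free lists and more care). */
    #include <stdatomic.h>
    #include <stddef.h>
    #include <sys/mman.h>

    #define ARENA_SIZE (1UL << 30)        /* reserve 1 GiB of address space */

    struct arena {
        _Atomic size_t used;              /* offset of the next free byte */
        char base[];                      /* arena storage follows */
    };

    static struct arena *arena;

    /* Call once in the parent, before forking any workers. */
    static int arena_init(void)
    {
        arena = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (arena == MAP_FAILED)
            return -1;
        atomic_store(&arena->used, 0);
        return 0;
    }

    /* Allocate n bytes visible to the whole process pool. */
    static void *shared_alloc(size_t n)
    {
        n = (n + 15) & ~(size_t)15;       /* keep 16-byte alignment */
        size_t off = atomic_fetch_add(&arena->used, n);
        if (off + n > ARENA_SIZE - sizeof *arena)
            return NULL;                  /* arena exhausted */
        return arena->base + off;
    }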
From: David Schwartz on 8 May 2010 19:18

On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:

> Threads often are a performance win ... NOT because they allow faster
> sharing between tasks (not always needed) ... BUT just because context
> switches between threads of the same virtual memory space are faster.

This is a common misconception. Threads are rarely, if ever, a
performance win because they make context switches faster. Threads are
primarily a performance win because they minimize the need for context
switches. A threaded web server can do a little bit of work for each
of a thousand clients without even a single context switch.

If you are having so many context switches that the cost of a context
switch shows up on your performance radar, you are doing something
horribly wrong. Schedulers are specifically designed to ensure that
context switches are infrequent, and you would have to be putting some
disastrous pressures on them for that design to fail to do its job.

DS
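Whether context switches show up on the radar is easy to check: the
kernel keeps per-thread counters. A minimal Linux-specific sketch
(RUSAGE_THREAD is a glibc/Linux extension, and serve_one_client is a
hypothetical stand-in for real per-client work):

    /* Count how many context switches occur while one thread chews
     * through a batch of work items.  On a lightly loaded machine the
     * answer is typically a handful per thousand items. */
    #define _GNU_SOURCE                   /* for RUSAGE_THREAD */
    #include <stdio.h>
    #include <sys/resource.h>

    static volatile long sink;

    static void serve_one_client(int i)   /* hypothetical per-client work */
    {
        for (int k = 0; k < 1000; k++)
            sink += i ^ k;
    }

    int main(void)
    {
        struct rusage before, after;

        getrusage(RUSAGE_THREAD, &before);
        for (int i = 0; i < 1000; i++)    /* work for 1000 "clients" */
            serve_one_client(i);
        getrusage(RUSAGE_THREAD, &after);

        printf("voluntary: %ld, involuntary: %ld\n",
               after.ru_nvcsw - before.ru_nvcsw,
               after.ru_nivcsw - before.ru_nivcsw);
        return 0;
    }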
From: Golden California Girls on 9 May 2010 03:37

David Schwartz wrote:
> On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:
>
>> Threads often are a performance win ... NOT because they allow faster
>> sharing between tasks (not always needed) ... BUT just because context
>> switches between threads of the same virtual memory space are faster.
>
> This is a common misconception. Threads are rarely, if ever, a
> performance win because they make context switches faster. Threads are
> primarily a performance win because they minimize the need for context
> switches. A threaded web server can do a little bit of work for each
> of a thousand clients without even a single context switch.

That depends upon what you call a context switch. I think that to
switch threads you still have to save and restore a few registers, the
program counter for sure, unless you have more cores than threads. The
more registers that have to be exchanged, the longer the switching
time.
From: David Schwartz on 9 May 2010 15:10

On May 9, 12:37 am, Golden California Girls <gldncag...(a)aol.com.mil>
wrote:

> That depends upon what you call a context switch. I think that to
> switch threads you still have to save and restore a few registers,
> the program counter for sure, unless you have more cores than
> threads. The more registers that have to be exchanged, the longer
> the switching time.

Compared to blowing out the code and data caches, the time it takes to
save and restore a few registers is meaningless.

DS
From: Scott Lurndal on 9 May 2010 19:08
David Schwartz <davids(a)webmaster.com> writes:
>On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:
>
>> Threads often are a performance win ... NOT because they allow faster
>> sharing between tasks (not always needed) ... BUT just because context
>> switches between threads of the same virtual memory space are faster.
>
>This is a common misconception. Threads are rarely, if ever, a
>performance win because they make context switches faster. Threads are
>primarily a performance win because they minimize the need for context
>switches. A threaded web server can do a little bit of work for each
>of a thousand clients without even a single context switch.

Threads are a performance win because they don't need to flush the
TLBs on context switches between threads in the same process. A thread
context switch is enormously less expensive than a process context
switch.

The larger the page size, the better. TLB misses are expensive. TLB
misses are _really_ expensive in virtual machines[*].

scott

[*] Up to 22 memory references when using nested page tables, depending
on the processor's page-directory cache hit rate; this can be reduced
to 11 if the nested page table uses 1 GB page sizes (vice 4 or fewer
without using SVM).
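On the "larger the page size, the better" point, a minimal
Linux-specific sketch: backing a buffer with 2 MiB huge pages via
MAP_HUGETLB, so it occupies far fewer TLB entries than the same buffer
in 4 KiB pages (this assumes the administrator has reserved huge
pages, e.g. via /proc/sys/vm/nr_hugepages):

    /* 64 MiB in 2 MiB huge pages needs 32 TLB entries to cover;
     * the same buffer in 4 KiB pages needs 16384. */
    #include <stdio.h>
    #include <sys/mman.h>

    #define BUF_SIZE (64UL << 20)         /* 64 MiB */

    int main(void)
    {
        void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");  /* likely: no huge pages reserved */
            return 1;
        }
        /* ... hot data structures go here ... */
        munmap(buf, BUF_SIZE);
        return 0;
    }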