From: Peter Olcott on 2 Apr 2010 10:53

"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message news:nBatn.1653$OC1.680(a)news.usenetserver.com...
> David Schwartz <davids(a)webmaster.com> writes:
>>On Apr 1, 9:09 am, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>>
>>> The first process appends (O_APPEND flag) transaction records to a
>>> transaction log file, and then writes to a named pipe to inform the
>>> other process that a transaction is ready for processing. The
>>> transaction log file contains all of the details of the transaction
>>> as fixed length binary records. Any reads of this file use pread().
>>
>>Appends are not guaranteed atomic. So each writer would have to have
>>its own transaction log file or you'd need some separate mechanism to
>>lock them.
>
> A single write or pwrite call on a file with O_APPEND is required by
> the SUS to ensure that the write is performed atomically with respect
> to other writes to the same file which also have the O_APPEND flag
> set. The order, of course, is not guaranteed.
>
> scott

I am taking the above to mean that David was incorrect when he said:

   Appends are not guaranteed atomic. So each writer would have to
   have its own transaction log file or you'd need some separate
   mechanism to lock them.

If this is not what was intended please clarify.
From: Vitus Jensen on 2 Apr 2010 11:37

On Fri, 2 Apr 2010, Ersek, Laszlo wrote:
> On Fri, 2 Apr 2010, Vitus Jensen wrote:
>
>> Hi David,
>
> (I'll try to follow up on this, though I'm sure David will offer you
> much better answers.)
>
>> On Thu, 1 Apr 2010, David Schwartz wrote:
>>
>>> On Apr 1, 4:23 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>>
>> ...
>>
>>> Seems kind of silly to have a thread for each request that spends
>>> most of its time just waiting for another program. You don't need a
>>> thread to wait. Just assign a thread to do what needs to be done
>>> when you're notified that the request is finished being processed
>>> by the OCR software.
>>
>> I'm coming from another platform where the maxim was "a thread
>> should do nothing very well" (PF), which I always interpreted as:
>> code your threads so that they spend most of their time waiting. So
>> yes, you need a thread to wait. 99% of the time it should do nothing
>> else but wait.
>
> What's the name of that platform? I don't understand the "PF"
> abbreviation.

That's "OS/2". Peter Fitzsimmons' (PF) first axiom of threads was "A thread's most important job is to do nothing very well." As posted on fidonet OS2PROG, which unfortunately google doesn't search.

> Perhaps you mean "FP" as in functional programming. The
> programmer-accessible thread concept in "such languages" tends to be
> lightweight ("green threads" or "managed threads"), and the runtime
> environment of the language provides a scheduler, implemented in user
> space, that multiplexes such green threads over kernel threads
> (kernel schedulable entities). This is called the M:N thread model.

I've never used userspace thread libraries so I can't say anything about their advantages. OS/2 uses kernel threads, 1:1. Of course that axiom was posted 20 years ago when most machines had 16MB RAM or less and a single CPU.
And classic OS/2 can't give more than 568 MB to any process (minus shared memory, minus kernel, minus drivers ...).

>>> I'm saying, don't have one thread waiting to do X when X is
>>> possible and one waiting to do Y when Y is possible and so on.
>>> First, this wastes a lot of threads. Second, it forces a lot of
>>> context switches to get the "right thread for the job" running.
>>
>> Are threads a scarce resource in linux? I thought the limit for a
>> typical system is well above 1000.
>
> GNU/Linux switched to a 1:1 thread model not so long ago (at least
> when you program in C), AFAICT.
>
> http://people.redhat.com/drepper/glibcthreads.html
> http://people.redhat.com/drepper/nptl-design.pdf
>
> Furthermore, IIRC the default stack dedicated to a single thread is
> 2M (at least?), so 1000 threads in a 32-bit process would eat up (per
> default) the usable address space of the process quite quickly.
>
> So yes, in the 1:1 thread model (which seems to me kind of "proven
> optimal" for system programming languages), a thread is a
> heavy-weight resource. You implement manually what the user-space
> green threads scheduler does for you elsewhere.

Because of that big fat 2MB stack, which isn't needed if you code small, simple threads.

>> And if a thread is waiting for data appearing on a filehandle how
>> could it create context switches? It's just lying there; the only
>> thing it's occupying is address space and some kernel memory.
>
> I recommend reading "High-Performance Server Architecture" by Jeff
> Darcy: <http://pl.atyp.us/content/tech/servers.html>.

Yes, I know that paper. But I still think for reasonable requirements it's better to code threads. After all it's easier because you concentrate on one problem at a time.

>> Now if all those threads will be crunching data this would be
>> another case. To avoid overload just increase/decrease a counter
>> and wait (!) if it's getting too high.
> You create heavy-weight threads not because you want to express the
> work to be done differently, ie. in separate logical sections (eg.
> for I/O multiplexing), but because you want to scale CPU-intensive
> work to multiple cores, or you wish to overlap waiting for IO with
> CPU-intensive work and don't want to use AIO or signals for some
> reason.

David wrote about a thread getting data from OCR to be sent back to the client. Which is mostly IO-bound.

> The C10K problem, by Dan Kegel: <http://kegel.com/c10k.html>.

Yes, I know. Will that OCR server get so many requests? For his one OCR task?

....

> "Homogeneous threads" are easier to schedule for the kernel, and
> they adapt better to a changing load between different task types.
> Consider a pharmacy with five windows. Compare the following two
> setups:
>
> - you have a single queue of patients, with the patient at the head
>   of the queue dispatched to whichever window becomes free -- each
>   window (teller?) handles all kinds of requests,
>
> - you have five queues and patients can't move between queues once
>   they've chosen one queue, according to their types of requests.
>
> Which one seems to handle better (a) smoother servicing / minimal
> wait times / wildly varying individual servicing times, (b) adding
> more windows, (c) adding more patients?

If you only have 5 windows it doesn't make much sense to cram 6 patients into them. So you use 5 threads reading from a single queue. Usually you have such a direct connection resource<->thread.

But I see the direction you are heading: 5 windows (CPUs), doctors of 20 different kinds (routines) and a queue of patients (tasks selecting the routines). Well, this sounds CPU-bound.

>> If you start those worker routines as threads, the decision making
>> about what worker to run is moved into the kernel which is highly
>> optimised for that kind of work.
> It surely is, but then the kernel *must* work (the user/kernel
> boundary *must* be crossed) for doing nothing more than dispatching
> (selecting a worker based on task type). A switch statement is much
> cheaper.

As a thread has to be woken up (because a new task is available) a context switch is unavoidable. After all, your system was idle before. So just wake the correct one, based on queue handle, socket handle or wherever the task appears.

>> Additionally your worker threads keep their context data local and
>> may hide that data structure from other threads/modules, which
>> gives a much cleaner, simpler and safer design.
>
> I agree that keeping request-specific (temporary) data on the stack
> improves locality, but I believe this should matter little if a
> request, once dispatched, requires a massive amount of computation
> (relative to ensuring cache coherence etc). And heavy-weight threads
> appear most eligible to me when jobs do need intensive crunching.
> (We're talking OCR, right?)

Yes, the OP wanted to have one thread doing OCR. Eventually he wanted to add a second thread, probably because he wanted to buy a heavier machine later.

Vitus

--
Vitus Jensen, Hannover, Germany, Earth, Universe (current)
From: Ersek, Laszlo on 2 Apr 2010 11:36

On Fri, 2 Apr 2010, David W Noon wrote:
> On Fri, 2 Apr 2010 15:48:13 +0200, Ersek, Laszlo wrote about Re: IPC
> based on name pipe FIFO and transaction log file:
>
>> On Fri, 2 Apr 2010, Vitus Jensen wrote:
>>> I'm coming from another platform where the maxim was "a thread
>>> should do nothing very well" (PF),
>
>> What's the name of that platform? I don't understand the "PF"
>> abbreviation.
>
> Vitus was referring to OS/2, and PF is Peter Fitzsimmons. Peter was a
> well known developer on OS/2 some 15 or 20 years ago.

Thank you for this clarification. Briefly searching for 'Peter Fitzsimmons OS/2 threads' didn't return anything definitive, but Wikipedia did:

http://en.wikipedia.org/wiki/Thread_(computer_science)

----v----
Systems like Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes; in other operating systems there is not so great a difference except the cost of address space switch which implies a TLB flush.
----^----

http://en.wikipedia.org/wiki/OS/2#Problems

----v----
Problems

Some problems were classic subjects of comparison with other operating systems:

[...]

# No unified object handles. The availability of threads probably led system designers to overlook mechanisms which allow a single thread to wait for different types of asynchronous events at the same time, for example the keyboard and the mouse in a "console" program. Even though /select/ was added later, it only worked on network sockets. In case of a console program, dedicating a separate thread for waiting on each source of events made it difficult to properly release all the input devices before starting other programs in the same "session". As a result, console programs usually polled the keyboard and the mouse alternately, which resulted in wasted CPU and a characteristic "jerky" reactivity to user input. In OS/2 3.0 IBM introduced a new call for this specific problem.
----^----

The notion of a thread and a thread's designed-in characteristics in OS/2 seem fundamentally different from those in UNIX(R), where "everything is a file". Therefore I assume programming techniques derived from the OS/2 thread concept cannot be easily applied on modern POSIX(R) systems. (This is not to say that the SUS requires POSIX threads to be heavy-weight, but in practice they often are; and the OS/2 techniques appear to rely squarely on the cheapness of threads, which is not guaranteed (though not forbidden either) by the SUS. AFAICT.)

Thanks!
lacos
From: Peter Olcott on 2 Apr 2010 11:47

"Vitus Jensen" <vitus(a)alter-schwede.de> wrote in message news:alpine.LNX.2.00.1004021649110.5338(a)asterix.crazy-teaparty.dyndns.org...

[...]

>> The C10K problem, by Dan Kegel: <http://kegel.com/c10k.html>.
>
> Yes, I know. Will that OCR server get so many requests? For his one
> OCR task?

I am assuming a maximum load of 100 requests per second. This is based on the maximum load that the OCR can handle.

[...]

> Yes, the OP wanted to have one thread doing OCR. Eventually he
> wanted to add a second thread, probably because he wanted to buy a
> heavier machine later.

I want to have one thread for the OCR initially because the initial machine will only have a single core. Eventually I will want to have one thread per core. I don't know how well Intel hyperthreading works; if it works well then I may be able to handle two threads per core. I envision that it will be a long time before my OCR load reaches the level of requiring more than a single thread.
From: Scott Lurndal on 2 Apr 2010 12:09
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

[...]

>I am taking the above to mean that David was incorrect when he said:
>
>   Appends are not guaranteed atomic. So each writer would have to
>   have its own transaction log file or you'd need some separate
>   mechanism to lock them.
>
>If this is not what was intended please clarify.

Read David's response. If two writers are appending to a single file, each individual write is appended atomically to the end of the file. The order of the records when more than one process writes to the file is non-deterministic.

In your application, I'd frankly avoid file operations in favor of queues or ring-buffers in a MAP_SHARED mmap(2) region. If you need the queues to be persistent, map a file; otherwise map anonymous (linux) or shmat (unix). Use pthread_mutex (attr=PROCESS_SHARED), semop, or GCC built-in atomic memory access functions (e.g. __sync_fetch_and_add) for mutual exclusion/wakeup.

scott