From: David Schwartz on 1 Apr 2010 19:51

On Apr 1, 4:23 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:

> --Appends are not guaranteed atomic. So each writer would have to have
> --its own transaction log file or you'd need some separate mechanism to
> --lock them.

> You may be correct, but, if you are then two different
> editions of Advanced Programming in the Unix Environment
> would be incorrect:

> First Edition Chapter 3 Section 3.11 Atomic Operations page
> 60-61 Appending to a File
> "Unix provides an atomic way to do this operation if we set
> the O_APPEND flag when a file is opened."

> Second Edition Chapter 3 Section 3.11 Atomic Operations page
> 74 Appending to a File
> "Unix provides an atomic way to do this operation if we set
> the O_APPEND flag when a file is opened."

You are confusing two different notions of atomicity. Sorry I wasn't
clearer. A write to a file opened with O_APPEND is atomic in the sense
that the file position pointer will not move between the notional seek
and the write. So if two processes each append an "A", this can't
happen:

1) Process 1 seeks to the end.
2) Process 2 seeks to the end.
3) Process 1 writes an A.
4) Process 2 writes an A on top of the first A.

The net effect is only one 'A'. That can't happen.

However, the issue is whether they're atomic in the sense that the
write operation itself cannot be interrupted by another
file-modification operation.

The standards say: "If the O_APPEND flag of the file status flags is
set, the file offset shall be set to the end of the file prior to each
write and no intervening file modification operation shall occur
between changing the file offset and the write operation."

This seems to guarantee the former atomicity but not the latter. But
your implementation would require the latter. If there's anything that
guarantees what you need, I'm not aware of it. I had hashed this out
in the past but am unable to recall for sure the final resolution. I
believe it was that it is not formally guaranteed, but that the
guarantee provided by the standard would be all but useless without
it.

> --Or are you suggesting there be one transaction log file and one named
>
> Yes one single transaction log file.
>
> --pipe for each possible thread-to-thread set? If so, how will they be
> --established in the first place?

> Two total pipes, one in each direction.

Okay, so a message comes in over a pipe, how does it get to the right
thread -- the one that's waiting for that message?

> --It's hard to analyze a solution without knowing what problem it's
> --supposed to solve. ;)

> I am trying to convert my proprietary OCR software into a
> web application. Initially there will be multiple threads,
> one for each web request, and a single threaded process
> servicing these web requests. Eventually there may be
> multiple threads servicing these web requests.

Seems kind of silly to have a thread for each request that spends most
of its time just waiting for another program. You don't need a thread
to wait. Just assign a thread to do what needs to be done when you're
notified that the request is finished being processed by the OCR
software.

I'm saying, don't have one thread waiting to do X when X is possible
and one waiting to do Y when Y is possible and so on. First, this
wastes a lot of threads. Second, it forces a lot of context switches
to get the "right thread for the job" running.

Instead, have one thread that waits until anything is possible. When
something is possible, it wakes another thread to wait for the next
thing to be possible and it does X, Y, Z, or whatever it was just told
is now possible to do.

This results in far fewer context switches and better utilization of
CPU code and data caches. (Of course, if the web part is an
insignificant fraction of resource usage, it might not matter.)

DS
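
The first notion of atomicity described in this post (no overwrite between the notional seek and the write) can be illustrated with a minimal sketch. It is written in Python, whose `os.open` and `os.write` are thin wrappers around the POSIX calls, and it assumes a POSIX system. It uses two descriptors in one process rather than two racing processes, so it only shows the stale-offset overwrite from steps 1-4 and says nothing about whether a large write can be interleaved with another. The helper name is hypothetical.

```python
import os
import tempfile


def write_a_from_two_descriptors(extra_flags):
    """Open one file twice, write b"A" through each descriptor, return the file's bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two independent open file descriptions, each with its own offset,
        # as two separate processes would have.
        fd1 = os.open(path, os.O_WRONLY | extra_flags)
        fd2 = os.open(path, os.O_WRONLY | extra_flags)
        try:
            os.write(fd1, b"A")
            os.write(fd2, b"A")  # without O_APPEND this offset is still 0
        finally:
            os.close(fd1)
            os.close(fd2)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


without_append = write_a_from_two_descriptors(0)
with_append = write_a_from_two_descriptors(os.O_APPEND)
print(without_append, with_append)  # b'A' b'AA'
```

Without O_APPEND the second "A" lands on top of the first; with it, each write goes to the current end of the file.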
From: Scott Lurndal on 1 Apr 2010 20:09

David Schwartz <davids(a)webmaster.com> writes:
>On Apr 1, 9:09 am, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>
>> The first process appends (O_APPEND flag) transaction
>> records to a transaction log file, and then writes to a
>> named pipe to inform the other process that a transaction is
>> ready for processing. The transaction log file contains all
>> of the details of the transaction as fixed length binary
>> records. Any reads of this file use pread().
>
>Appends are not guaranteed atomic. So each writer would have to have
>its own transaction log file or you'd need some separate mechanism to
>lock them.

A single write or pwrite call on a file with O_APPEND is required by
the SUS to ensure that the write is performed atomically with respect
to other writes to the same file which also have the O_APPEND flag
set. The order, of course, is not guaranteed.

scott
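
The design quoted above (fixed-length binary records appended with O_APPEND, read back with pread()) might look roughly like the following sketch. It is in Python on a POSIX system; the record layout, field names, and file name are assumptions made for illustration and are not taken from the thread. Each record is handed to the kernel in a single write call, and the write's return value is deliberately not examined here.

```python
import os
import struct
import tempfile

# Hypothetical layout: 4-byte request id, 1-byte status, 11-byte payload = 16 bytes.
RECORD = struct.Struct("<IB11s")


def append_record(fd, request_id, status, payload):
    """Append one fixed-length record using a single write() call."""
    os.write(fd, RECORD.pack(request_id, status, payload))


def read_record(fd, index):
    """Read record number `index` with pread(), leaving the file offset untouched."""
    data = os.pread(fd, RECORD.size, index * RECORD.size)
    if len(data) != RECORD.size:
        raise EOFError("record not fully present")
    request_id, status, payload = RECORD.unpack(data)
    return request_id, status, payload.rstrip(b"\0")


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "transactions.log")  # hypothetical file name
writer = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
reader = os.open(log_path, os.O_RDONLY)

append_record(writer, 1, 0, b"page-one")
append_record(writer, 2, 0, b"page-two")
second = read_record(reader, 1)
print(second)  # (2, 0, b'page-two')

os.close(writer)
os.close(reader)
os.unlink(log_path)
os.rmdir(log_dir)
```

Because every record has the same length, a reader can locate record `n` from its index alone, which is why the order in which concurrent appends land matters to any consumer that assumes an index.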
From: Peter Olcott on 1 Apr 2010 20:24

"David Schwartz" <davids(a)webmaster.com> wrote in message
news:d83fe7cb-609f-4456-a4de-66eca05c211f(a)i25g2000yqm.googlegroups.com...

On Apr 1, 4:23 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:

> --Appends are not guaranteed atomic. So each writer would have to have
> --its own transaction log file or you'd need some separate mechanism to
> --lock them.

> You may be correct, but, if you are then two different
> editions of Advanced Programming in the Unix Environment
> would be incorrect:

> First Edition Chapter 3 Section 3.11 Atomic Operations page
> 60-61 Appending to a File
> "Unix provides an atomic way to do this operation if we set
> the O_APPEND flag when a file is opened."

> Second Edition Chapter 3 Section 3.11 Atomic Operations page
> 74 Appending to a File
> "Unix provides an atomic way to do this operation if we set
> the O_APPEND flag when a file is opened."

--You are confusing two different notions of atomicity. Sorry I wasn't
--clearer. A write to a file opened with O_APPEND is atomic in the sense
--that the file position pointer will not move between the notional seek
--and the write. So if two processes each append an "A", this can't
--happen:

--1) Process 1 seeks to the end.
--2) Process 2 seeks to the end.
--3) Process 1 writes an A.
--4) Process 2 writes an A on top of the first A.

--The net effect is only one 'A'. That can't happen.

That may be all that I need.

--However, the issue is whether they're atomic in the sense that the
--write operation itself cannot be interrupted by another
--file-modification operation.

--The standards say: "If the O_APPEND flag of the file status flags is
--set, the file offset shall be set to the end of the file prior to each
--write and no intervening file modification operation shall occur
--between changing the file offset and the write operation."

So the complete Append Operation cannot be interrupted.

--This seems to guarantee the former atomicity but not the latter. But

I don't see how this does not guarantee all of the atomicity that I
need. Could you propose a concrete example that meets the standard and
causes problems? It seems like it is saying the entire append must
complete before any other file modifications take place.

--your implementation would require the latter. If there's anything that
--guarantees what you need, I'm not aware of it. I had hashed this out
--in the past but am unable to recall for sure the final resolution. I
--believe it was that it is not formally guaranteed, but that the
--guarantee provided by the standard would be all but useless without
--it.

> --Or are you suggesting there be one transaction log file and one named
>
> Yes one single transaction log file.
>
> --pipe for each possible thread-to-thread set? If so, how will they be
> --established in the first place?

> Two total pipes, one in each direction.

--Okay, so a message comes in over a pipe, how does it get to the right
--thread -- the one that's waiting for that message?

Initially (on the request) there will be only one thread on the other
end, and multiple threads on the sending end. Eventually there may be
multiple threads on the other end, and it won't matter which one picks
it up. The response may be a little trickier; maybe the Thread-ID is
passed through in the request.

> --It's hard to analyze a solution without knowing what problem it's
> --supposed to solve. ;)

> I am trying to convert my proprietary OCR software into a
> web application. Initially there will be multiple threads,
> one for each web request, and a single threaded process
> servicing these web requests. Eventually there may be
> multiple threads servicing these web requests.

--Seems kind of silly to have a thread for each request that spends most
--of its time just waiting for another program. You don't need a thread
--to wait. Just assign a thread to do what needs to be done when you're
--notified that the request is finished being processed by the OCR
--software.

This is a fundamental part of the web server that I will be using.
Also I want the request to be acknowledged immediately. Processing may
take quite a while.

--I'm saying, don't have one thread waiting to do X when X is possible
--and one waiting to do Y when Y is possible and so on. First, this
--wastes a lot of threads. Second, it forces a lot of context switches
--to get the "right thread for the job" running.

The web server queues up requests in order of arrival. I think that
this side has to be very responsive or the connection might die. In
any case I don't want the user to wait until their request is
acknowledged. I want at least the acknowledgement to be as immediate
as possible.

--Instead, have one thread that waits until anything is possible. When
--something is possible, it wakes another thread to wait for the next
--thing to be possible and it does X, Y, Z, or whatever it was just told
--is now possible to do.

All this stuff is already implemented in the web server. I merely must
interface with pre-existing code. It is possible that I could get 1000
requests at once, and take several minutes to process all of them.

--This results in far fewer context switches and better utilization of
--CPU code and data caches. (Of course, if the web part is an
--insignificant fraction of resource usage, it might not matter.)

Yes insignificant fraction, probably far less than 1%.

DS
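
The idea of passing an identifier through in the request, so that the response can find its way back to the right waiting thread, can be sketched as follows. This is only an illustration under stated assumptions: in-process queues stand in for the two named pipes, `payload.upper()` stands in for the OCR work, and all names (`submit`, `worker`, `pending`) are hypothetical.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the request pipe
pending = {}               # request id -> one-slot queue the requester waits on
pending_lock = threading.Lock()


def submit(request_id, payload):
    """Called by a web-request thread: register, send, then wait for the reply."""
    reply_box = queue.Queue(maxsize=1)
    with pending_lock:
        pending[request_id] = reply_box
    requests.put((request_id, payload))
    return reply_box.get(timeout=5)


def worker():
    """Single servicing thread: handles requests in arrival order."""
    while True:
        item = requests.get()
        if item is None:          # sentinel: shut down
            break
        request_id, payload = item
        result = payload.upper()  # placeholder for the real processing
        with pending_lock:
            reply_box = pending.pop(request_id)
        reply_box.put(result)     # the id routes the reply to its requester


service = threading.Thread(target=worker)
service.start()
results = {}


def client(i):
    results[i] = submit(i, f"page-{i}")


clients = [threading.Thread(target=client, args=(i,)) for i in range(3)]
for t in clients:
    t.start()
for t in clients:
    t.join()
requests.put(None)
service.join()
print(sorted(results.items()))  # [(0, 'PAGE-0'), (1, 'PAGE-1'), (2, 'PAGE-2')]
```

With more than one servicing thread the same lookup still works, since the identifier, not the identity of the worker, decides where the reply goes.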
From: Peter Olcott on 1 Apr 2010 20:49

"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
news:nBatn.1653$OC1.680(a)news.usenetserver.com...
> David Schwartz <davids(a)webmaster.com> writes:
>>On Apr 1, 9:09 am, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>>
>>> The first process appends (O_APPEND flag) transaction
>>> records to a transaction log file, and then writes to a
>>> named pipe to inform the other process that a transaction is
>>> ready for processing. The transaction log file contains all
>>> of the details of the transaction as fixed length binary
>>> records. Any reads of this file use pread().
>>
>>Appends are not guaranteed atomic. So each writer would have to have
>>its own transaction log file or you'd need some separate mechanism to
>>lock them.
>
> A single write or pwrite call on a file with O_APPEND is required by
> the SUS to ensure that the write is performed atomically with respect
> to other writes to the same file which also have the O_APPEND flag
> set. The order, of course, is not guaranteed.
>
> scott

To what extent is the order not guaranteed?

I envision that my second process will need to write to the same
record that was just appended almost immediately. Could this be an
issue?
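
A sketch of how a second process might update a just-appended record in place is given below, in Python on a POSIX system. It assumes the second process is told the record's index (for example in the pipe message), and it uses a hypothetical 16-byte record with a one-byte status field at offset 4. One point worth labelling clearly: the descriptor used for the update is opened without O_APPEND, because the Linux pwrite(2) manual page documents that on Linux a pwrite() on an O_APPEND descriptor appends regardless of the offset given.

```python
import os
import tempfile

RECORD_SIZE = 16    # hypothetical fixed record length
STATUS_OFFSET = 4   # hypothetical position of a one-byte status field


def mark_done(path, index):
    """Overwrite the status byte of record `index` in place."""
    # Opened WITHOUT O_APPEND so that pwrite() honours the offset.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, b"\x01", index * RECORD_SIZE + STATUS_OFFSET)
    finally:
        os.close(fd)


# Build a log holding two zero-filled records, then update the second one.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(2 * RECORD_SIZE))
os.close(fd)

mark_done(path, 1)

with open(path, "rb") as f:
    contents = f.read()
os.unlink(path)

print(len(contents), contents[RECORD_SIZE + STATUS_OFFSET])  # 32 1
```

The file length is unchanged and only the one byte differs, so appenders using a separate O_APPEND descriptor still write at the true end of the file.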
From: Ersek, Laszlo on 1 Apr 2010 21:16

On Thu, 1 Apr 2010, Peter Olcott wrote:

> I don't see how this does not guarantee all of the atomicity that I
> need. Could you propose a concrete example that meets the standard and
> causes problems? It seems like it is saying the entire append must
> complete before any other file modifications take place.

write() itself (with or without O_APPEND) is not required to write all
bytes at once, even to a regular file.

http://www.opengroup.org/onlinepubs/9699919799/functions/write.html

Consider a signal delivered to the thread, a file size limit reached,
or being temporarily out of space on the fs hosting the file; all after
some but not all bytes were written. (These would return -1 and set
errno to EINTR (without SA_RESTART), EFBIG, ENOSPC, respectively, if no
data could have been written before encountering the condition in
question.) This list is not exhaustive.

(Signal delivery is plausible -- suppose you submit a write request of
1G bytes on a system with little buffer cache and a slow disk. If
SSIZE_MAX equals LONG_MAX, for example, the result of such a request is
not even implementation-defined.)

----v----
Write requests to a pipe or FIFO shall be handled in the same way as a
regular file with the following exceptions:

[...]

* Write requests of {PIPE_BUF} bytes or less shall not be interleaved
  with data from other processes doing writes on the same pipe. [...]

* If the O_NONBLOCK flag is clear, a write request may cause the thread
  to block, but on normal completion it shall return nbyte.

[...]
----^----

Both quoted guarantees (exclusion of interleaved writes, plus
completeness of writes) are exceptional behavior of pipes in relation
to regular files.

Perhaps an interpretation request should be submitted: if write()
returns nbyte and O_APPEND was set, was the block written atomically
then?

<http://www.kernel.org/doc/man-pages/online/pages/man2/write.2.html>
does say

----v----
If the file was open(2)ed with O_APPEND, the file offset is first set
to the end of the file before writing. The adjustment of the file
offset and the write operation are performed as an atomic step.
----^----

which seems to imply that the write operation itself is atomic. (... if
it returns "count", in my interpretation.)

lacos
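
Under the interpretation above, an appender would at least need to notice when write() returns fewer bytes than requested. A minimal Python sketch of that check follows (POSIX assumed; the helper name is hypothetical). It deliberately does not retry the remainder, on the assumption that a second write() could land after another writer's record and so split this one. Error conditions such as ENOSPC or EFBIG surface as `OSError` and are left to propagate.

```python
import os
import tempfile


def append_exactly(fd, data):
    """Issue one write() and report whether every byte was accepted.

    No retry loop: a follow-up write() for the remainder is a separate
    append and could be placed after another writer's data.
    """
    written = os.write(fd, data)
    return written == len(data)


fd, path = tempfile.mkstemp()
os.close(fd)
log_fd = os.open(path, os.O_WRONLY | os.O_APPEND)

ok = append_exactly(log_fd, b"x" * 16)
size = os.fstat(log_fd).st_size

os.close(log_fd)
os.unlink(path)
print(ok, size)  # True 16
```

What to do when the function returns `False` (mark the record bad, fall back to locking, abort) is a design decision the thread leaves open.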