From: Joseph M. Newcomer on 9 Apr 2010 13:19

On Thu, 8 Apr 2010 21:10:25 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:561tr5dgf4lptedsnbavf3frg7rk20r0dj(a)4ax.com...
>> See below...
>> On Thu, 8 Apr 2010 18:19:31 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>Was your trouble with Windows named pipes? (I won't be
>>>using those).
>>>What IPC did you end up choosing? (I like named pipes
>>>because their buffer can grow to any length).
>> ****
>> Actually, nowhere does it say this. In fact, the linux
>> documentation seems to suggest that a fifo can be blocking
>> on write. And if any process connected to the pipe fails,
>> what happens to the data in the pipe? (Hint: it is lost)
>> So it is not a particularly "reliable" mechanism unless
>> there is the equivalent of transactions confirming receipt
>> of a block of information.
>
>Yes that is it. I don't even acknowledge receipt of the
>request until it is committed to the transaction log.
>Anything at all that prevents this write also prevents the
>acknowledgement of receipt. So basically I never say "I
>heard you" until the point where nothing can prevent
>completing the transaction.
****
OK, this is a good specification. I'm not sure how the current proposal, which doesn't have anything resembling a reliable log, accomplishes it.
****
>
>Then in the event that I do not receive the HTTP
>acknowledgement of final receipt of the output data, I roll
>the whole transaction back. If the reason for the failure is
>anything at all on my end I roll the charges back, but let
>the customer keep the output data for free. If the
>connection was lost, then this data is waiting for them the
>next time they log in.
****
And you guarantee this exactly HOW? Oh yes, with the transacted database (I thought this had been eliminated from the design). And you have designed the recovery code?
You have the state machine diagram of the entire workflow and know what happens at each of the cut-points where failure can occur (which is essentially at any arc of the DFA)? I don't recall any acknowledgement that this was part of your implementation design.
****
>
>One of the Linux/Unix people is recommending MySQL InnoDB
>storage engine because it has very good crash recovery.
****
Crash recovery of the database is NOT the same as having a recovery policy for your workflow; all it guarantees is a certain amount of trust in what is in the database. What you DO with that information is what is critical! When you discover the database accurately reflects a state of handling a request, you have to have an idea of what you are going to do for EVERY such state that is accurately reflected in the database!
****

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
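
The point in the post above, that every state the database can accurately reflect needs its own recovery decision, could be sketched roughly as follows. This is only an illustration in Python: the state names and the recovery actions are hypothetical and are not taken from the design discussed in the thread.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical workflow states; the real design may have more or different ones.
    LOGGED = 1         # request committed to the transaction log
    ACKNOWLEDGED = 2   # client has been told "I heard you"
    PROCESSING = 3     # handed to the OCR process
    RESULT_STORED = 4  # output saved to the user's account
    DELIVERED = 5      # HTTP acknowledgement of final receipt seen


# One explicit recovery decision per state (the actions are assumptions).
RECOVERY_ACTION = {
    JobState.LOGGED: "send acknowledgement, then queue the job",
    JobState.ACKNOWLEDGED: "queue the job again",
    JobState.PROCESSING: "discard partial output and reprocess",
    JobState.RESULT_STORED: "keep the result for the next login",
    JobState.DELIVERED: "nothing to do",
}


def states_without_policy():
    """Return every state that has no recovery action defined."""
    return [state for state in JobState if state not in RECOVERY_ACTION]


def plan_recovery(jobs):
    """Map each job id to its recovery action, given the states read back after a restart."""
    return {job_id: RECOVERY_ACTION[state] for job_id, state in jobs.items()}
```

Running `states_without_policy()` at startup is one cheap way to notice a state that was added to the workflow without a matching recovery decision.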
From: Joseph M. Newcomer on 9 Apr 2010 13:42

See below...
On Thu, 8 Apr 2010 20:40:34 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:68vsr5dlv7053hcg4es22v0obf1tilc7ee(a)4ax.com...
>> See below...
>> On Thu, 8 Apr 2010 14:57:19 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>If my understanding is correct fsync() is supposed to
>>>handle both.
>>> http://linux.die.net/man/2/fsync
>>>It might be the case that I must use the low level open()
>>>command so that there are no application buffers.
>> ****
>> fflush() will flush application buffers if you are using
>> stdio, and fsync(), if it is implemented, will flush the
>> kernel buffers (did you see the section of that SQLite
>> discussion that says that it is not always implemented
>> correctly?)
>
>No, missed that.
>
>>
>> You may not be able to turn off the onboard disk cache
>> buffering. That's part of the problem I was referring to.
>> (And yes, it kills hard drive performance)
>> ****
>
>Solution is new vendor where disk caching can be turned off.
>Vendor says that disk caching can be turned off. Vendor rep
>may be guessing.
****
Note also that the existence of an ATAPI command to invoke an action does not guarantee the existence of an API that will send that ATAPI command. So you need a guarantee that the OS and/or the DBMS can actually activate this feature! We discovered that even though our system wanted to take advantage of certain features of a particular vendor's disk drive, we could not invoke them (for example, the SCSI pass-through was broken in the SCSI device driver!). So EVERY component of the system, from the application through the OS through the low-level disk drivers through the hardware on the disk drive, must support the ability to invoke some state in the hardware. But hey, what does this matter? If the vendor says the device has the feature, that should be good enough to predicate success on, right?
****
>
>>>
>>>Also the experts seem to be saying that the drive's own
>>>onboard cache is not much of an issue if there is UPS.
>>>There are some ways to force some drives to empty their
>>>onboard cache. The only way that is supposed to always
>>>work is to turn write buffering off. This can really hurt
>>>hard-drive performance.
>> ****
>> Power failure is not much of an issue if you have a UPS,
>> so worrying about what happens under power failure is not
>> a really high priority in real life.
>> ****
>
>But required OS reboots are, right? Still need all writes to
>go straight to the platters.
>
****
Note that when the OS reboots, the reboot procedure has, as one of its effects, the completion of pending writes to the disk. When you say "required reboot", of course, you are referring to the kinds of reboots that happen after updates to software, or any other reboot invoked by some kind of "reboot-and-restart" command or API. And these actually notify the file system (at least in Windows) that all pending I/O MUST be flushed NOW. See IRP_MJ_SHUTDOWN (http://msdn.microsoft.com/en-us/library/ff549423.aspx). And for Plug-and-Play, a variety of IRP_MJ_PNP notifications with various minor functions like IRP_MJ_PNP:IRP_MN_REMOVE_DEVICE guarantee that the buffers are flushed. Or did you read the specs for building file systems? Since you clearly pretend to have expertise on the implementation of file systems, it would be nice if you could back it up with a little knowledge.
joe
****
>>>
>>>>>It helps to have a single point I/O controller, but how
>>>>>are you planning to use this thread? How will you talk
>>>>>to it? IOW, now you really need to make sure you have
>>>>>synchronization.
>>>> ****
>>>> If one thread handles the file, then no "synchronization"
>>>> is required because all requests serialize through this
>>>> one thread. It is an approach called the "agent pattern".
>>>
>>>It looks like clarification from the Linux/Unix experts
>>>indicates that this would be required for my transaction
>>>log.
>> ****
>> Of course, you still have to flush application buffers and
>> flush kernel buffers; putting it in a single thread still
>> does not guarantee transactional integrity.
>>
>> You have to decide where your "start transaction" and "end
>> transaction" points are.
>>
>> Oh yes, it really is hard on the disk drive; I killed one
>> disk drive by running a large number of tests on a
>> transacted database; it just stopped seeking. But during
>> the tests, it was seeking ferociously as it made sure the
>> directory blocks were consistent with the file contents.
>
>Good reason for hot swappable RAID, then.
****
I presume you mean RAID-5. And that you will maintain a set of spare hard drives in their carriers for this contingency (I do)
****
>
>>>> Apparently, he thinks that a database can't be a FIFO
>>>> queue because he once read that SQLite doesn't have a
>>>> record number, or something else silly like that. He
>>>> missed the idea that a FIFO queue is a FIFO queue and ANY
>>>> stream-oriented protocol (including TCP/IP to the local
>>>> machine!) could be a valid implementation; instead, he
>>>> fastened on one
>>>
>>>And its buffer would automatically grow to any required
>>>length and automatically shorten as items are removed?
>> ****
>> Yep. That's EXACTLY what happens. And only at undefined
>> and indeterminate points does the file system manage to
>> get these updated blocks out to the hard drive (unless you
>> have a way to force synchronization of the buffers with
>> the magnetic surfaces). So imagine that you have deleted
>> records in page 1 and added records to page 7. When you
>> delete records, the other records are "shuffled down" to
>> fill the space.
>> These pages are committed to disk in opportunistic order,
>> so what is on the platters represents a snapshot of the
>> in-memory buffers at random states, and the pages on the
>> disk may be inconsistent with the pages in memory. So you
>> can end up with duplicate records, missing records at the
>> end, etc.
>> ****
>
>Make sure to flush to disk then.
****
The OS just crashed. Or your unlikely power-failure scenario just happened. So the files are flushed to disk exactly HOW?
****
>
>>>I think that many of these issues may go away by using two
>>>half-duplex named pipes one in each direction. No one has
>>>yet pointed out any issues with Unix/Linux named pipes. I
>>>like named pipes because they implement the FIFO
>>>intuitively with minimal learning curve.
>> ****
>> No, in fact, NONE of them change, at all. Whether you are
>> using two half-duplex pipes (which is all linux supports,
>> even as named pipes) or a full-duplex pipe (as is
>> supported in Windows).
>
>Unix/Linux groups say that any issues with named pipes must
>be on Windows because Windows named pipes are borked.
****
The typical sort of asinine response I expect from a linuxoid. They have no idea what they are talking about, because the ONLY thing Windows named pipes and linux named pipes have in common is the spelling 'n-a-m-e-d p-i-p-e-s'. They are not even the same CONCEPT. But since linux is perfect, and every other operating system is less than perfect, OF COURSE linux got it right.

And do you know the joke about what happens if you try to delete all the files (not directories) in the root directory (where there should actually be no files) and one of them has the valid Unix filename "-r"?

Long out of print, The Unix-Haters Handbook is a hoot, mostly because those of us who used Unix for years got bit by one or more of the failures described. And linux is bug-for-bug compatible with Unix, so the same problems exist.
Note that many of these problems existed in the open-source programs from GNU, BSD, and others, which linux adopted.
****
>
>>
>> If either the server app or the app it spawns fails, the
>> contents of the named pipe will be lost. Just because
>> nobody bothered to point out the obvious does not mean the
>> problem does not exist. Low learning curve does not
>> immediately map to robust transacted data transfer!
>
>I already solved this issue with my very early design. That
>is the purpose of my persistent disk file based FIFO queue.
>As soon as it gets to this file, then even a server crash
>will not prevent the job from getting completed correctly.
>We may lose the way to send it back to the user's screen (it
>will still be in their account when they log in) but we did
>not lose the actual transaction even if the server crashes.
****
As I recall, because SQLite could not have record numbers and do a seek, you abandoned the persistent disk-based FIFO queue in favor of named pipes (which have ZERO robustness under most failure scenarios, but can add infinite amounts of kernel memory so they can keep growing, so they have some advantages; run out of memory, just create a pipe and write to it until you get more memory added by magic).

In fact, the last I saw, you had FOUR of these queues, all badly designed, all working with processes that were using the worst possible approach to handling prioritization, minimizing concurrency, maximizing response time. Not clear that this is forward progress.
****
>
>Alternatively if we lose any part of the process before we
>get an HTTP acknowledgement that they received their
>results, we roll the whole transaction back.
****
Actually, you either lose all of a process, or none of it.
What you are trying to say, I think, is that if you suffer a failure at any point in the workflow state machine, there is some totally magical means that gets you back to the magically constructed recovery software that restarts the workflow at some point.

I love this approach. Sadly, it would not work for me, because my customers actually want results. But I defrauded them by billing for many hours spent solving these problems, when I could have just waved my magic wand and gotten a solution that worked!
joe
>
>> joe
>> ****
>>>
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
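
The "flush application buffers, then flush kernel buffers, and only then acknowledge" sequence discussed in the post above might look roughly like this in Python. It is a minimal sketch, not the design from the thread: the function names, the log path, and the newline-delimited record format are assumptions made for illustration, and, as the post warns, `os.fsync()` cannot by itself guarantee that the drive's onboard cache has reached the platters.

```python
import os


def durable_append(path, record):
    """Append one record and push it through both software buffer layers."""
    with open(path, "ab") as log:
        log.write(record)
        log.flush()             # application buffer -> kernel buffers
        os.fsync(log.fileno())  # kernel buffers -> device (drive cache may still hold it)


def accept_request(log_path, request):
    """Return an acknowledgement only after the request is in the log.

    If the write, flush, or fsync raises OSError, no acknowledgement is returned.
    """
    durable_append(log_path, request + b"\n")
    return "ACK"
```

Opening, syncing, and closing the file for every record is the simplest form of the idea and is deliberately slow; it is exactly the kind of workload the post describes as hard on a disk drive.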
From: Hector Santos on 9 Apr 2010 14:33

Joseph M. Newcomer wrote:

> See below...
> On Thu, 08 Apr 2010 22:21:31 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:
>
>> Joseph M. Newcomer wrote:
>>
>>> The same way ANY CGI-based process notifies its parent that it has completed. I believe
>>> this is by closing stdout.
>>
>> Ultimately, the process ending (which must end) is the deciding factor
>> which the parent is waiting on (with an idle timeout in redirected data).
> ****
> As I indicated, it is whatever the normal criterion is. My recollection of CGI was the
> closing of stdout meant that there could be no future output, and that was the determining
> factor, but I haven't looked at this in a decade.

Most CGIs just write to the standard output device using the std output functions, i.e. printf(), while others might open the std output handle and use it in other output functions, in which case they should close it if they opened it.

Since all handles will be closed when the process ends anyway, you need to check for the child process to end.

However, a good web server processing a script map or any spawned child with a std I/O redirection needs to handle four things overall:

- Graceful process ending.

- Client side socket drop or disconnect.

- Global timeout; no CGI should be active "forever". Our default is 5 mins.

- An output idle timeout, optional. This can be shorter than 5 minutes if used. It should not be enforced because you don't know if a particular CGI is just a little slow to respond.

The child process is terminated if a timeout occurs or the client connection drops, which can easily happen when users switch to new pages and/or click another link while the browser hour glass is active waiting for a response.
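
Two of the four checks listed above (graceful process ending and the global timeout) can be sketched in Python with the standard `subprocess` module. This is an illustration only: `run_cgi` and its parameter names are hypothetical, the 300-second default simply mirrors the 5 minutes mentioned above, and the client-disconnect and output-idle checks are not modeled.

```python
import subprocess
import sys


def run_cgi(argv, global_timeout_s=300.0):
    """Run a child with stdout redirected and wait for it to end.

    Returns (exit_code, output_bytes) when the child ends on its own,
    or None when the global timeout expires (subprocess.run kills the
    child and waits for it in that case).
    """
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE, timeout=global_timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return done.returncode, done.stdout


if __name__ == "__main__":
    # A stand-in "CGI" that prints a response and exits.
    print(run_cgi([sys.executable, "-c", "print('Content-Type: text/plain')"]))
```

The parent decides the request is finished when the child process ends, not when it sees a particular write, which matches the point that all handles are closed at process exit anyway.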
> The problem with the OP is that since he
> doesn't actually look into the details, he hypothesizes how the details MUST work by
> falling back on some form of mysticism, and if he sees a blank wall, makes some guesses
> based on inadequate information and if the guesses don't work, figures there must not be
> such a mechanism. Hence the overconcern with a non-problem, notifying a Web server that
> the process has completed its action. The VERY FIRST instance of a Web server launching a
> script had such a mechanism, but since he doesn't see it, it must be an unsolved problem!

Right. The main thing I was trying to get across is that HTTP is a client-request, server-response system. The thread that handles processing the request is the same thread that waits for a response generated from the processing.

Whether he realizes it or not (I lean toward not), his Many Threads to 1 FIFO OCR thread design (he now says four, 1 for each type of request) requires a thread handle to send back a result to the request thread waiting for a response.

In short, he can't just get a request, start a new thread, queue to 1 FIFO queue and EXIT without a response. With no response, the socket is closed and the browser sets that as a 404. A response is required on the SAME SOCKET connection.

He has not shown that he understands this idea, which has been repeated to him.

> There must be a Platonic Ideal mechanism; he guesses about its existence. From flickering
> shadows on the wall. The rest of us read the documentation or the code.
>
> I get so tired of this...

Yeah. Me too. :)

What gets me is that it is simple: just follow standard practice.

I said this before, there is only one reason for this odd behavior, and I honestly believe this:

He really has no control over this "ocr code" or he lacks understanding of how to change it, i.e. make it thread safe. You can imagine that whatever is done is all using global variables, etc.
So he honestly appears to be stuck on wrapping everything else around a single process handling all the queued requests as they come in.

Just consider the stated boundary conditions:

   100 TPS (Transactions Per Second)
   100 ms PPT (Processing time Per Transaction)

That right there dictates a minimum of 10 handlers for an ideal steady state operation with no pressure points. It doesn't matter if it's 10 threads in 1 process or 10 processes on one or multiple machines, he needs 10 handlers.

Having realized this after it was pointed out, he now seems to say that 100 ms is the worst case and that 10 ms is really the fastest process time.

Did he calculate 10 ms to satisfy the 1 single FIFO OCR model he seems to be stuck with?

One wonders, but there is no way on earth, even with the great Linux, he can handle:

   WEB SERVER REQUEST SERVER

   - incoming request, start thread
   - log the request translation
   - save posted data (images, I presume) to unique folder
   - check user database for authentication and ACL, credit baloney
   - queue it for the OCR processor
   - signal the OCR processor with thread handle (see below)
   - wait for response
   - send http response

all in 10 milliseconds. I'm sorry. :) And remember the OCR processor is a single thread:

   OCR FIFO PROCESSOR

   - Wait for named pipe to be signaled
   - read fixed length pipe data
   - update database state
   - process the images from unique folder
   - generate response
   - send response to requesting thread
   - update database state
   - goto top

There is NO WAY he can do both parts in 10 ms! Even with the benefit of the doubt that he can do it in 11 ms, the queue pressure will build.

You can actually model this with this simple formula:

   TPS = N * 1000/(RPT + PPT)

where

   TPS = Transactions per second
   N   = number of handlers
   RPT = Request handling Time per Transaction (ms)
   PPT = processing time per Transaction (ms)

So for

   TPS = 100
   PPT = 10 ms
   N   = 1

   100 = 1*1000/(RPT+10)

therefore RPT = 0, which is not reality.
The reality is to solve for N, given a TPS goal and estimated total processing time.

   N = TPS*(RPT + PPT)/1000

Assume RPT = 40 ms, we have

   N = 100*(40+10)/1000
   N = 5 handlers!

If your worst case is RPT+PPT = 100 ms, then you need 10 handlers.

Now, he should probably be more realistic about the 100 TPS. I doubt he will get that: 6000 requests per minute, 360,000 per hour! Come on! He has a niche product. Expect lower TPS to start and go from there.

Consider 3600 per hour (you should be so lucky), 60 per min, 1 per second, with just 1 machine:

   1 = 1*(RPT+PPT)/1000

   RPT+PPT = 1000 ms or 1 second!

Now he can juggle the performance (and methods) of both parts.

Even if he feels, with a good engineering guess, that the turnaround time is 500 ms with 1 processor:

   1 = TPS*(500)/1000

Solve for TPS, and he can handle transactions at a rate of:

   2 per second,
   120 per min,
   7200 per hour,
   57,600 per day (8 hrs per day)
   13,824,000 per year (20*12 workdays)

DUDE, be realistic! Don't be greedy! Keep your eye on the prize! Your customer base will start small and that is what you MUST consider. If you do have something, they will come and then you will be able to get the capital and maybe interest from VCs, who treasure businesses with an established customer base and PROOF that there is a market.

Stop wasting your time on issues that have only kept you from finishing this 10 year endeavor of yours, which will continue for another 10 years the way you are approaching this.

--
HLS
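
The handler arithmetic in the post above can be checked with a few lines of Python. This is just a sketch of the posted formula, TPS = N * 1000/(RPT + PPT); the function names are made up for illustration, and rounding the handler count up to a whole number is an added assumption, since the post only works with values that divide evenly.

```python
import math


def handlers_needed(tps, rpt_ms, ppt_ms):
    """N = TPS * (RPT + PPT) / 1000, rounded up to a whole handler."""
    return math.ceil(tps * (rpt_ms + ppt_ms) / 1000)


def max_tps(n_handlers, rpt_ms, ppt_ms):
    """TPS = N * 1000 / (RPT + PPT)."""
    return n_handlers * 1000 / (rpt_ms + ppt_ms)


if __name__ == "__main__":
    print(handlers_needed(100, 40, 10))   # the RPT = 40 ms, PPT = 10 ms case
    print(handlers_needed(100, 0, 100))   # the worst case, RPT + PPT = 100 ms
    per_second = max_tps(1, 0, 500)       # one handler, 500 ms total turnaround
    print(per_second, per_second * 3600 * 8 * 20 * 12)  # per second, per 240 eight-hour days
```

With the numbers from the post this reproduces 5 handlers, 10 handlers, and 2 transactions per second (13,824,000 per year at 8 hours a day, 240 days a year).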
From: Hector Santos on 9 Apr 2010 14:35

Peter Olcott wrote:

>
> I will be looking into all these details, I bought a bunch
> of books. I want to narrow down exactly which details that I
> need to look into.

For Windows or Linux?

--
HLS
From: Peter Olcott on 9 Apr 2010 14:44
"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:%23GAtkNB2KHA.4936(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>>
>> I will be looking into all these details, I bought a
>> bunch of books. I want to narrow down exactly which
>> details that I need to look into.
>
>
> For Windows or Linux?
>
> --
> HLS

Linux/Unix MySQL SQLite HTTP