From: Hector Santos on 12 Apr 2010 00:18

Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:O2mJq3d2KHA.2284(a)TK2MSFTNGP06.phx.gbl...
>> Peter Olcott wrote:
>>
>> Joe, at least so far we got him to:
>>
>> - Admit to lack of understanding of memory, and he himself reduced
>>   the loading requirement rather than code for any large-memory
>>   efficiency methods.
>
> No. Joe was and continues to be wrong that a machine with plenty of
> extra RAM ever needs to page out either a process or its data.

No, Joe never said or implied that at all. No one did. The only thing you "said" you found is that if you load a simple test program (one that doesn't do any real testing at all), you see ZERO FAULTS after the initial faults have settled down. The fact that you got initial faults should TELL you that you are still a candidate for faulting, especially when the system is LOADED. You have not loaded your system.

>> - Admit that his 100 TPS was unrealistic for a 10 ms throughput that
>>   lacked consideration for the interfacing processing time outside
>>   the vapor-ware OCR processor. So he added another 10 ms and
>>   reduced the TPS to 50.
>
> No, the latest analysis indicates that I am back up to 100 because
> the web server and the OCR execute in parallel.

No, it shows EXACTLY what the simple equation

    TPS = N * 1000 / WORK LOAD (ms)

and the charts I provided to you are SAYING: if you want 100 TPS with a 20 ms work load, you need N = 2 handlers! If you want to do this with one handler, then you can only get 50 TPS.

But again, this is an idealized, equalized loading system - a single queue with two handlers, one request coming in at a time. That is not reality unless you synchronize the incoming queuing and perform load balancing. But knowing how you think, you will say that each one is its own WEB SERVER. So what? How do you control the requests that are coming in? You said you want 100 TPS, but that load can come in in 500 msecs! Now your simple equation is:

    100 requests / 500 ms = N / 20 ms work load

Solve for N, and N = 4 handlers - threads, separate processes, who cares how they are concurrently running for that 500 ms time span, hyperthreaded or each on their own CPU or machine - you need 4 handlers - period!

>> He basically does not see the queue accumulation!
>
> The only way this site is going to ever get too long of a queue is if
> too many free jobs are submitted. Do you really think that this site
> is ever going to be making $10.00 per second? If not then I really
> don't have to worry about queue length. In any case I will keep track
> of the average and peak loads.

Fine, if you are going to do thread delegation and load balancing, fine. All I am pointing out in this lesson is that your modeling is flawed for the work loading you expect to get, and it will not work using this Many Threads to 1 FIFO queuing framework.

Even without my software experience (I'm a chemical engineer), this is UNIT OPS 101, college-freshman understanding. Even for accountants! IN = OUT must be conserved to obtain any level of steady-state operation; otherwise you begin to get chaos, pressures, overflows, EXPLOSIONS!

Here is a quick plan:

1) Get any web server with CGI or PHP script-mapping support.

2) Design logic for shared-map READ-ONLY meta data so you don't have the 30-60 second load time.

3) Use simple PHP to process your OCR. The OCR can still be your SINGLE-PROCESS, non-threaded compiled code.
PHP will give you all the scripting power and support for any SQL engine, direct file access, or any logging you need, including delegation to one of the OCR processors.

GET YOUR MEASUREMENTS BASED ON THIS and begin the next step, if necessary. You might be surprised: you might be able to handle a good enough TPS just to get you started for your presentations. Best of all, you can do the above under LINUX!

If it pays off to continue and you need better scalability, then you can begin to explore many of the ideas discussed here to improve it. Overall, you need to redesign and measure how the OCR processor can work as a multi-threaded processor. You won't get very far until you do this.

-- HLS
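Hector's handler arithmetic can be sanity-checked in a few lines. This is a minimal C++ sketch of the same steady-state model; the `handlersNeeded` helper is hypothetical, not code from the thread:

```cpp
// Sanity check of the handler-count arithmetic above. Assumed model:
// N identical handlers, a fixed per-request work load in ms, and a set
// of requests that must all be absorbed within a given time window.
#include <cmath>
#include <cstdio>

// Handlers needed so that `requests` arriving over `window_ms` can be
// served when each request costs `work_ms` of handler time.
static int handlersNeeded(int requests, double window_ms, double work_ms) {
    return static_cast<int>(std::ceil(requests * work_ms / window_ms));
}

int main() {
    // Steady 100 TPS at a 20 ms work load: TPS = N * 1000 / work_ms => N = 2.
    std::printf("steady 100 TPS: N = %d\n", handlersNeeded(100, 1000.0, 20.0));
    // The same 100 requests bursting in over 500 ms: N = 4, as in the post.
    std::printf("500 ms burst:   N = %d\n", handlersNeeded(100, 500.0, 20.0));
    return 0;
}
```

The second call reproduces Hector's burst case: the window shrinks from 1000 ms to 500 ms, so the handler count doubles even though the nominal TPS target is unchanged.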
From: Peter Olcott on 12 Apr 2010 00:29

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:O%23HmXKf2KHA.5212(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>> http://en.wikipedia.org/wiki/Priority_inversion
>> If there are no shared resources then there is no priority inversion.
>> Try and provide a valid counter-example of priority inversion without
>> shared resources.
>
> You don't have to have a deadlock to reveal problems. You can get race
> conditions with classic SYNC 101 mistakes like this one, which depends
> on time synchronization:
>
>     if (NumberOfHighPriorityJobsPending != 0)
>         nanosleep(20);   // 20 milliseconds
>
> Since you like wikipedia, read:
>
>     http://en.wikipedia.org/wiki/Race_condition
>
> What's the point of the above? Are you expecting that the value will
> turn 0 during the nanosleep(20)? That call is wrong anyway - is that
> 20 seconds or 20 nanoseconds? Did you really mean
>
>     if (NumberOfHighPriorityJobsPending != 0)
>         usleep(20);
>
> In either case, you are in for a RUDE awakening with that. You
> probably meant:
>
>     while (NumberOfHighPriorityJobsPending != 0)
>         usleep(20);
>
> which COULD be fine, but you should use an optimized kernel object
> here to wait on:
>
>     if (WaitForSingleObject(hPriorityEvent, INFINITE) == WAIT_OBJECT_0) {
>         // do whatever
>     } else {
>         // Not what I expected
>     }
>
> When you wait on a kernel object, you won't be spinning your thread
> like you do above. Event driven is better.

I would prefer that the high-priority jobs have absolute priority over the lower-priority jobs. Even better would be if this could be done efficiently. I think that process priority would work well enough. That would depend on how the kernel scheduler works: the frequency and duration of the time slices.

>> You are not explaining with much of any reasoning why you think that
>> one alternative is better than another, and when I finally do get you
>> to explain, it is only that your alternative is better than your
>> misconception of my design, not the design itself.
>
> No, your problem is that you are stuck with a framework

One design constraint that won't be changed until system load requires it is that we must assume a single-core processor with hyperthreading.

>     Many Threads to 1 FIFO/OCR process
>
> and everyone is telling you it's flawed and why. I've tried different

When they finally get to the why part I point out their false assumption. A priority queue may be a great idea with multiple cores; I will not have those.

> ways using your WORK LOAD, which you accepted, and began to change
> your TPS.
>
> But you are still going to overflow your Many to 1 design, especially
> if you expect to use TIME to synchronize everything.

This is not a given, but using time to synchronize is not the best idea; it could possibly waste a lot of CPU. So then four processes, with one getting an 80% relative share and the other three getting about 7% each.

>> Exactly what are these ways, and precisely what have I failed to
>> account for?
>
> You've been told in a dozen ways why it will fail! You are OFF in
> your timing of everything for the most part. You think you can
> achieve what you want with a Many Threads to 1 OCR process design at
> the TPS rates and work load you think you can get.
>
> You can't!

Four processes, four queues, each process reading only from its own queue, one process having much more process priority than the rest. Depending upon the frequency and size of the time slices this could work well on the required single-core processor.
On a quad-core it would have to be adapted, possibly using a single priority queue, so that the high-priority jobs could be running four instances at once.

>> I know full well that the biggest overhead of the process is going to
>> be disk access. I also know full well that tripling the number of
>> disk accesses would likely triple overhead. I am not sure that SQLite
>> is not smart enough to do a record-number-based seek without
>> requiring an index. Even if SQLite is not smart enough to do a record
>> seek without an index, it might still be fast enough.
>
> This is what I am saying: WE TOLD YOU WHAT THE LIMITS OF SQLITE are
> and you are not listening. You can do a ROW lookup, but you can't do
> a low-level FILE RECORD POSITION AND BYTE OFFSET like you think you
> need, but really don't.

As long as the ROW lookup maps to the file byte offset we are good. If the ROW lookup must read and maintain an index just to be able to get to the rows in sequential order, this may not be acceptable.

> I also told you that while you UPDATE an SQLITE database, all your
> READS are locked!
>
> You refuse to comprehend that.

I knew this before you said it the first time. The practical implication is that SQLite can't handle nearly as many simultaneous updates as other row-locking systems. Their docs said 500 transactions per second.

> Again, you can SELECT a row in your table using the proper query, but
> it isn't a direct FILE ACCESS with BYTE OFFSET idea, and again,
> SQLITE3 will lock your database during updates, so your REQUEST
> SERVER will be locked in reading/writing any table while it is being
> updated by ANYONE.

If it doesn't require a separate index to do this, then the record number maps to a byte offset. Since record numbers can be sequenced out-of-order, in at least this instance it must have something telling it where to go, probably an index. Hopefully it does not always make an index just in case someone decides to insert records out-of-sequence.

>> You (and Hector) are definitely right on some things.
>
> We are right on EVERYTHING discussed here. There has been nothing you
> stated or posted that indicates any error in all the suggestions made
> to you.

You and Joe are most often wrong by making false assumptions about the details of my design and its requirements.

>     IDEAL: Many Threads to Many Threads
>     WORST: Many Threads to 1 thread

I guess that I am currently back to alternative two, which is many threads or a web server to four OCR processes via four FIFOs on a single-core machine, one process having much more process priority than the others. A multi-core processor would probably involve the same thing except with a single priority queue in-between.
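On the SQLite point above: the ROW lookup the two sides are arguing about is a b-tree search keyed on the table's rowid, not a raw byte-offset file seek. A minimal sketch of what that lookup looks like with SQLite's C API, using a hypothetical `jobs` table and `data` column:

```cpp
// Sketch of a rowid lookup with the SQLite3 C API. The "jobs" table and
// "data" column are hypothetical. The lookup is a b-tree key search on
// rowid (O(log n)), not a direct byte-offset seek into the file.
#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open("ocr.db", &db) != SQLITE_OK) return 1;

    sqlite3_stmt *stmt = nullptr;
    const char *sql = "SELECT data FROM jobs WHERE rowid = ?";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK) {
        sqlite3_bind_int64(stmt, 1, 42);  // fetch record number 42
        if (sqlite3_step(stmt) == SQLITE_ROW)
            std::printf("%s\n",
                reinterpret_cast<const char *>(sqlite3_column_text(stmt, 0)));
        sqlite3_finalize(stmt);
    }
    sqlite3_close(db);
    return 0;
}
```

This also illustrates why no separate index is needed for sequential access: the rowid is the key the table's b-tree is already organized by. It does not, however, change the locking behavior Hector describes; a writer still blocks readers for the duration of the update.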
From: Jerry Coffin on 12 Apr 2010 01:18

In article <LuidnT3tuaC7p1_WnZ2dnUVZ_gCdnZ2d(a)giganews.com>, NoSpam(a)OCR4Screen.com says...

[ ... ]

> Alternative (a): There are four processes with four queues, one for
> each process. These processes only care about executing the jobs from
> their own queue. They don't care about the jobs in any other queue.
> The high-priority process is given a relative process priority that
> equates to 80% of the CPU time of these four processes. The remaining
> three processes get about 7% each. This might degrade the performance
> of the high-priority jobs more than the next alternative.

There is no such thing with any OS of which I'm aware. At least with a typical OS, the highest-priority task is the *only* one that will run at any given time. Windows (for one example) does attempt to prevent starvation of lower-priority threads by waking one lower-priority thread every four seconds. Though the specific details differ, Linux works reasonably similarly. Neither, however, provides any set of priorities that will give anything similar to what you've described. It just doesn't exist.

> Alternative (b): Each of the low-priority jobs checks to see if a
> high-priority job is in the queue, or is notified by a signal that a
> high-priority job is waiting. If a high-priority job is waiting then
> each of these low-priority jobs immediately sleeps for a fixed
> duration. As soon as they wake up these jobs check to see if they
> should go back to sleep or wake up.

This requires that each of those tasks is aware of its own process scheduling AND of the scheduling of other processes of higher priority. Worse, without a lot of care, it's subject to race conditions -- e.g. if a high-priority task shows up, for this scheme to work, it has to stay in the queue long enough for every other task to check the queue and realize that it needs to sleep, *before* you start the high-priority task -- otherwise, the task that's supposed to have lower priority will never see that it's in the queue, and will continue to run.

Bottom line: you're ignoring virtually everything the world has learned about process scheduling over the last 50 years or so. You're trying to start over from the beginning on a task that happens to be quite difficult.

> These processes could even simply poll a shared memory location that
> contains the number of high-priority jobs currently in the queue.
> From what the hardware guys have told me, memory writes and reads
> cannot possibly garble each other.

This has the same problem outlined above. It adds the requirement for a shared memory location, and adds polling code to the OCR tasks. See above about ignoring what the world has learned about process scheduling over the last 5 decades or so.

[ ... ]

> Neither of these designs has any of the behavior that you mentioned.

No -- they're substantially worse. At least all that did was occasionally start a lower-priority task out of order.

[ ... ]

> I already figured out a way around that. Everyone must have their own
> user account that must be created by a live human. All users are
> always authenticated against this user account. I don't see any
> loopholes in this one single form of protection.

Not even close, and you clearly don't understand the problem at all yet. The problem is that to authenticate the user you've *already* created a thread for his connection. The fact that you eventually decide not to do the OCR for him doesn't change the fact that you've already spawned a thread.
If he makes a zillion attempts at connecting, even if you eventually reject them all, he's still gotten you to create a zillion threads to carry out the attempted authentication for each, and then reject it.

Of course, that also ignores the fact that doing authentication well is non-trivial in itself.

-- Later, Jerry.
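The alternative to the polling scheme Jerry is criticizing is the kernel-object wait Hector sketched earlier in the thread. Filled out into a compilable Win32 sketch, it might look like the following; `hPriorityEvent` and whatever code signals it with `SetEvent()` are assumed to exist elsewhere:

```cpp
// Sketch of an event-driven wait in place of a usleep()/shared-memory
// polling loop. hPriorityEvent is assumed to have been created elsewhere
// with CreateEvent() and to be signaled via SetEvent() by whoever
// enqueues a high-priority job.
#include <windows.h>
#include <cstdio>

void serviceHighPriorityJobs(HANDLE hPriorityEvent) {
    for (;;) {
        // Blocks without consuming CPU until the event is signaled.
        DWORD rc = WaitForSingleObject(hPriorityEvent, INFINITE);
        if (rc == WAIT_OBJECT_0) {
            // A high-priority job is pending; dequeue and run it here.
            std::printf("high-priority job pending\n");
        } else {
            // WAIT_FAILED: not what we expected; bail out.
            std::printf("wait failed: error %lu\n", GetLastError());
            break;
        }
    }
}
```

Unlike the shared-counter poll, this also removes the race Jerry describes: a signaled event stays signaled until a waiter consumes it, so the notification cannot be missed between checks.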
From: Jerry Coffin on 12 Apr 2010 01:57

In article <abidnZAfALb5HF_WnZ2dnUVZ_jydnZ2d(a)giganews.com>, NoSpam(a)OCR4Screen.com says...

[ ... ]

> No. Joe was and continues to be wrong that a machine with plenty of
> extra RAM ever needs to page out either a process or its data.

It's not a question of whether it *needs* to -- it's a simple fact that with both Windows and Linux, it will *try* to whether it needs to or not. In Windows it's called the working set trimmer -- a task that does nothing but attempt to remove pages from a process's working set. Offhand, I don't remember the name of the equivalent under Linux, but it has one (it runs as a daemon, if memory serves).

If you want to avoid that, you probably want to try to find a provider running one of the *BSD systems for the server instead -- I haven't re-checked recently, but at least as of a few years ago, most of the *BSDs did not have an equivalent of a working set trimmer.

[ ... ]

> No, the latest analysis indicates that I am back up to 100 because
> the webserver and the OCR execute in parallel.

On a single-core machine? There are a few pieces that can execute in parallel (the OCR can use the CPU while the network adapter is reading or writing data), but with only one core, very little really happens in parallel -- the whole point of multiple cores (or multiple processors) is to allow them to *really* do things in parallel, instead of just switching between processes quickly enough for it to *look* like they're running in parallel.

[ ... ]

> The only way this site is going to ever get too long of a queue is if
> too many free jobs are submitted. Do you really think that this site
> is ever going to be making $10.00 per second? If not then I really
> don't have to worry about queue length. In any case I will keep track
> of the average and peak loads.

Keeping track of average vs. peak load is easy -- dealing with it (given a task as processor-intensive as you've suggested) is not.

Seriously, you'd be a lot better off with a "cloud" computing provider than one that gives you only a single core. Assuming your OCR really works, there's a pretty fair chance that the work pattern will be substantially different from what you seem to imagine -- instead of a page or two at a time, you're (reasonably) likely to receive scans of an entire book at a time.

This is a scenario where something like Amazon's EC2 would work well -- you pay only for processor time you actually use, but if you get a big job, it can run your task on dozens or even thousands of processors so (for example) all the pages from a book are OCRed in parallel, the results are put back together, and your customer gets his result back quickly. Then your system goes idle again, and you quit paying for *any* processor time until another task comes along.

-- Later, Jerry.
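As an aside on the working-set-trimmer point: on Linux, the standard way to push back against page-out is to pin a process's pages with `mlockall()`. This is a minimal sketch of that technique, not something proposed in the thread (Jerry's suggestion was to use a *BSD host instead), and it assumes the process has `CAP_IPC_LOCK` or a sufficiently large `RLIMIT_MEMLOCK`:

```cpp
// Sketch: pinning pages with mlockall() so the Linux page-out daemon
// leaves them resident. Assumes the process has CAP_IPC_LOCK or enough
// RLIMIT_MEMLOCK headroom; locked pages count against that limit.
#include <sys/mman.h>
#include <cstdio>

int main() {
    // Lock all current and future pages of this process into RAM.
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        std::perror("mlockall");
        return 1;
    }
    // ... load the large read-only OCR data here; its pages stay
    // resident while the lock is held.
    std::puts("pages locked");
    return 0;
}
```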
From: Hector Santos on 12 Apr 2010 09:39
On Apr 12, 1:57 am, Jerry Coffin wrote to Peter:

> Seriously, you'd be a lot better off with a "cloud" computing
> provider than one that gives you only a single core. Assuming your
> OCR really works, there's a pretty fair chance that the work pattern
> will be substantially different than you seem to imagine -- instead
> of a page or two at a time, you're (reasonably) likely to receive
> scans of an entire book at a time.
>
> This is a scenario where something like Amazon's EC2 would work well
> -- you pay only for processor time you actually use, but if you get a
> big job, it can run your task on dozens or even thousands of
> processors so (for example) all the pages from a book are OCRed in
> parallel, the results put back together, and your customer gets his
> result back quickly. Then your system goes idle again, and you quit
> paying for *any* processor time until another task comes along.

Interesting proposal. But this won't eliminate the coding requirements. He only leverages the computing power and scaling, which is good (one less big issue to deal with), but he still needs to frame his code around the same principles. In fact, his current framework of Many Threads to 1 FIFO queue might cost him more, based on my readings. He will need to make the loading dynamic for EC2 to minimize his cost. Right?

-- HLS