From: Peter Olcott on 10 Apr 2010 22:09

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:peq1s51e061rsv2h2jc8fj9samuv2r1i3g(a)4ax.com...
> See below...
> On Fri, 9 Apr 2010 20:19:46 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
>> You did not even pay attention to what I just said. The Unix/Linux people said the same thing about thread priorities; that is why I switched to independent processes that have zero dependency upon each other.
> ****
> Putting threads in separate processes does not change the problem! And I fail to see how

On and on and on and on with the same false assumption. It took far too long to discover this false assumption; if you had been more direct, honest, and straightforward in your communication it would not have taken a dozen hours to see that you falsely assume that I have multiple threads in each OCR process. It is beginning to look a lot cheaper to simply read the 10,000 pages of books that I recently bought.

> threads in the same process would, for this kind of processing, EVER have a "dependency upon each other", or, alternatively, how moving the threads to separate processes solves the fundamental thread scheduler issues that arise. The only thing that changes in the thread scheduler is that it now has to load the GDT or LDT with the correct pointers to the page tables of the process that is being switched to.
>
> For some reason, you seem to think that "processes" somehow magically possess properties that "threads" do not. Well, sadly, threads are threads and they are the only schedulable entity. Which processes they live in is largely irrelevant. You have completely failed to understand how schedulers work. Thread priorities introduce the same problems no matter which processes they are running in. I'm surprised you missed something this obvious.
> ****
>>
>>> and should NOT be used as a method to handle load balancing.
>>> I would use a different approach, such as a single queue kept in sorted order, and because the free trial jobs are small (rejecting any larger jobs) there should not be a problem with priority inversion.
>>
>> Four processes, not threads. Since there is zero dependency upon each other there is no chance of priority inversion.
> ****
> Four threads, each running in a private process, do not look substantially different from four threads, all running in one process, at least as far as the scheduler is concerned. Any issue dealing with thread priorities deals SOLELY with the concept of threads; packaging them into various kinds of processes DOES NOT AFFECT THIS!
>
> So you have missed the obvious, because you love buzzword-fixation; if someone refers to thread priorities, you seem to think these work differently than "processes", for reasons that are not only not apparent to anyone else, but actually are completely wrong-headed.
>
> You lose. Please stop pretending you understand what you are talking about while accusing the rest of us of failing to understand your nonsensical statements because we disagree with them. Or ignore them, which is pretty much what they deserve.
> joe
> ****
>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Jerry Coffin on 11 Apr 2010 01:14

In article <0L6dnTk7cJuaulzWnZ2dnUVZ_u6dnZ2d(a)giganews.com>, NoSpam(a)OCR4Screen.com says...

[ ... ]

> A Linux/Unix expert David Schwartz says that it is nearly impossible to avoid all kinds of hidden dependencies that you have no control of when using threads.

Threads under Linux/Unix are rather a different beast than under Windows. In particular, the basic Unix model of process creation interacts poorly with threads (to put it mildly). The problem is that Unix creates a new process as a clone of an existing process. With a single thread, this is pretty easy -- but with multiple threads, it gets ugly. One thread has clearly just called fork() -- but if (for example) another thread has initiated a write to disk that has not yet completed, you can't really just clone the whole state of the existing process into a new one.

For better or worse, the thread creation API in Linux is basically just a variation of fork() -- they have a set of bit-flags that indicate what to keep from the existing process, and what to create new. This means the same problems that can arise with creating a new multi-threaded process under Unix can also arise with creating new threads in the same process. Getting it to work at all took heroic effort, and even with that it doesn't really work well (and don't expect much improvement in that respect anytime soon, either).

Bottom line: since you're posting in a Windows-specific newsgroup, most of us have posted general designs that are oriented toward Windows. If you're designing this for Linux/Unix, then you almost certainly should use only one thread per process. Contrary to (at least the implication of) the statement above, however, the problem isn't really with threads themselves -- it's with how Unix (and especially Linux) has mis-implemented them.
> For my purposes with the redesigned OCR process there is no functional difference between threads and processes because there is no longer any need to share data.

You seem to be having difficulty making up your mind. On one hand, you talk about gigabytes of static data, but on the other hand, you say there's no need to share data. One of these two is almost certainly false or at least misleading -- while sharing that static data may not be strictly *needed*, it would clearly be extremely desirable.

--
Later,
Jerry.
From: Jerry Coffin on 11 Apr 2010 01:14

In article <BPWdnX0fkKXvsFzWnZ2dnUVZ_u2dnZ2d(a)giganews.com>, NoSpam(a)OCR4Screen.com says...

[ ... ]

> I do explicitly remember that there was and probably still is a mask interrupts flag that is directly on the processor chip itself.

Yes, and the OS uses this now and again -- but it's *completely* off-limits to user-mode processes. This situation is a bit like Joe having told you that cars sold in the US are required to have an interlock to prevent you from shifting them into reverse if the car is traveling forward at more than 5 MPH (and have been required to for at least 10 years) -- and you reply by pointing out that you're sure the Ford Model T had no such thing.

[ ... ]

> My experience with this is with MS-DOS.

...and therefore so out of date that it's utterly meaningless with respect to Windows, Linux, or just about any other desktop or server OS sold within at least a decade.

[ ... ]

> Right thus preventing most every or every mutual dependency, and thus the possibility of priority inversion because priority inversion (according to several sources) can only occur with some sort of mutual dependency.

What you've said is that you're planning to use a single-core, hyperthreaded processor (where did you get such a thing? Is it some old Pentium IV, or what?). That means *every* process and/or thread running on the machine shares the single most important resource: the processor.

[ ... ]

> Oh so then you just lied about the address space being separate? The best that I can tell is that the separate address space by itself almost completely eliminates the possibility of any mutual dependency, and a mutual dependency is absolutely required for priority inversion.

Quite the contrary -- dependencies and address spaces are almost entirely orthogonal. Completely separate machines that don't share memory at all can (and often do) still have dependencies.
> If you don't start explaining yourself more completely I will reasonably assume that your whole purpose here is to be a naysayer whose sole purpose is to provide discouragement and you have no intention at all of being helpful.

I've already outlined one possible way to implement the type of system you're talking about. I'll say right now that most of what I've suggested *is* based on your running it under Windows or something reasonably similar. Under Linux, you almost certainly want to avoid thread pools, or anything else that involves a process having more than one thread. Linux just isn't well suited to that kind of design.

That doesn't cause many really fundamental changes though -- you still basically want three processing stages: input, processing, and output, with queues separating the three. You want to keep the amount of raw data that goes into a single shared log file to a fairly minimal level -- maximizing dependability generally hurts performance (often substantially). Therefore, you want to store in that log file only the data that you really *need* to recover in case of a crash -- as an initial concept, I'd think in terms of it storing primarily the stage of processing for each task currently in the system (and little else).

--
Later,
Jerry.
From: Peter Olcott on 11 Apr 2010 08:13

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message news:MPG.262b000ee9eea05098985e(a)news.sunsite.dk...
> In article <0L6dnTk7cJuaulzWnZ2dnUVZ_u6dnZ2d(a)giganews.com>, NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> A Linux/Unix expert David Schwartz says that it is nearly impossible to avoid all kinds of hidden dependencies that you have no control of when using threads.
>
> Threads under Linux/Unix are rather a different beast than under Windows. In particular, the basic Unix model of process creation interacts poorly with threads (to put it mildly). The problem is that Unix creates a new process as a clone of an existing process. With a single thread, this is pretty easy -- but with multiple threads, it gets ugly. One thread has clearly just called fork() -- but if (for example) another thread has initiated a write to disk that has not yet completed, you can't really just clone the whole state of the existing process into a new one.
>
> For better or worse, the thread creation API in Linux is basically just a variation of fork() -- they have a set of bit-flags that indicate what to keep from the existing process, and what to create new. This means the same problems that can arise with creating a new multi-threaded process under Unix can also arise with creating new threads in the same process. Getting it to work at all took heroic effort, and even with that it doesn't really work well (and don't expect much improvement in that respect anytime soon, either).
>
> Bottom line: since you're posting in a Windows-specific newsgroup, most of us have posted general designs that are oriented toward Windows. If you're designing this for Linux/Unix, then you almost certainly should use only one thread per process.
> Contrary to (at least the implication of) the statement above, however, the problem isn't really with threads themselves -- it's with how Unix (and especially Linux) has mis-implemented them.
>
>> For my purposes with the redesigned OCR process there is no functional difference between threads and processes because there is no longer any need to share data.
>
> You seem to be having difficulty making up your mind. On one hand, you talk about gigabytes of static data, but on the other hand, you say there's no need to share data. One of these two is almost certainly false or at least misleading -- while sharing that static data may not be strictly *needed*, it would clearly be extremely desirable.
>
> --
> Later,
> Jerry.

The redesign of my fundamental algorithm has resulted in such a huge reduction in memory requirements that I can now easily afford to load data on the fly. When I started these threads I was assuming the old algorithm.
From: Peter Olcott on 11 Apr 2010 08:30
"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message news:MPG.262b087ffe60aad9989860(a)news.sunsite.dk... > In article > <BPWdnX0fkKXvsFzWnZ2dnUVZ_u2dnZ2d(a)giganews.com>, > NoSpam(a)OCR4Screen.com says... > What you've said is that you're planning to use a single > core, > hyperthreaded processor (where did you get such a thing? > Is it some > old Pentium IV, or what?). That means *every* process > and/or thread > running on the machine shares the single most important > resource: the > processor. > > [ ... ] > >> Oh so then you just lied about the address space being >> separate? The best that I can tell is that the separate >> address space by itself almost completely eliminates the >> possibility of any mutual dependency, and a mutual >> dependency is absolutely required for priority inversion. > > Quite the contrary -- dependencies and address spaces are > almost > entirely orthogonal. Completely separate machines that > don't share > memory at all can (and often do) still have dependencies. > >> If you don't start explaining yourself more completely I >> will reasonably assume that your whole purpose here is to >> be >> a naysayer whose sole purpose is to provide >> discouragement >> and you have no intention at all of being helpful. > > I've already outlined one possible way to implement the > type of > system you're talking about. I'll say right now, that most > of what > I've suggested *is* based on your running it under Windows > or > something reasonably similar. Under Linux, you almost > certainly want > to avoid thread pools, or anything else that involves a > process > having more than one thread. Linux just isn't suited well > to that > kind of design. > > That doesn't cause many really fundamental changes > though -- you > still basically want three processing stages: input, > processing, and > output, with queues separating the three. 
> You want to keep the amount of raw data that goes into a single shared log file to a fairly minimal level -- maximizing dependability generally hurts performance (often substantially). Therefore, you want to store in that log file only the data that you really *need* to recover in case of a crash -- as an initial concept, I'd think in terms of it storing primarily the stage of processing for each task currently in the system (and little else).
>
> --
> Later,
> Jerry.

Since I only need to process 100 transactions per second, the speed of this overhead should not be too critical. I am guessing that the sum total of this overhead will only take about 10 ms per transaction, most of this being drive head seek time.

The two most unresolved issues with my design:

(1) The best means to provide one higher-level process with much higher and possibly absolute priority over three other types of jobs. The two current proposals are:
(a) Assign a much higher process priority to the high-priority jobs. The jobs are executed from four different FIFO queues.
(b) Have each of the three lower-priority jobs explicitly put itself to sleep as soon as a high-priority job becomes available. The lower-priority jobs could be notified by a signal.

(2) The best way(s) to provide interprocess communication between a web server that inherently has one thread per HTTP connection and up to four OCR processes (with a single thread each), one for each level of processing priority.