Can extra processing threads help in this case? [MFC]


From: Peter Olcott on 8 Apr 2010 22:51

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:ealujq41KHA.260(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
> For what you want to use it for, my engineering sense
> based on experience tells me you will have problems,
> especially YOU for this flawed design of yours. Now you
> have 4 Named Pipes that you have to manage. Is that under
> 4 threads? But you are not designing for threads. One
> message yes, another no. Is the 1 OCR process going to
> handle all four pipes? Or 4 OCR processes? Does each
> OCR have their own Web Server? Did you work out how the
> listening servers will bind the IPs? Are you using
> virtual domains? sub-domains? Multi-home IP machine?

(1) One web server that inherently has by its own design one
thread per HTTP request
(2) Four named pipes corresponding to four OCR processes,
one of these has much higher process priority than the rest.
(3) The web server threads place items in each of the FIFO
queues.
(4) The OCR processes work on one job at a time from each of
the four queues.
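
A minimal Python sketch of points (1) to (4) above, under several assumptions: one FIFO stands in for one of the four queues, a thread stands in for the OCR process, and the names `run_fifo_demo`, `ocr_queue`, and the `job-N` record format are hypothetical. It relies on the POSIX rule that a write of at most PIPE_BUF bytes to a pipe is atomic, so short records written by different request threads do not interleave.

```python
import os
import tempfile
import threading


def run_fifo_demo(num_requests=8):
    """Feed jobs from several 'request' threads to one consumer through a FIFO."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "ocr_queue")  # hypothetical queue name
    os.mkfifo(fifo_path)
    received = []

    def ocr_worker():
        # Opening for reading blocks until a writer has opened the FIFO.
        with open(fifo_path, "r") as fifo:
            for line in fifo:  # one job per line, handled one at a time
                received.append(line.strip())

    def web_request(fd, job_id):
        # One short os.write() per record; records this small stay intact.
        os.write(fd, f"job-{job_id}\n".encode())

    worker = threading.Thread(target=ocr_worker)
    worker.start()
    write_fd = os.open(fifo_path, os.O_WRONLY)  # blocks until the reader opens

    requests = [
        threading.Thread(target=web_request, args=(write_fd, i))
        for i in range(num_requests)
    ]
    for t in requests:
        t.start()
    for t in requests:
        t.join()

    os.close(write_fd)  # the reader sees EOF once the last writer closes
    worker.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received
```

Calling `run_fifo_demo(8)` should return the eight job records in whatever order the threads happened to write them. Process priorities and the other three queues are left out of this sketch.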

>
> You really don't know what you are doing, right?
>
>>> But even then, I can understand why the success. Unix is
>>> not traditionally known to work with threads, and the
>>> piping has permanent storage - your DISK - making it
>>> easy to allow for easy recovery. Simple.
>>
>> The data is not supposed to ever actually hit the disk. I
>> started a whole thread on just that one point.
>
>
> But linux pipes are part of the disk, or did you miss
> that part, forget it, or wish not to believe it?

Just the pipe name itself is part of the disk, nothing else
hits the disk. There are many messages about this on the
Unix/Linux groups, I started a whole thread on this:

Do named pipes have disk I/O ??
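
A small Python sketch of what "just the pipe name" looks like from user space, assuming a POSIX system. The function name `inspect_fifo` and the path `demo_pipe` are hypothetical. It only shows that the path is a FIFO node rather than a regular file and that a payload passes through it; it does not by itself prove anything about disk I/O.

```python
import os
import stat
import tempfile


def inspect_fifo(payload=b"hello"):
    """Create a FIFO, pass a payload through it, and report what the path is."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo_pipe")  # hypothetical name
    os.mkfifo(path)
    try:
        # The directory entry exists in the filesystem and is of type FIFO.
        is_fifo = stat.S_ISFIFO(os.stat(path).st_mode)
        # A non-blocking read-only open succeeds even with no writer yet.
        read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        write_fd = os.open(path, os.O_WRONLY)
        try:
            os.write(write_fd, payload)
            data = os.read(read_fd, len(payload))
        finally:
            os.close(write_fd)
            os.close(read_fd)
    finally:
        os.remove(path)
        os.rmdir(workdir)
    return is_fifo, data
```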

>
> --
> HLS

From: Hector Santos on 8 Apr 2010 23:01

Well, good luck Peter with your Linux project. Hope to see it done one
day, even if it's 2010. We will all look back and have a good laugh
about how many drugs you were taking. :)

Ciao


--
HLS

From: Peter Olcott on 9 Apr 2010 11:18

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:2p1tr59b52hnu9ptrs3tlt8ruldfbaofq9(a)4ax.com...
> See below...
> On Wed, 7 Apr 2010 22:02:13 -0600, Jerry Coffin
> <jerryvcoffin(a)yahoo.com> wrote:
>
>>In article
>><L9qdndbIjeoeOCHWnZ2dnUVZ_tKdnZ2d(a)giganews.com>,
>>NoSpam(a)OCR4Screen.com says...
>>
>>[ ... ]
>>
>>> The web server is designed with one thread per HTTP
>>> request.
>>
>>This falls somewhere in the "terrible" to "oh no!" range. You
>>normally want a fairly small pool of threads with a queue of tasks
>>for the threads to do. The number of threads in the pool is normally
>>based on the number of (logical) processors available -- on the order
>>of 2 to 4 times as many as processors is fairly typical. Given that
>>you're dealing with an extremely I/O heavy application, a few more
>>than that might make sense, but not a whole lot.
> ***
> The traditional HTTP server/CGI interface lets the operating system manage the "pool of
> threads" by creating new threads when one is needed (using the old Unix
> one-thread-per-process model, this means "launch a new instance of the program" and lets
> the scheduler deal with the resulting load). The FASTCGI technique keeps the processes
> around so the launch cost does not exist. Apache used a process-pool model where a pool
> of recently-used programs is kept running "just in case" there is a need for them.
>
> IIS used a thread pool and ISAPI, which meant a DLL was loaded by a thread in the thread
> pool; the downside of this was if the ISAPI extension corrupted the heap or took some kind
> of failure such as an access fault, the whole IIS went down (let's hear applause for the
> winner of the Dumbest Web Server Design Ever Created award). This has been supplanted by
> using CLR components, because the protected object model makes it impossible to corrupt
> the heap, and if a component fails, it throws an exception that aborts the execution but
> can be caught and handled gracefully by the invoker. But most Web servers do not
> implement queues of tasks in the way you suggest.
>
> However, your implementation is easily realizable by using an I/O Completion Port as a
> thread queue and setting the maximum concurrency to be the number of CPU cores. You might
> have more threads, but a thread that gets blocked on I/O is removed from the thread
> concurrency count.
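
The fixed-size pool with a task queue described in the quoted text can be sketched in Python. This is not an I/O Completion Port; it swaps in `concurrent.futures.ThreadPoolExecutor` to show the same idea of sizing the pool from the processor count instead of the request count. The names `handle_request` and `serve`, and the multiplier of 2, are assumptions for illustration.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Placeholder for per-request work; returns the worker thread's name."""
    return threading.current_thread().name


def serve(num_requests=1000, threads_per_cpu=2):
    """Run many requests on a pool whose size depends only on the CPU count."""
    pool_size = (os.cpu_count() or 1) * threads_per_cpu
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Requests beyond pool_size wait in the executor's internal queue.
        names = list(pool.map(handle_request, range(num_requests)))
    return pool_size, len(names), len(set(names))
```

With `serve(1000)`, all 1,000 requests are handled, but the number of distinct worker threads never exceeds the pool size.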

I am going to use four processes for the four different
levels of process priority. The High priority jobs will be
assigned something like 80% of the CPU (relative to the low
priority jobs) and the low priority jobs will each get
something like 7% of the CPU. Each of these processes will
pull jobs from each of four FIFO queues. Some sort of port
access might be better than a named pipe because each port
access could be explicitly acknowledged when it occurs.
Shoving something in a named pipe does not provide this
benefit.
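
A hedged sketch of the explicit acknowledgement mentioned above, using a connected socket pair in Python. A thread stands in for the OCR process, and the `ACK` reply format and the names `worker` and `submit_job` are hypothetical, not part of any existing protocol.

```python
import socket
import threading


def worker(conn):
    """Stand-in for an OCR process: receive one job and acknowledge it."""
    with conn:
        job = conn.recv(1024)
        conn.sendall(b"ACK " + job)


def submit_job(payload=b"job-42"):
    """Send a job and block until the receiver confirms it has the job."""
    web_side, ocr_side = socket.socketpair()
    t = threading.Thread(target=worker, args=(ocr_side,))
    t.start()
    with web_side:
        web_side.sendall(payload)
        reply = web_side.recv(1024)  # the sender knows the job arrived
    t.join()
    return reply
```

The same request/acknowledge exchange would work over a TCP port; the socket pair just keeps the sketch self-contained.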

> ****
>>
>>What's most important is that you *not* tie a thread to a request
>>though -- the number of threads is a tunable parameter of the pool,
>>independent of the number of HTTP requests.
>>
>>> I may have as many as 1,000 concurrent HTTP requests. I am
>>> thinking that each of these threads could append to a single
>>> file with no conflict (the OS sequencing these operations)
>>> as long as the append is immediately flushed or buffering is
>>> turned off.
>>
>>I don't know a more polite way to say it, so I'll put it bluntly:
>>you're wrong. You cannot depend on the OS to order the operations.
> ****
> This is one of those "magical mechanisms" he is so fond of
> invoking to solve serious

No, the Unix/Linux groups said that appends are atomic. That may
have been referring to appends from the same thread or process.
In any case, another process that each thread invokes through
IPC would force the sequencing that I need.
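
A Python sketch of the append behaviour under discussion, assuming a local Linux filesystem. With `O_APPEND`, POSIX positions each `write()` at the current end of file as one step, so short records from several threads should each land whole. The order between threads is still whatever the scheduler produces, and the function name and record format here are hypothetical.

```python
import os
import tempfile
import threading


def append_from_threads(num_threads=8, lines_per_thread=50):
    """Have several threads append short records to one file via O_APPEND."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    log_fd = os.open(path, os.O_WRONLY | os.O_APPEND)

    def writer(tid):
        for n in range(lines_per_thread):
            # One os.write() call per record; no user-space buffering.
            os.write(log_fd, f"thread={tid} seq={n}\n".encode())

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.close(log_fd)

    with open(path) as f:
        lines = f.read().splitlines()
    os.remove(path)
    return lines
```

This shows records arriving intact, not in any particular order; if a strict order matters, a single writer fed through IPC is the safer assumption.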

> Frankly, I'm curious what "ordering" has to happen here.
> He did suggest a two-queue model
> for low and high priority tasks, showing a complete lack
> of understanding of realtime
> scheduling, introducing the possibility of priority
> inversion or unused resources.

I have looked into priority inversion, and this can only
occur if there is a shared resource. I don't envision that
these scenarios will occur in the design that I am
proposing.

>>What's the point of doing things this way? Right now, you're planning
>>to write some data to the transaction log, then send a pointer to
>>that data to the OCR engine, then the OCR engine reads the
>>transaction log, does the OCR, updates the transaction log, and
>>alerts the appropriate thread in the web server.
> ****
> The same way ANY CGI-based process notifies its parent that it has completed. I believe
> this is by closing stdout. Alternatively, if the service is multithreaded, and embedded
> in the server, then any interthread mechanism will work well, but these are just way, way
> too obvious. Instead, he assumes that there is no way for one process to notify another
> (in linux, signal() accomplishes this nicely!) so this falls into the
> failure-to-understand-means-you-can't-use-the-mechanism magical lack of mechanisms.

I was intending to use a named pipe to tell the web server
that a specific job is completed. The web server would then
have the task of informing its own threads based on
Thread-ID.

A signal might also work. I am not sure that this would not
screw some things up because the signal could arrive before
the process is ready to process it. I am guessing that this
is the same sort of reentrant processing that I did when I
was writing MSDOS TSR routines that triggered on the clock
tick interrupt vector. In this case there were times when
the current process could not be interrupted. As I recall
writing to a file was one of these cases.
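
A minimal Python sketch of signal-based notification, with two precautions for the timing worry above: the handler is installed before any signal can be sent, and the handler does nothing except record the event. The `os.kill` to the process's own PID stands in for the OCR process, and the names `on_job_done` and `wait_for_completion` are hypothetical. It assumes a POSIX system and must run in the main thread.

```python
import os
import signal
import time

completed = []  # filled in by the handler


def on_job_done(signum, frame):
    # Keep the handler tiny: record the event and return.
    completed.append(signum)


def wait_for_completion(timeout=2.0):
    """Install the handler first, then wait for a completion signal."""
    previous = signal.signal(signal.SIGUSR1, on_job_done)
    try:
        # Stand-in for the OCR process signalling that a job is finished.
        os.kill(os.getpid(), signal.SIGUSR1)
        deadline = time.monotonic() + timeout
        while not completed and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore the old handler
    return list(completed)
```

A bare signal carries no job ID, so a real design would still need a pipe or similar channel to say which job finished.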

> ****
>>
>>Instead of sending the data from the input thread to the OCR via the
>>transaction log (and sending a pointer directly from the input to the
>>OCR), send the relevant data directly from the input thread to the
>>OCR engine.
> ****
> If the OCR engine is embedded, an I/O Completion Port works just dandy. In linux, other
> queuing mechanisms can exist.

I made a note to look into this.

> And a crash might mean the customer doesn't get billed. As I said in an earlier message,
> the simplest implementation that meets the requirements that no customer pays for
> undelivered goods should be sufficient. A solution where no result is unbilled is also
> nice, but that may be harder.
> joe
> ****

Yes, but the complexity comes back when I need to protect
against a crash that occurs while the customer is adding
money to their account:
(1) Customer adds ten dollars to their account
(2) A system crash occurs before the periodic backup, wiping
out this transaction
(3) I just lost some of the customer's money

This is the case where the simplest solution that I can
envision is to provide some sort of on-the-fly
transaction-by-transaction offsite backup.

> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 9 Apr 2010 11:56

See below....
On Thu, 8 Apr 2010 20:24:48 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:4qusr5hrulhbsjgvogtna53ilq52tq4bre(a)4ax.com...
>> See below...
>> On Thu, 8 Apr 2010 08:54:38 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>>>message news:6aspr5dr3kb4npe47j9mu26kbl2ib4s28v(a)4ax.com...
>>>> On Wed, 7 Apr 2010 10:07:02 -0500, "Peter Olcott"
>>>> <NoSpam(a)OCR4Screen.com> wrote:
>>>>
>>>>>
>>>>>Sure so another way to solve this problem is on the rare
>>>>>cases when you do lose a customer's money you simply take
>>>>>their word for it and provide a refund. This also would hurt
>>>>>the reputation though, because this requires the customer to
>>>>>find a mistake that should not have occurred.
>>>> ****
>>>> Incredibly elaborate mechanisms to solve non-problems.
>>>> Simple mechanisms (e.g., "resubmit
>>>> your request") should suffice. Once your requirements
>>>> state what failure modes are
>>>
>>>You are not paying attention. I am talking about a server
>>>crash with loss of data after the customer has added money
>>>to their account, but, before this financial transaction
>>>has
>>>been saved to offsite backup. They add ten bucks to their
>>>account and I lose track of it because the server crashed
>>>and it was not yet time for my periodic backup.
>> ****
>> Actually, I AM paying attention; you are not paying
>> attention. I suggest creating the
>> MINIMUM amount of complexity that guarantees that the
>> customer is not charged for a
>> failure; you are attempting to create incredibly elaborate
>> mechanisms that give you the
>> illusion of 100% reliability. I say: fail and don't
>> charge, or fail and refund, and
>> implement the smallest, simplest system that satisfies
>> this design.
>> joe
>
>OK what is the simplest possible way to make sure that I
>never ever lose the customer's ten bucks, even if the server
>crashes before my next backup and this data is lost in the
>crash?
***
I would use a transacted database system, and record several states of the job submitted:
"In the queue", "being processed", "processing completed", "results successfully sent to
customer", "billing completed". Perhaps fewer states are required. Then, upon recovery,
I would examine these states and determine what recovery action was required. Note that
what I'm talking about here is the specification document; the requirements document
merely says "Shall not bill customer for undelivered results" and stops right there.

Then, in the specification document, the state machine for billing would be laid out, and
a suggestion of a transacted database to maintain the state, and so on. Only when you
got to the implementation would you worry about a transacted database as the
implementation strategy, or the details of the state management and error recovery.
joe
****
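
The job-state idea above can be sketched with SQLite in Python. This is an illustration under assumptions, not a specification: the table layout, the shortened state names, and the recovery policy in `recovery_actions` are all hypothetical, chosen only so that no job is billed before its results are recorded as delivered.

```python
import sqlite3

# Shortened versions of the five states listed above.
STATES = ["queued", "processing", "processed", "delivered", "billed"]


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
    )
    db.commit()
    return db


def submit(db):
    """Record a new job in the first state and return its id."""
    with db:  # commits on success, rolls back on an exception
        cur = db.execute("INSERT INTO jobs (state) VALUES (?)", (STATES[0],))
    return cur.lastrowid


def advance(db, job_id):
    """Move a job to its next state inside one transaction."""
    with db:
        (state,) = db.execute(
            "SELECT state FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if state == STATES[-1]:
            return state  # already in the final state
        nxt = STATES[STATES.index(state) + 1]
        db.execute("UPDATE jobs SET state = ? WHERE id = ?", (nxt, job_id))
    return nxt


def recovery_actions(db):
    """After a restart, decide what each unfinished job needs (assumed policy)."""
    actions = {}
    for job_id, state in db.execute("SELECT id, state FROM jobs ORDER BY id"):
        if state in ("queued", "processing", "processed"):
            actions[job_id] = "rerun or resend; do not bill"
        elif state == "delivered":
            actions[job_id] = "bill customer"
    return actions
```

For a file-backed database, each committed transition survives a crash, so the recovery pass only has to read the last recorded state of each job.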
>
>> ****
>>>
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 9 Apr 2010 12:03

See below...
On Thu, 08 Apr 2010 22:21:31 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Joseph M. Newcomer wrote:
>
>> The same way ANY CGI-based process notifies its parent that it has completed. I believe
>> this is by closing stdout.
>
>
>Ultimately, the process ending (which must end) is the deciding factor
>which the parent is waiting on (with an idle timeout in redirected data).
****
As I indicated, it is whatever the normal criterion is. My recollection of CGI was the
closing of stdout meant that there could be no future output, and that was the determining
factor, but I haven't looked at this in a decade. The problem with the OP is that since he
doesn't actually look into the details, he hypothesizes how the details MUST work by
falling back on some form of mysticism, and if he sees a blank wall, makes some guesses
based on inadequate information and if the guesses don't work, figures there must not be
such a mechanism. Hence the overconcern with a non-problem, notifying a Web server that
the process has completed its action. The VERY FIRST instance of a Web server launching a
script had such a mechanism, but since he doesn't see it, it must be an unsolved problem!
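
A rough Python sketch of the completion mechanism being described: the parent launches a child, reads its stdout until end of file, and learns the job is finished when the child exits. The function name `run_cgi_like` and the inline script are hypothetical; this is not how any particular web server implements CGI.

```python
import subprocess
import sys


def run_cgi_like(script="print('Content-Type: text/plain'); print(); print('done')"):
    """Launch a child, collect its stdout, and return once it has exited."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,  # the parent reads the child's stdout until EOF
        text=True,
        timeout=30,           # give up on a child that never finishes
    )
    return proc.returncode, proc.stdout
```

No extra notification channel is needed here: the return from `subprocess.run` is the notification.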

There must be a Platonic Ideal mechanism; he guesses about its existence. From flickering
shadows on the wall. The rest of us read the documentation or the code.

I get so tired of this...
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
