From: Peter Olcott on 7 Apr 2010 11:07

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:isnnr55eirshsintd983glpf4o76bnkm4r(a)4ax.com...
> See below...
> On Tue, 6 Apr 2010 16:59:13 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>>
>> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:2atmr51ml9kn4bb5l5j77h3lpiqtnlq8m3(a)4ax.com...
>>> See below...
>>> On Mon, 5 Apr 2010 21:32:44 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>>>
>>>> Ah but, then you are ignoring the proposed aspect of my design that would handle all those things. What did you call it, "mirrored transactions"? I called it on-the-fly transaction-by-transaction offsite backup.
>>> ****
>>> If you have a "proposed aspect" I presume you have examined the budget numbers for actual dollars required to achieve this, and the complexity of making sure it works right.
>>>
>>> I am not ignoring the issue, I'm asking if you have ignored the realities involved in achieving it!
>>
>> I would simply re-implement some of the aspects of my web application such that there is another web application on another server that the first server can send its transactions to.
> ****
> Ohh, the Magical Mechanism solution! Of course, this adds time, complexity, and cost, but what do they matter? Maybe you could talk to your ISP about "load balancing" among multiple servers? They've already got this working! At least most ISPs that plan to survive have it working already.
> *****
>>
>>>> I don't want to ever lose any data pertaining to customers adding money to their account. I don't want to have to rely on the payment processor keeping track of this. Maybe there are already mechanisms in place that can be completely relied upon for this.
>>> ****
>>> If a customer can add $1 and you spend $5 making sure they don't lose it, have you won?
>>
>> If you don't make sure that you don't lose the customer's money, your reputation will put you out of business. If you can't afford to make sure that you won't lose the customer's money, then you can't afford to go into business.
> *****
> Yes, but you have to make sure the mechanisms you create to do this are cost-effective. See my earlier comments about UPS and FedEx not requiring "live signatures" for most deliveries! Sometimes, you let your "insurance" pay this, and sometimes, you become your own insurer (this is called, technically, being "self-insured").
> joe

Sure, so another way to solve this problem is, on the rare occasions when you do lose a customer's money, you simply take their word for it and provide a refund. This would also hurt the reputation, though, because it requires the customer to find a mistake that should not have occurred.

On-the-fly transaction-by-transaction offsite backup will be implemented for at least the transactions that add money to the customer's account. If I lose the transactions that deduct money, then some customers may get some free service. I can afford to give the customer more than they paid for. I don't want to ever give the customer less than they paid for.

> *****
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on 7 Apr 2010 11:32

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:uTR0pyf1KHA.3744(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
> Plus what happens if the machine crashes? You lose what hasn't been flushed. The difference is you lower any issues at the expense of some speed and open/close overhead, which will be very minor for you.

The design assumes that every write and every append is immediately flushed. The transaction file is my FIFO queue. It is the one piece that every other piece measures its reliability and performance against.

>> It would be simpler to bypass the need for this and simply delegate writing the transaction log file to a single thread.
>
> It helps to have a single-point I/O controller, but how are you planning to use this thread? How will you talk to it? IOW, now you really need to make sure you have synchronization.

The web server is designed with one thread per HTTP request. I may have as many as 1,000 concurrent HTTP requests. I am thinking that each of these threads could append to a single file with no conflict (the OS sequencing these operations) as long as the append is immediately flushed or buffering is turned off.

The OCR process(es) would be notified of a new request using some sort of IPC (named pipes for now) that also tells it the byte offset in the transaction log file where the transaction details can be found. Each transaction will have three states:
(a) Available (init by web server)
(b) Pending (updated by OCR process)
(c) Completed (updated by OCR process)

I am not sure how the OCR process would notify the appropriate thread within the web server process of the [Completed] event, but it would use some sort of IPC. I would guess that it would notify the web server process, and let the web server process notify its own thread. This would probably require that the Thread-ID be passed to the OCR (in the transaction log file) so it can pass it back to the web server.

>> There are several different types of IPC. I chose the named pipe because it is inherently a FIFO queue.
>
> So is every other IPC concept. For your needs, named pipes are more complex and can be unreliable and very touchy if you don't do it right. I mean, your I/O needs to be 100% precise, and that can't be done in less than 20-30 lines of code, whereas for what you need, 3-4 lines of code are sufficient.
>
> Unless you get a Named Pipe class that will do all the work and error checking (like error 5/32 sharing-violation timings, exceptions, proper full-duplex communications), you can certainly run into an ugly mess. I don't recommend it for you. You don't need it.

This process has to be event-driven rather than a polled interface, so I must have some sort of IPC. None of the Unix/Linux people are bringing up any issues with named pipes. Perhaps my design is simple enough to make many of these issues moot: two half-duplex pipes instead of one full-duplex pipe, thus forming two very simple FIFO queues.

> What about your HTTP request and response model? Does the above incorporate a store-and-forward concept? Meaning, don't forget that you have a RESPONSE to provide. You just can't ignore it. You have to at least respond with:
>
> "This will take a long time, we will email you when done."

The response model is outlined above.

> Ok, go get yourself a Btree ISAM database library! That will give you your fast database needs with all the controls you want. You see, you did non-SQL database work for 10 years, so you should not have a problem here.
>
> BTW, a btree/isam database system is what we use in our high-end multi-threaded RPC server, but it's coupled with memory-mapping technology to speed up large file I/O processing. The speed is 2nd to none. The only thing being done now is to make it 64-bit I/O: not 64-bit compiled, but 64-bit read/write.
>
> --
> HLS

I will have to look into this memory-mapped file thing. If it can provide faster file reads, then my new design could use it.
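The append-and-flush scheme described above (many request threads appending durable records to one transaction log) can be sketched in POSIX C. This is a minimal sketch under stated assumptions: the function name is ours, the record is an opaque byte buffer, and a mutex is added around the seek/write pair because, under concurrent appenders, the offset returned by `lseek` would otherwise not be guaranteed to match where the record actually landed.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one record to the transaction log and flush it to disk
   before returning its byte offset (-1 on failure). The mutex keeps
   the seek/write pair atomic across the web server's many request
   threads, so the offset later handed to the OCR process is really
   where the record landed. fdatasync() forces the data out of the
   OS buffer cache, matching the "immediately flushed" requirement. */
long append_record(int fd, const void *rec, size_t len)
{
    pthread_mutex_lock(&log_lock);
    long off = lseek(fd, 0, SEEK_END);
    ssize_t n = write(fd, rec, len);
    int flushed = (n == (ssize_t)len) ? fdatasync(fd) : -1;
    pthread_mutex_unlock(&log_lock);
    return (off >= 0 && flushed == 0) ? off : -1;
}
```

Note the trade-off Hector alludes to: the `fdatasync` per append is what makes crash-loss bounded to at most the in-flight record, and it is also the dominant cost of each transaction.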
From: Pete Delgado on 7 Apr 2010 12:12

"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote in message news:L9qdndbIjeoeOCHWnZ2dnUVZ_tKdnZ2d(a)giganews.com...
>
> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:uTR0pyf1KHA.3744(a)TK2MSFTNGP04.phx.gbl...
>> Peter Olcott wrote:
>>
>> Plus what happens if the machine crashes? You lose what hasn't been flushed. The difference is you lower any issues at the expense of some speed and open/close overhead, which will be very minor for you.
>
> The design assumes that every write and every append is immediately flushed. The transaction file is my FIFO queue. It is the one piece that every other piece measures its reliability and performance against.

*Which* design???? You have proposed so many designs and tossed around so many things that I wonder if even *you* know what your design is!

-Pete
From: Peter Olcott on 7 Apr 2010 13:02

"Pete Delgado" <Peter.Delgado(a)NoSpam.com> wrote in message news:Ooj3x0m1KHA.4832(a)TK2MSFTNGP04.phx.gbl...
>
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote in message news:L9qdndbIjeoeOCHWnZ2dnUVZ_tKdnZ2d(a)giganews.com...
>>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:uTR0pyf1KHA.3744(a)TK2MSFTNGP04.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>> Plus what happens if the machine crashes? You lose what hasn't been flushed. The difference is you lower any issues at the expense of some speed and open/close overhead, which will be very minor for you.
>>
>> The design assumes that every write and every append is immediately flushed. The transaction file is my FIFO queue. It is the one piece that every other piece measures its reliability and performance against.
>
> *Which* design???? You have proposed so many designs and tossed around so many things that I wonder if even *you* know what your design is!
>
> -Pete

I have not changed this aspect of the design since it was initially proposed. Here is the current design:

(1) A transaction log file forms the FIFO queue between multiple threads (one per HTTP request) of the web server and at least one OCR process.
(2) Some form of IPC (probably Unix/Linux named pipes) informs the OCR process of an HTTP request that needs to be serviced; it sends the offset within the transaction file.
(3) The OCR process uses the offset within the transaction file to get the transaction details, and updates the transaction flag from [Available] to [Pending].
(4) When the OCR process is done with processing, it informs the web server in another FIFO using IPC (such as a Unix/Linux named pipe) by passing the offset within the transaction file.
(5) The web server reads the Thread-ID from this offset of the transaction file and informs the thread so that this thread can provide the HTTP response.

I was originally having the OCR process update the transaction flag from [Pending] to [Completed], but it might make more sense for the thread that receives the HTTP acknowledgement of the HTTP response to do this.
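Steps (2) and (4) of the design above, the two half-duplex named pipes carrying transaction-file offsets, might look roughly like this in POSIX C. The pipe paths and the fixed 8-byte offset payload are illustrative assumptions, not something stated in the thread:

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

#define REQ_FIFO "/tmp/ocr_requests"   /* web server -> OCR, step (2) */
#define RSP_FIFO "/tmp/ocr_responses"  /* OCR -> web server, step (4) */

/* Create both half-duplex FIFOs once at startup; failure with
   EEXIST after a restart is harmless. */
void setup_fifos(void)
{
    mkfifo(REQ_FIFO, 0600);
    mkfifo(RSP_FIFO, 0600);
}

/* Announce a transaction by writing its log-file offset. Writes of
   at most PIPE_BUF bytes to a FIFO are atomic, so offsets written
   by concurrent threads cannot interleave. Returns 0 on success. */
int send_offset(const char *path, uint64_t offset)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    ssize_t n = write(fd, &offset, sizeof offset);
    close(fd);
    return n == (ssize_t)sizeof offset ? 0 : -1;
}

/* Block until the next offset arrives on the given pipe. */
int recv_offset(const char *path, uint64_t *offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, offset, sizeof *offset);
    close(fd);
    return n == (ssize_t)sizeof *offset ? 0 : -1;
}
```

One of the "gotchas" Hector warns about shows up immediately: `open()` on a FIFO blocks until the other end is also open, so the web server and OCR process must both be running before either call returns. The helpers also work on a regular file, which makes them easy to test in isolation.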
From: Hector Santos on 7 Apr 2010 15:20
Peter Olcott wrote:
>> Unless you get a Named Pipe class that will do all the work and error checking (like error 5/32 sharing-violation timings, exceptions, proper full-duplex communications), you can certainly run into an ugly mess. I don't recommend it for you. You don't need it.
>
> This process has to be event-driven rather than a polled interface, so I must have some sort of IPC. None of the Unix/Linux people are bringing up any issues with named pipes. Perhaps my design is simple enough to make many of these issues moot: two half-duplex pipes instead of one full-duplex pipe, thus forming two very simple FIFO queues.

Or they really didn't want to tell you the bad news or waste time telling you all the "gotchas." You will have a more complex design than necessary with pipes, and the odds are high you will have misfires, blocks that don't return, etc.

Remember, this is your bottleneck:

    Many Web Threads ---> 1 FIFO/OCR Thread

I'm not saying it can't be done, but you will waste more time trying to get that right when the problem didn't call for it. On the one hand, you are putting such high constraints on so many other things that this headstrong focus on named pipes will be a weak point. See below on your overloading.

>> What about your HTTP request and response model? Does the above incorporate a store-and-forward concept? Meaning, don't forget that you have a RESPONSE to provide. You just can't ignore it. You have to at least respond with:
>>
>> "This will take a long time, we will email you when done."
>
> The response model is outlined above.

At some point, your model MUST turn into a store-and-forward concept, otherwise it will break down. This is based on your stated boundary conditions: 1 fifo/ocr thread with a 100 ms turnaround time. That means you can handle only 10 requests per second. But you also stated 100 requests per second.

So therefore, you need at least 10 fifo/ocr thread handlers to handle the load; otherwise the bucket will fill up pretty darn fast. Again remember, this is your bottleneck:

    Many Web Threads ---> 1 FIFO/OCR Thread

No matter how you configure it (10 threads in 1 process, 10 processes on 1 machine, or spread across machines), you need at least 10 handlers to handle the 100 TPS with 100 ms transaction times. Once it cannot handle the dynamic response model, it becomes a store-and-forward response model.

You can do what you want, but your expectations for high throughput are unaligned with your proposed implementation method. Maybe if you said, "I expect 10 requests per second PER OCR station," then at least you would be more realistic about your 1 fifo/ocr thinking.

--
HLS
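Hector's arithmetic above (100 ms per transaction means at most 10 requests/second per handler, so 100 requests/second needs at least 10 concurrent handlers) is an instance of Little's law, L = lambda * W. A few lines capture it; the function name is ours, not from the thread:

```c
/* Minimum concurrent handlers needed to sustain a request rate at a
   fixed per-request service time: the integer ceiling of
   rate * service_time (Little's law, L = lambda * W).
   Integer math avoids floating-point rounding surprises. */
int handlers_needed(int requests_per_sec, int service_time_ms)
{
    return (requests_per_sec * service_time_ms + 999) / 1000;
}
```

For the numbers in the thread, `handlers_needed(100, 100)` gives 10, and a single handler (`handlers_needed(10, 100)` gives 1) covers only the 10 requests/second case.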