Can extra processing threads help in this case? [MFC]

From: Hector Santos on 7 Apr 2010 19:59

Hector Santos wrote:

> Peter Olcott wrote:
>
>>> That means you can only handle 10 requests per second.
>>
>> No, it does not. 100 ms is the real-time limit; actual processing time
>> will average much less than this, about 10 ms.
>
>
> Now you are even more unrealistic. That means that for 100 TPS,
> you now need 100 threads.

I misspoke here. If your unrealistic transaction time is 10 ms, then
your 1 OCR processor would be able to handle 100 TPS.

But 10 ms processing time is very unrealistic.
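
For reference, the capacity arithmetic in this exchange can be written down as a minimal sketch in portable C. The function names are hypothetical, and the model assumes that one handler works on one transaction at a time, with no queueing or context-switch overhead.

```c
#include <assert.h>

/* Transactions per second that ONE handler can complete when each
   transaction keeps it busy for service_ms milliseconds.
   Integer division, so non-divisors of 1000 round down. */
unsigned tps_per_handler(unsigned service_ms)
{
    assert(service_ms > 0);
    return 1000u / service_ms;
}

/* Minimum number of handlers (threads, processes, or machines)
   needed to sustain target_tps, rounded up. */
unsigned handlers_needed(unsigned target_tps, unsigned service_ms)
{
    assert(service_ms > 0);
    /* Total handler-milliseconds of work arriving each second. */
    unsigned long busy_ms = (unsigned long)target_tps * service_ms;
    return (unsigned)((busy_ms + 999u) / 1000u);
}
```

With these definitions, a 100 ms transaction time gives 10 TPS per handler and needs 10 handlers for 100 TPS, while a 10 ms transaction time lets a single handler carry 100 TPS.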

> But I want you to look up the term Thread Quantum.

And please do look this up. It's very important, Peter.

> In short, what you are claiming is that you complete a request and its
> processing in 1 CPU cycle of context switching. A quantum is ~15
> ms on multi-core/multi-processor machines.
>
>>> No matter how you configure it, 10 threads in 1 process, 10 processes
>>> on 1 machine or across machines, you need at least 10 handlers to
>>> handle the 100 TPS with 100 ms transaction times.
>>
>> 10 ms transaction time
>
>
> Unrealistic. Dreaming.
>

To prove the point, here is some simple code that shows it:

#include <windows.h>

int main(void)
{
    DWORD t1 = GetTickCount();
    Sleep(1); // sleep 1 millisecond
    DWORD t2 = GetTickCount();
    return 0;
}

You will see that t2-t1 is ~15 ms. It's called a QUANTUM, and your
sleeps come in multiples of quantums:

Sleep(16) --> 2 quantums or ~30 ms
Sleep(32) --> 3 quantums or ~45 ms
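
The figures above follow a simple rounding rule, sketched here in portable C as a model, not a measurement. The 15 ms tick is the approximate value used in this post and is passed in as a parameter. The function names are hypothetical, and real results also depend on where within a tick the call lands.

```c
#include <assert.h>

/* Number of whole timer ticks a sleep request occupies when the
   wake-up can only happen on a tick boundary (rounds up). */
unsigned ticks_for_sleep(unsigned requested_ms, unsigned tick_ms)
{
    assert(tick_ms > 0);
    return (requested_ms + tick_ms - 1u) / tick_ms;
}

/* Approximate elapsed time, in ms, that a caller would observe
   for a sleep of requested_ms under this rounding model. */
unsigned observed_sleep_ms(unsigned requested_ms, unsigned tick_ms)
{
    return ticks_for_sleep(requested_ms, tick_ms) * tick_ms;
}
```

Under this model, with a 15 ms tick, a 1 ms request costs one tick (15 ms), a 16 ms request two ticks (30 ms), and a 32 ms request three ticks (45 ms).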

#include <windows.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Request 1 ms and print how long the sleep really took
    DWORD t1 = GetTickCount();
    Sleep(1); // sleep 1 millisecond
    DWORD t2 = GetTickCount();
    printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));

    // Request 16 ms
    t1 = GetTickCount();
    Sleep(16);
    t2 = GetTickCount();
    printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));

    // Request 32 ms
    t1 = GetTickCount();
    Sleep(32);
    t2 = GetTickCount();
    printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));
    return 0;
}

What it means is that in code that does not do any preemption on its
own (which would slow it down), just natural code, the CPU and OS will
preempt you every QUANTUM.

I sincerely doubt you can do your OCR processing in less than 1
QUANTUM, let alone 10 ms.

Now, here's the thing:

If indeed you can achieve processing in less than 1 quantum or even 2
quantums, then you really should not be worried about anything else,
because your OCR system would be among the fastest applications in the world!

--
HLS

From: Hector Santos on 7 Apr 2010 20:08

Peter Olcott wrote:

>> Now you can use ANY web server that supports CGI, use a
>> web server with a CMS in it so you can manage user
>> accounts, etc.
>
> I am going to use a web server that has source-code so I
> won't need any kind of CGI.

True, you don't have to call CreateProcess("OCR.EXE") and worry about the
std I/O redirection overhead, which is good, but your requests would be
handled the same way, dynamically.

Even giving you the benefit of the doubt on the 10 ms processing time (which is
unrealistic), you can do a true CGI very effectively (which mongoose
supports), and you then have a benchmark to see how much you can
dynamically handle.

Anyway, you've got the idea. It's up to you now to program it and learn
for yourself what's possible.

--
HLS

From: Joseph M. Newcomer on 7 Apr 2010 22:56

See below...
On Wed, 07 Apr 2010 19:38:16 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Peter Olcott wrote:
>
>>> That means you can only handle 10 requests per second.
>>
>> No, it does not. 100 ms is the real-time limit; actual
>> processing time will average much less than this, about 10
>> ms.
>
>
>Now you are even more unrealistic. That means that for 100 TPS,
>you now need 100 threads.
>
>But I want you to look up the term Thread Quantum.
***
A scheduler quantum is 2 timer ticks, or 30 ms. But if a thread completes in 10 ms, it will
go to sleep waiting for the next request, and some other thread will immediately become
runnable. So while the nominal scheduling quantum is 30 ms, you really could complete three
10 ms transactions in a single quantum, even if they are in different threads. Note also
that Vista+ redefined how the scheduler deals with partial quanta (see
Russinovich's article; I don't have a citation for it, but it was in the "list of new
features in Vista" article).
joe
***
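
The point about short transactions can be illustrated with a small round-robin model in portable C. It is only a sketch: it assumes a single CPU, ignores context-switch cost and priorities, and the function name is hypothetical.

```c
#include <assert.h>

/* Simulates round-robin scheduling on one CPU.
   remaining[i] is the work (in ms) job i still needs; it is consumed.
   finish[i] receives the time (in ms) at which job i completed.
   A job that finishes before its quantum expires gives up the CPU
   at once, so the next job starts immediately. */
void round_robin(unsigned n, unsigned quantum_ms,
                 unsigned remaining[], unsigned finish[])
{
    assert(quantum_ms > 0);
    unsigned now = 0;
    unsigned left = 0;

    for (unsigned i = 0; i < n; i++) {
        finish[i] = 0;
        if (remaining[i] > 0)
            left++;
    }

    while (left > 0) {
        for (unsigned i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;
            /* Run for a full quantum or until the job is done. */
            unsigned slice = remaining[i] < quantum_ms ? remaining[i]
                                                       : quantum_ms;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish[i] = now;
                left--;
            }
        }
    }
}
```

Given three jobs of 10 ms each and a 30 ms quantum, the model finishes them at 10, 20, and 30 ms, all inside one nominal quantum. Two 100 ms jobs, by contrast, are interleaved in 30 ms slices and finish at 190 and 200 ms.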
>
>In short, what you are claiming is that you complete a request and its
>processing in 1 CPU cycle of context switching. A quantum is
>~15 ms on multi-core/multi-processor machines.
>
>>> No matter how you configure it, 10 threads in 1 process,
>>> 10 processes on 1 machine or across machines, you need at
>>> least 10 handlers to handle the 100 TPS with 100 ms
>>> transaction times.
>>
>> 10 ms transaction time
>
>
>Unrealistic. Dreaming.
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 7 Apr 2010 23:03

See below...
On Wed, 07 Apr 2010 19:59:23 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Hector Santos wrote:
>
>> Peter Olcott wrote:
>>
>>>> That means you can only handle 10 requests per second.
>>>
>>> No, it does not. 100 ms is the real-time limit; actual processing time
>>> will average much less than this, about 10 ms.
>>
>>
>> Now you are even more unrealistic. That means that for 100 TPS,
>> you now need 100 threads.
>
>
>I misspoke here. If your unrealistic transaction time is 10 ms, then
>your 1 OCR processor would be able to handle 100 TPS.
>
>But 10 ms processing time is very unrealistic.
>
>> But I want you to look up the term Thread Quantum.
>
>
>And please do look this up. It's very important, Peter.
>
>> In short, what you are claiming is that you complete a request and its
>> processing in 1 CPU cycle of context switching. A quantum is ~15
>> ms on multi-core/multi-processor machines.
>>
>>>> No matter how you configure it, 10 threads in 1 process, 10 processes
>>>> on 1 machine or across machines, you need at least 10 handlers to
>>>> handle the 100 TPS with 100 ms transaction times.
>>>
>>> 10 ms transaction time
>>
>>
>> Unrealistic. Dreaming.
>>
>
>
>To prove the point, here is some simple code that shows it:
>
>#include <windows.h>
>
>void main(int
>
> DWORD t1 = GetTickCount();
> Sleep(1); // sleep 1 millisecond
***
Actually, this is EXACTLY the same as writing Sleep(15);
****
> DWORD t2 = GetTickCount();
****
This is meaningless, because it is subject to what is called "gating error". While
you will generally see the difference as 15 ms, the result really depends on how the timer
tick count is updated relative to the scheduler. I do not believe this is defined.
****
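
One way to picture this kind of error is to model a counter that only advances once per tick and to read it at both ends of an interval. The sketch below is in portable C and is an illustration only: the 15 ms tick is an assumed value passed as a parameter, the function names are hypothetical, and it says nothing about how Windows actually updates its tick count.

```c
#include <assert.h>

/* Value a coarse counter reports at true time t_ms if it only
   advances in steps of tick_ms. */
unsigned long tick_counter(unsigned long t_ms, unsigned long tick_ms)
{
    assert(tick_ms > 0);
    return (t_ms / tick_ms) * tick_ms;
}

/* Elapsed time as seen through the coarse counter for an interval
   that really starts at start_ms and lasts duration_ms. */
unsigned long measured_elapsed(unsigned long start_ms,
                               unsigned long duration_ms,
                               unsigned long tick_ms)
{
    return tick_counter(start_ms + duration_ms, tick_ms)
         - tick_counter(start_ms, tick_ms);
}
```

In this model a true 1 ms interval reads as 0 ms when it starts just after a tick and as 15 ms when it straddles one, and a true 16 ms interval reads as either 15 or 30 ms. The reading depends on the phase of the start time relative to the tick as well as on the duration.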
>
>You will see that t2-t1 is ~15 ms. It's called a QUANTUM, and your
>sleeps come in multiples of quantums:
>
> Sleep(16) --> 2 quantums or ~30 ms
> Sleep(32) --> 3 quantums or ~45 ms
>
>#include <windows.h>
>#include <stdio.h>
>
>int main(int argc, char *argv[])
>{
> DWORD t1 = GetTickCount();
> Sleep(1); // sleep 1 millisecond
> DWORD t2 = GetTickCount();
> printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));
>
> t1 = GetTickCount();
> Sleep(16);
> t2 = GetTickCount();
> printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));
>
> t1 = GetTickCount();
> Sleep(32);
> t2 = GetTickCount();
> printf("Sleep Efficiency: %lu\n", (unsigned long)(t2 - t1));
> return 0;
>}
>
>What it means is that in code that does not do any preemption on its
>own (which would slow it down), just natural code, the CPU and OS will
>preempt you every QUANTUM.
****
Not quite true. It is actually far more complex and subtle than this. For example, if
there is another compute-bound interactive thread running, you will get quite different
numbers. And if you create the following app

int main(void)
{
    for(;;) {}
}

and run it at priority 15, you will get some really UGLY results when you run your above
example at normal priority.
****
>
>I sincerely doubt you can do your OCR processing in less than 1
>QUANTUM, let alone 10 ms.
>
>Now, here's the thing:
>
>If indeed you can achieve processing in less than 1 quantum or even 2
>quantums, then you really should not be worried about anything else,
>because your OCR system would be among the fastest applications in the world!
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Peter Olcott on 7 Apr 2010 23:56

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:OXQQutq1KHA.5880(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>>> That means you can only handle 10 requests per second.
>>
>> No, it does not. 100 ms is the real-time limit; actual
>> processing time will average much less than this, about
>> 10 ms.
>
>
> Now you are even more unrealistic. That means that for 100 TPS,
> you now need 100 threads.
>
> But I want you to look up the term Thread Quantum.
>
> In short, what you are claiming is that you complete a
> request and its processing in 1 CPU cycle of context
> switching. A quantum is ~15 ms on
> multi-core/multi-processor machines.
>
>>> No matter how you configure it, 10 threads in 1 process,
>>> 10 processes on 1 machine or across machines, you need
>>> at least 10 handlers to handle the 100 TPS with 100 ms
>>> transaction times.
>>
>> 10 ms transaction time
>
>
> Unrealistic. Dreaming.

I am only speaking about the OCR part of the process. It
currently processes 72,000 character glyphs per second, and
benchmark testing of the model of the new process indicates
that it will gain a ten-fold increase in speed. So if there
are 7,200 glyphs on a page (one unit of work) then it can
process this in 10 ms.
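
The arithmetic behind that figure can be checked with a minimal sketch in portable C. The function name is hypothetical, and the ten-fold speedup is the projection stated above, not a measured result.

```c
#include <assert.h>

/* Milliseconds needed to process one page of glyphs_per_page glyphs
   at a sustained rate of glyphs_per_second. */
double page_time_ms(double glyphs_per_page, double glyphs_per_second)
{
    assert(glyphs_per_second > 0.0);
    return 1000.0 * glyphs_per_page / glyphs_per_second;
}
```

At the current 72,000 glyphs per second, a 7,200-glyph page takes 100 ms. At the projected 720,000 glyphs per second, the same page takes 10 ms.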

>
> --
> HLS
