From: Peter Olcott on 24 Mar 2010 11:09

"Oliver Regenfelder" <oliver.regenfelder(a)gmx.at> wrote in message
news:c9efd$4ba9f330$547743c7$24260(a)news.inode.at...
> Hello Peter,
>
> Peter Olcott wrote:
>> If I was going to do it for real it would be one large
>> single std::vector of std::vectors shared across multiple
>> threads. I really don't want to deal with Virtual Memory
>> issues at all. I want to do everything that I reasonably
>> can to make them all moot. This may eventually require a
>> real-time OS.
>
> So you are using a std::vector of std::vectors inside your
> application?
>
> Something like:
>
> std::vector<std::vector<int> >
>
> Best Regards,
>
> Oliver

Yes, this is part of the new design.
From: Joseph M. Newcomer on 24 Mar 2010 23:55

See below...

On Tue, 23 Mar 2010 09:16:34 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:u8xPnamyKHA.2552(a)TK2MSFTNGP04.phx.gbl...
>> Hector Santos wrote:
>>
>>> PS: I noticed the rand() % size is too short; rand() is
>>> limited to RAND_MAX, which is 32K. Change that to:
>>>
>>> (rand()*rand()) % size
>>>
>>> to get a random range from 0 to size-1. I think that's
>>> right; maybe Joe can give us a good random generator
>>> here, but the above does seem to provide practical,
>>> decent randomness for this task.
>>
>> Peter, using the above RNG seems to be a better test since
>> it hits a wider spectrum. With the earlier one, it was
>> only hitting ranges up to 32K.
>>
>> I also noticed that when the 32K RNG was used, a dynamic
>> array was 1 to 6 faster than using std::vector. But when
>> using the above RNG, they were both about the same. That
>> is interesting.
>>
>> --
>> HLS
>
>I made this adaptation and it slowed down by about 500%, a
>much smaller cache hit ratio. It still scaled up to four
>cores with 1.5 GB each, and four concurrent processes only
>took about 50% more than a single process.
>
****
If you use multiple threads in a single process, or multiple processes
with a shared memory segment implemented by a Memory Mapped File, you
will reduce the footprint, since either all threads (in one process)
share the same memory footprint, or the multiple processes share
largely the same memory footprint.
****
>I will probably engineer my new technology to be able to
>handle multiple threads, if all that I have to do is
>implement the heuristics that I mentioned. Since my first
>server will only have a single core, on this server it will
>only have a single thread.
>
>I still think that the FIFO queue is a good idea. Now I will
>have multiple requests and on multi-core machines multiple
>servers.
>
>What is your best suggestion for how I can implement the
>FIFO queue?
>(1) I want it to be very fast.
>(2) I want it to be portable across Unix / Linux / Windows,
>and maybe even Mac OS X.
>(3) I want it to be as robust and fault tolerant as
>possible.
>
>It may simply provide 32-bit hexadecimal integer names of
>input PNG files. These names roll over to the next
>incremental number. Instead, they may be HTTP connection
>numbers. I have to learn about HTTP before I will know what
>I am doing here. Since all customers (including free trial
>customers) will have accounts with valid email addresses, I
>can always email the results if the connection is lost.
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 24 Mar 2010 23:56

A multithreaded FIFO queue would make more sense; less chance of
priority inversion effects. An I/O Completion Port makes a wonderful
multithreaded FIFO queue with practically no effort!
                                joe

On Tue, 23 Mar 2010 14:28:56 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Peter Olcott wrote:
>
>> I still think that the FIFO queue is a good idea. Now I will
>> have multiple requests and on multi-core machines multiple
>> servers.
>
>IMO, it's just that it's an odd approach to load balancing. You are
>integrating software components, like a web server with a
>multi-thread-ready listening server, and you are hampering it with
>single-thread-only FIFO queuing. It introduces other design
>considerations. Namely, you will need to consider a store-and-forward
>concept for your requests and delayed responses. But if your request
>processing is very fast, maybe you don't need to worry about it.
>
>In practice the "FIFO" would be at the socket or listening level,
>with load balancing done by restricting and balancing your
>connections with worker pools, or simply letting a connection wait,
>knowing that processing won't take too long. Some servers have
>guidelines for waiting limits. For the Web, I do not recall coming
>across any specific guideline other than a practical one per
>implementation. The point is you don't want the customers waiting
>too long - but what is "too long"?
>
>> What is your best suggestion for how I can implement the
>> FIFO queue?
>> (1) I want it to be very fast.
>> (2) I want it to be portable across Unix / Linux / Windows,
>> and maybe even Mac OS X.
>> (3) I want it to be as robust and fault tolerant as
>> possible.
>
>Any good collection class will do as long as you wrap it with
>synchronization. Example:
>
>typedef struct _tagTSlaveData {
>    ... data per request.........
>} TSlaveData;
>
>class CBucket : public std::list<TSlaveData>
>{
>public:
>    CBucket()  { InitializeCriticalSection(&cs); }
>    ~CBucket() { DeleteCriticalSection(&cs); }
>
>    void Add( const TSlaveData &o )
>    {
>       EnterCriticalSection(&cs);
>       insert(end(), o );
>       LeaveCriticalSection(&cs);
>    }
>
>    BOOL Fetch(TSlaveData &o)
>    {
>       EnterCriticalSection(&cs);
>       BOOL res = !empty();
>       if (res) {
>          o = front();
>          pop_front();
>       }
>       LeaveCriticalSection(&cs);
>       return res;
>    }
>private:
>    CRITICAL_SECTION cs;
>} Bucket;

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 24 Mar 2010 23:59

See below...

On Tue, 23 Mar 2010 14:11:28 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>Ah, so this is the code that you were suggesting?
>I won't be able to implement multi-threading until volume
>grows beyond what a single core processor can accomplish.
>I was simply going to use MySQL for the inter-process
>communication, building and maintaining my FIFO queue.
****
Well, I can think of worse ways. For example, writing the data to a
floppy disk. Or punching it to paper tape and asking the user to
re-insert the paper tape. MySQL for interprocess communication? Get
serious!
****
>
>One thing else that you may be unaware of: std::vector
>generally beats std::list even for list-based algorithms,
>including such things as inserting in the middle of the
>list. The reason for this may be that the expensive memory
>allocation cost is amortized over more elements with a
>std::vector, more than enough to pay for the cost of
>reshuffling a few items. This would probably not work for
>very long lists. Also, maybe there is some sort of
>std::list::reserve() that would mitigate this cost.
****
Actually, the vector/list tradeoff has been known since the late
1970s; look at the LISP machine papers from that era. Interesting that
the old ideas keep getting rediscovered over and over again (doesn't
anybody pay attention to history?)
                                joe
****
>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:O5O%23XiryKHA.5936(a)TK2MSFTNGP04.phx.gbl...
>> Example usage of the class below; I added an Add()
>> override to make it easier to add elements for the
>> specific TSlaveData fields:
>>
>> #include <windows.h>
>> #include <conio.h>
>> #include <list>
>> #include <string>
>> #include <iostream>
>>
>> using namespace std;
>>
>> const DWORD MAX_JOBS = 10;
>>
>> typedef struct _tagTSlaveData {
>>     DWORD jid;          // job number
>>     char  szUser[256];
>>     char  szPwd[256];
>>     char  szHost[256];
>> } TSlaveData;
>>
>> class CBucket : public std::list<TSlaveData>
>> {
>> public:
>>     CBucket()  { InitializeCriticalSection(&cs); }
>>     ~CBucket() { DeleteCriticalSection(&cs); }
>>
>>     void Add( const TSlaveData &o )
>>     {
>>        EnterCriticalSection(&cs);
>>        insert(end(), o );
>>        LeaveCriticalSection(&cs);
>>     }
>>
>>     void Add(const DWORD jid,
>>              const char *user,
>>              const char *pwd,
>>              const char *host)
>>     {
>>        TSlaveData sd = {0};
>>        sd.jid = jid;
>>        strncpy(sd.szUser,user,sizeof(sd.szUser));
>>        strncpy(sd.szPwd,pwd,sizeof(sd.szPwd));
>>        strncpy(sd.szHost,host,sizeof(sd.szHost));
>>        Add(sd);
>>     }
>>
>>     BOOL Fetch(TSlaveData &o)
>>     {
>>        EnterCriticalSection(&cs);
>>        BOOL res = !empty();
>>        if (res) {
>>           o = front();
>>           pop_front();
>>        }
>>        LeaveCriticalSection(&cs);
>>        return res;
>>     }
>> private:
>>     CRITICAL_SECTION cs;
>> } Bucket;
>>
>>
>> void FillBucket()
>> {
>>     for (int i = 0; i < MAX_JOBS; i++)
>>     {
>>        Bucket.Add(i,"user","password", "host");
>>     }
>> }
>>
>> //----------------------------------------------------------------
>> // Main Thread
>> //----------------------------------------------------------------
>>
>> int main(int argc, char *argv[])
>> {
>>     FillBucket();
>>     printf("Bucket Size: %d\n",Bucket.size());
>>     TSlaveData o = {0};
>>     while (Bucket.Fetch(o)) {
>>        printf("%3d | %s\n",o.jid, o.szUser);
>>     }
>>     return 0;
>> }
>>
>> In your mongoose/OCR thingie, mongoose will Bucket.Add() and
>> each spawned OCR thread will do a Bucket.Fetch().
>>
>> Do it right, and it ROCKS!
>>
>> --
>> HLS
>>
>> Hector Santos wrote:
>>
>>> [earlier post quoted in full; snipped - see Hector's reply above]
>

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 25 Mar 2010 00:00
An IOCP with a multithreaded application will do much better than what
you are proposing.
                                joe

On Tue, 23 Mar 2010 13:54:30 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:epsZzaryKHA.5360(a)TK2MSFTNGP06.phx.gbl...
>> Peter Olcott wrote:
>>
>>> I still think that the FIFO queue is a good idea. Now I
>>> will have multiple requests and on multi-core machines
>>> multiple servers.
>>
>> IMO, it's just that it's an odd approach to load balancing.
>> You are integrating software components, like a web server
>> with a multi-thread-ready listening server, and you are
>> hampering it with single-thread-only FIFO queuing. It
>> introduces other design considerations. Namely, you will
>> need to consider a store-and-forward concept for your
>> requests and delayed responses. But if your request
>> processing is very fast, maybe you don't need to worry
>> about it.
>
>I will probably be implementing the FIFO queue using MySQL.
>I want customers to have strict first-in, first-out
>priority; the only thing that will change this is that
>multiple servers are now feasible. If it costs me an extra
>1/2 % of amortized response rate, then this is OK.
>
>Because multiple servers are now available, I will drop the
>idea of more than one queue priority. I know that database
>access is slow, but this will be amortized over a much
>longer processing time. This solution should be simple and
>portable. Also, the database probably has a lot of caching
>going on, so it probably won't even cost me the typical disk
>access time of 5 ms.
>
>> In practice the "FIFO" would be at the socket or
>> listening level, with load balancing done by restricting
>> and balancing your connections with worker pools, or
>> simply letting a connection wait, knowing that processing
>> won't take too long. Some servers have guidelines for
>> waiting limits.
>> For the Web, I do not recall coming
>> across any specific guideline other than a practical one
>> per implementation. The point is you don't want the
>> customers waiting too long - but what is "too long"?
>
>This sounds like a more efficient approach, but it may lack
>portability, and it may take longer to get operational. I
>will already have some sort of SQL learning curve to
>implement my authentication database. This sounds like an
>extra learning curve over and above everything else. There
>are only so many 1000-page books that I can read in a finite
>amount of time.
>
>> [Hector's CBucket example quoted in full; snipped - see above]
>>
>> --
>> HLS
>

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm