From: Hector Santos on 20 Mar 2010 16:53

Geoff wrote:
> On Sat, 20 Mar 2010 09:52:33 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>> Maximum total processing time is 1/10 second for a whole
>> page of text. My initial implementation (for testing
>> purposes) may simply refuse larger requests. The final
>> implementation will place large requests in a separate lower
>> priority queue.
>
> Your "memory bandwidth intensive" requirement is the bottleneck to
> multithreading or multiprocessing. If your big memory chunk is
> read-only, your problem with the DFA is that it lacks locality of
> reference to that data. You end up hitting the RAM instead of being
> able to utilize the data in the CPU caches. Multiple threads end up
> contending with each other for access to RAM memory, hence the
> slowdown. Compute-intensive applications benefit from multi-threading
> by being able to stay off the RAM bus and utilize the caches in each
> core.

Threads will benefit by reducing context switching.

The point in all this is that we are taking Pete's poor engineering, his WINTEL understanding, and the software design of his DFA as limits, underutilizing the power of a WINTEL QUAD 8MB Windows 7 machine.

In other words, he really doesn't know what his boundary conditions are, and until he has tried to use memory-mapped files for his read-only (mind you, not write: the least contention you can get) font library of files, I am not convinced it should be a single-process, FIFO-queue-processing, standalone application.

This is a simple engineering problem with a simple solution. He just hasn't realized it.

Even then, degradation does not have to be linear, as he suggests, with each process started. The load requirements per thread would be much different than per process, which is all he sees now. Threads sharing the data would prove to be highly memory-efficient, especially on a multi-CPU machine. Single CPU? Context switching gets in the way. Under multi-CPU, you have less context switching.

--
HLS
From: Hector Santos on 20 Mar 2010 16:59

Peter Olcott wrote:
>
> Geoff has explained this better than I have.

And I don't agree with him - not one iota.

Until you redesign your software's MEMORY USAGE, your current code is not optimized for your WINTEL box or for any web service worth its salt. You might as well get a DOS machine to reduce all the Windows overhead, especially graphical overhead. Recompile your code to use DPMI and you will be better off than what you have now.

--
HLS
From: Peter Olcott on 20 Mar 2010 17:07

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:%23Xe$Z9GyKHA.2644(a)TK2MSFTNGP04.phx.gbl...
> Geoff wrote:
>
>> On Sat, 20 Mar 2010 09:52:33 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>> Maximum total processing time is 1/10 second for a whole
>>> page of text. My initial implementation (for testing
>>> purposes) may simply refuse larger requests. The final
>>> implementation will place large requests in a separate
>>> lower priority queue.
>>
>> Your "memory bandwidth intensive" requirement is the bottleneck to
>> multithreading or multiprocessing. If your big memory chunk is
>> read-only, your problem with the DFA is that it lacks locality of
>> reference to that data. You end up hitting the RAM instead of being
>> able to utilize the data in the CPU caches. Multiple threads end up
>> contending with each other for access to RAM memory, hence the
>> slowdown. Compute-intensive applications benefit from multi-threading
>> by being able to stay off the RAM bus and utilize the caches in each
>> core.
>
> Threads will benefit by reducing context switching.

You missed this part:

Multiple threads end up contending with each other for access to RAM memory, hence the slowdown.

If you only have X cycles of memory per second, and one process (or thread) uses up all X cycles, adding another process (or thread) can only slow things down, not speed them up.

> The point in all this is that we are taking Pete's poor
> engineering and WINTEL understanding and software design
> for his DFA as limits, underutilizing the power of a
> WINTEL QUAD 8MB Windows 7 machine.
>
> In other words, he really doesn't know what his boundary
> conditions are, and until he has tried to use memory-mapped
> files for his read-only (mind you, not write: the least
> contention you can get) font library of files, I am not
> convinced it should be a single-process, FIFO-queue-processing,
> standalone application.
>
> This is a simple engineering problem with a simple solution.
> He just hasn't realized it.
>
> Even then, degradation does not have to be linear, as he
> suggests, with each process started. The load requirements
> per thread would be much different than per process, which
> is all he sees now. Threads sharing the data would prove
> to be highly memory-efficient, especially on a multi-CPU
> machine. Single CPU? Context switching gets in the way.
> Under multi-CPU, you have less context switching.
>
> --
> HLS
From: Peter Olcott on 20 Mar 2010 17:09

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message news:O%23cPxAHyKHA.984(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>> Geoff has explained this better than I have.
>
> And I don't agree with him - not one iota.

Let's see where Joe weighs in on this.

> Until you redesign your software's MEMORY USAGE, your
> current code is not optimized for your WINTEL box or for
> any web service worth its salt. You might as well get a
> DOS machine to reduce all the Windows overhead, especially
> graphical overhead. Recompile your code to use DPMI and
> you will be better off than what you have now.
>
> --
> HLS

I won't be running on Wintel; I will be running on Linux on Intel. I won't need any GUI.
From: Hector Santos on 20 Mar 2010 17:26
Peter Olcott wrote:
>> Threads will benefit by reducing context switching.
>
> You missed this part:
>
> Multiple threads end up contending with each other for
> access to RAM memory, hence the slowdown.

No, I didn't miss this at all, and it is certainly not something YOU should worry about. You are MOST definitely OVER-engineering this into an unreasonable restriction, one that quite simply defies engineering logic.

When you implement sharable, atomic, read-only memory for multi-core threads, you don't have write contention and you will not be swapping here. It will be FASTER than multiple single processes each LOADING redundant DATA BLOCKS, putting MORE pressure on the system to manage not 4GB but 8GB of memory. OF COURSE your system degrades, as you saw, by just starting two EXE copies! DUH!!

And again, it is NOT going to degrade at any LINEAR rate. It is not a locked-read concept where one thread has to WAIT until the other thread finishes reading the memory.

You are completely, absolutely wrong about this. And again, we are talking about scaling efficiency: you can't measure that by just starting TWO EXE copies! That is completely wrong, not when each has to load 4GB redundantly.

> If you only have X cycles of memory per second, and one
> process (or thread) uses up all X cycles, adding another
> process (or thread) can only slow things down, not speed
> them up.

Again, you are thinking LINEAR degradation, and that is SIMPLY not the case.

--
HLS