From: Hector Santos on 23 Mar 2010 01:54

Peter Olcott wrote:
> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:etOekekyKHA.5036(a)TK2MSFTNGP02.phx.gbl...
>> Hmmmmm, you mean two threads in one process?
>>
>> What is this:
>>
>>    num = Data[num]
>>
>> Do you mean:
>>
>>    num = Data[i];
>
> No I mean it just like it is. I init all of memory with
> random numbers and then access the memory locations
> referenced by these random numbers in a tight loop. This
> attempts to force memory bandwidth use to its limit. Even
> with four cores I do not reach the limit.

Ok.

> What are the heuristics for making a process thread safe?
> (1) Keep all data in locals to the best extent possible.
> (2) Eliminate the need for global data that must be written
> to, if possible.
> (3) Global data that must only be read from is OK.
> (4) Only use thread-safe libraries.
>
> I think if I can follow all those rules, then the much more
> complex rules aren't even needed.
>
> Did I miss anything?

That's correct. If the global data is READ ONLY and never changes, you
will be OK. You can write to GLOBAL data only if you synchronize the
resource/object, whatever it is. You can use what are called
Reader/Writer locks to do this very efficiently.

You will probably need to pass or send results back to the calling
thread or main thread, or maybe display something. Depending on what
you need here, there are good solutions. Consider the method I used
with TThreadData as the thread proc parameter. That can be global, to
easily pass results from the thread.

Some other things off the top of my head:

- Never try to use time to synchronize "things" or behavior. Use
  kernel objects (mutexes, etc.).
- Keep the local block data small. If it must be large, use the heap.
- Make sure the thread does not do a lot of context switching where
  there is little work done.

--
HLS
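The Reader/Writer lock approach Hector mentions can be sketched with the
Win32 slim reader/writer lock (SRWLOCK). This is only a minimal
illustration, not code from the thread; the global table and the
LookupValue/StoreValue helpers are made-up names standing in for
whatever shared data the application actually holds.

    #include <windows.h>
    #include <map>

    // Read-mostly global data protected by a slim reader/writer lock.
    std::map<int, int> g_table;        // illustrative shared data
    SRWLOCK g_lock = SRWLOCK_INIT;     // statically initialized R/W lock

    // Many threads may hold the shared (read) lock at the same time.
    int LookupValue(int key)
    {
        AcquireSRWLockShared(&g_lock);
        std::map<int, int>::const_iterator it = g_table.find(key);
        int value = (it != g_table.end()) ? it->second : -1;
        ReleaseSRWLockShared(&g_lock);
        return value;
    }

    // A writer takes the lock exclusively, blocking readers only while
    // the update is in progress.
    void StoreValue(int key, int value)
    {
        AcquireSRWLockExclusive(&g_lock);
        g_table[key] = value;
        ReleaseSRWLockExclusive(&g_lock);
    }

SRWLOCK requires Vista or later; on older Windows, or for portability,
a plain mutex or a pthread rwlock would play the same role.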
From: Hector Santos on 23 Mar 2010 03:40

Peter Olcott wrote:
> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:etOekekyKHA.5036(a)TK2MSFTNGP02.phx.gbl...
>> Hmmmmm, you mean two threads in one process?
>>
>> What is this:
>>
>>    num = Data[num]
>>
>> Do you mean:
>>
>>    num = Data[i];
>
> No I mean it just like it is. I init all of memory with
> random numbers and then access the memory locations
> referenced by these random numbers in a tight loop. This
> attempts to force memory bandwidth use to its limit. Even
> with four cores I do not reach the limit.

OK, but I guess I don't see this:

    uint32 num;
    for (uint32 r = 0; r < repeat; r++)
        for (uint32 i = 0; i < size; i++)
            num = Data[num];

num is not initialized, but we can assume it is zero to start. If you
have 10 random numbers filled in:

    2 1 0 5 0 9 0 0 4 5

then the iterations are:

    num     num = Data[num]
    -----   ---------------
    0       2
    2       0
    0       2

and so on. You are potentially hitting only two spots. The stress
points come from having a large range and doing far jumps, back and
forth, that go beyond the working set.

Read the MSDN information on GetProcessWorkingSetSize() and
SetProcessWorkingSetSize(). Increasing the minimum will bring in more
data from VM; however, the OS does not guarantee it. It works on a
first come, first served basis. Since you would not have a 2nd
INSTANCE, you won't have any competition, so it might work very nicely
for you.

This pressure is what the Memory Load % will show. If you walk the
array serially from 0 to size, you will see that memory load
percentage grow. When it is jumping randomly, it will be lower because
we are not demanding data from VM. You are the judge of what better
emulates your memory access, but for a stress simulation you need to
have it access the entire range. In production you will not be
stressing memory this hard, so if you fine-tune under this stress,
your program will ROCK!

PS: I noticed the rand() % size is too short; rand() is limited to
RAND_MAX, which is 32K. Change that to:

    (rand() * rand()) % size

to get a random range from 0 to size-1. I think that's right; maybe
Joe can give us a good random generator here, but the above does seem
to provide practical, decent randomness for this task.

--
HLS
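A hedged sketch of how those two working-set calls might be used. The
256 MB / 512 MB figures are purely illustrative, and, as noted above,
the OS treats the requested minimum as a hint rather than a guarantee.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        SIZE_T minSize = 0, maxSize = 0;
        HANDLE hProcess = GetCurrentProcess();

        // Query the current working-set limits for this process.
        if (GetProcessWorkingSetSize(hProcess, &minSize, &maxSize))
            printf("working set: min=%lu bytes, max=%lu bytes\n",
                   (unsigned long)minSize, (unsigned long)maxSize);

        // Request a larger working set (illustrative: 256 MB min, 512 MB max).
        // The call can fail, and the memory manager may trim the set later.
        if (!SetProcessWorkingSetSize(hProcess,
                                      256u * 1024 * 1024,
                                      512u * 1024 * 1024))
            printf("SetProcessWorkingSetSize failed, error %lu\n",
                   GetLastError());

        return 0;
    }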
From: Hector Santos on 23 Mar 2010 04:55

Hector Santos wrote:
> PS: I noticed the rand() % size is too short; rand() is limited to
> RAND_MAX, which is 32K. Change that to:
>
>    (rand() * rand()) % size
>
> to get a random range from 0 to size-1. I think that's right; maybe
> Joe can give us a good random generator here, but the above does seem
> to provide practical, decent randomness for this task.

Peter, using the above RNG seems to be a better test since it hits a
wider spectrum. With the earlier one, it was only hitting ranges up to
32K.

I also noticed that when the 32K RNG was used, a dynamic array was 1 to
6 times faster than using std::vector. But when using the above RNG,
they were both about the same. That is interesting.

--
HLS
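A minimal sketch of that kind of comparison: fill the buffer with the
widened (rand() * rand()) % size indices and time the pointer-chase.
The size and repeat values are illustrative, and the std::vector
variant shown here would simply be swapped for a raw new[] buffer to
reproduce the dynamic-array side of the test.

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <vector>

    typedef unsigned int uint32;

    int main()
    {
        const uint32 size   = 100000000;  // illustrative: ~400 MB of uint32
        const uint32 repeat = 10;

        std::vector<uint32> Data(size);
        srand((unsigned)time(0));

        // rand() alone only reaches RAND_MAX (32767 with MSVC), so two
        // calls are multiplied to cover the whole 0..size-1 range.
        for (uint32 i = 0; i < size; i++)
            Data[i] = (uint32)(((unsigned long long)rand() * rand()) % size);

        clock_t start = clock();
        uint32 num = 0;
        for (uint32 r = 0; r < repeat; r++)
            for (uint32 i = 0; i < size; i++)
                num = Data[num];              // random pointer-chase
        double secs = double(clock() - start) / CLOCKS_PER_SEC;

        // Print num so the compiler cannot optimize the loop away.
        printf("num=%u  elapsed=%.2f s\n", num, secs);
        return 0;
    }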
From: Peter Olcott on 23 Mar 2010 10:16

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:u8xPnamyKHA.2552(a)TK2MSFTNGP04.phx.gbl...
> Hector Santos wrote:
>
>> PS: I noticed the rand() % size is too short; rand() is limited to
>> RAND_MAX, which is 32K. Change that to:
>>
>>    (rand() * rand()) % size
>>
>> to get a random range from 0 to size-1. I think that's right; maybe
>> Joe can give us a good random generator here, but the above does
>> seem to provide practical, decent randomness for this task.
>
> Peter, using the above RNG seems to be a better test since it hits a
> wider spectrum. With the earlier one, it was only hitting ranges up
> to 32K.
>
> I also noticed that when the 32K RNG was used, a dynamic array was
> 1 to 6 times faster than using std::vector. But when using the above
> RNG, they were both about the same. That is interesting.
>
> --
> HLS

I made this adaptation and it slowed down by about 500%, due to a much
smaller cache hit ratio. It still scaled up to four cores with 1.5 GB
each, and four concurrent processes only took about 50% more time than
a single process.

I will probably engineer my new technology to be able to handle
multiple threads, if all that I have to do is implement the heuristics
that I mentioned. Since my first server will only have a single core,
on that server it will only have a single thread.

I still think that the FIFO queue is a good idea. Now I will have
multiple requests, and on multi-core machines, multiple servers. What
is your best suggestion for how I can implement the FIFO queue?

(1) I want it to be very fast.
(2) I want it to be portable across Unix / Linux / Windows, and maybe
    even Mac OS X.
(3) I want it to be as robust and fault tolerant as possible.

It may simply provide 32-bit hexadecimal integer names of input PNG
files; these names roll over to the next incremental number.
Alternatively, they may be HTTP connection numbers. I have to learn
about HTTP before I will know what I am doing here. Since all
customers (including free trial customers) will have accounts with
valid email addresses, I can always email the results if the
connection is lost.
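One way to sketch the portable FIFO queue Peter asks about is a
mutex-plus-condition-variable wrapper around std::queue. This assumes a
C++11 compiler (std::mutex, std::condition_variable), and the Job
struct with its 32-bit requestId is just an illustrative stand-in for
whatever the queue would actually carry.

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>
    #include <queue>

    // Illustrative job record: a 32-bit request/file name.
    struct Job {
        uint32_t requestId;
    };

    class FifoQueue {
    public:
        // Producer side: push a job and wake one waiting consumer.
        void Push(const Job& job) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push(job);
            }
            cond_.notify_one();
        }

        // Consumer side: block until a job is available, then pop it.
        Job Pop() {
            std::unique_lock<std::mutex> lock(mutex_);
            cond_.wait(lock, [this] { return !queue_.empty(); });
            Job job = queue_.front();
            queue_.pop();
            return job;
        }

    private:
        std::queue<Job> queue_;
        std::mutex mutex_;
        std::condition_variable cond_;
    };

This covers the fast and portable requirements; surviving a crash would
additionally mean persisting the queue contents (for example to a file
or a database), which the sketch does not attempt.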
From: Pete Delgado on 23 Mar 2010 12:43
"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote in message news:AeidnYxrl7T0vzXWnZ2dnUVZ_judnZ2d(a)giganews.com... > > I don't want to hear about memory mapped files because I don't want to > hear about optimizing virtual memory usage because I don't want to hear > about virtual memory until it is proven beyond all possible doubt that my > process does not (and can not be made to be) resident in actual RAM all > the time. From my understanding of your "test" (simply viewing the number of page faults reported by task manager) you can only conclude that there have not been any significant page faults since your application loaded the data, not that your application and data have remined in main memory. If you actually attempt to access all of your code and data and there are no page faults, I would be very surprised. In fact, knowing what I do about the cache management in Windows 7, I'm very surprised that you are not seeing any page faults at all unless you have disabled the caching service. > > Since a test showed that my process did remain in actual RAM for at least > twelve hours, No. That is not what your simple test showed unless your actual test differed significantly from what you expressed here. -Pete |