From: Hector Santos on 22 Mar 2010 12:06

Peter Olcott wrote:
> Since my process (currently) requires unpredictable access
> to far more memory than can fit into the largest cache, I
> see no possible way that adding 1000-fold slower disk access
> could possibly speed things up. This seems absurd to me.

And I would agree that it would seem absurd to inexperienced people.

But you need to TRUST the power of your multi-processor computer, because YOU are most definitely under-utilizing it by a long shot. The code I posted is the proof!

Your issue is akin to having a pickup truck, overloading the back, piling things on top of each other, overweight beyond the safety levels recommended by the manufacturer (and by city/state ordinances). Now your handling, speed, and visibility are all compromised. Your truck won't go as fast, and even if it could, things can fall, people can die, crashes can happen.

You have two choices:

  - You can stop and unload stuff, then come back and pick it up
    on a second trip; your total travel time doubles.

  - You can get a second pickup truck, split the load, and get on
    a four-lane highway and drive side by side; sometimes one
    creeps ahead, sometimes the other, and both reach the
    destination at nearly the same time.

Same thing! You are overloading your machine to the point where it is working very, very hard to satisfy the needs of your single-threaded process. You may "believe" it is working at optimal speed because it has uninterrupted, exclusive access, but that is not reality. You are under-utilizing the power of your machine.

Whether you realize it or not, the overloaded pickup truck is smart: it is stopping you every X milliseconds to check whether you have a second pickup truck to offload some work and do some moving for you!!

You need to change your thinking. However, at this point, I don't think you have any coding skills, because if you did, you would be EAGERLY JUMPING at the code I provided to see for yourself.

--
HLS
From: Hector Santos on 22 Mar 2010 12:20

Peter Olcott wrote:
>> If you can see that in the code, then quite honestly, you
>> don't know how to program or understand the concept of
>> programming.
>
> I am telling you the truth, I am almost compulsive about
> telling the truth. When the conclusions are final I will
> post a link here.

What GROUP is this? No one will trust your SUMMARY unless you cite the group. Until you do so, you're lying and making things up.

I repeat: if you can't see that the code I posted proves your thinking is incorrect, then you don't know what you are talking about, and it's becoming obvious now that you don't have any kind of programming or even engineering skills.

--
HLS
From: Peter Olcott on 22 Mar 2010 13:15

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uGVpjmdyKHA.5948(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>> Since my process (currently) requires unpredictable
>> access to far more memory than can fit into the largest
>> cache, I see no possible way that adding 1000-fold slower
>> disk access could possibly speed things up. This seems
>> absurd to me.
>
> And I would agree that it would seem absurd to
> inexperienced people.
>
> But you need to TRUST the power of your multi-processor
> computer, because YOU are most definitely under-utilizing
> it by a long shot.
>
> The code I posted is the proof!

If it requires essentially nothing besides random access to entirely different places in 100 MB of memory, then (and only then) would it be reasonably representative of my process. Nearly all my process does is look up in memory the next place to look up in memory.

> [rest of Hector's post, the pickup-truck analogy, snipped]
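A minimal sketch of the access pattern Peter describes here, assuming a table where each entry holds the next index to visit (this is an illustration, not code from the thread):

#include <cstdio>
#include <vector>

int main()
{
    // ~100 MB of 4-byte entries: each slot stores the index of the
    // next slot to visit, so every load depends on the previous one.
    const unsigned N = 25000000;
    std::vector<unsigned> next(N);
    for (unsigned i = 0; i < N; ++i)
        next[i] = (i * 2654435761u + 12345u) % N;  // pseudo-random successor;
                                                   // a real test would use a
                                                   // random permutation
    unsigned j = 0;
    for (unsigned step = 0; step < N; ++step)
        j = next[j];      // dependent load: defeats the prefetcher, and with
                          // N far beyond cache size, most steps miss the cache
    printf("%u\n", j);    // keep the chase from being optimized away
    return 0;
}

The defining property is the data dependency: the address for step k+1 is the value read at step k, so memory latency, not bandwidth, dominates the running time.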
From: Peter Olcott on 22 Mar 2010 13:21

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23o9WRudyKHA.3304(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>>> If you can see that in the code, then quite honestly,
>>> you don't know how to program or understand the concept
>>> of programming.
>
>> I am telling you the truth, I am almost compulsive about
>> telling the truth. When the conclusions are final I will
>> post a link here.
>
> What GROUP is this? No one will trust your SUMMARY unless
> you cite the group. Until you do so, you're lying and
> making things up.
>
> I repeat: if you can't see that the code I posted proves
> your thinking is incorrect, then you don't know what you
> are talking about, and it's becoming obvious now that you
> don't have any kind of programming or even engineering
> skills.
>
> --
> HLS

I did not examine the code because I did not want to spend time looking at something that is not representative of my process. Look at the criteria in my other post, and if you agree that your code meets those criteria, then I will look at it.

You keep bringing up memory-mapped files. Although this may very well be a very good way to use disk as RAM, or to load RAM from disk, I do not see any possible reasoning that could ever show that a hybrid combination of disk and RAM could exceed the speed of pure RAM alone. If you can, then please show me the reasoning that supports this.

Reasoning is the ONLY source of truth that I trust; all other sources of truth are subject to errors. Reasoning is also subject to errors, but these errors can be readily discerned because they break one or more of the rules of correct reasoning.
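For concreteness, the Win32 file-mapping pattern Hector keeps bringing up looks roughly like this (a minimal sketch; the file name "data.bin" and the DWORD element type are assumptions, not details from his test program):

#include <windows.h>
#include <cstdio>

int main()
{
    // Open an existing data file for read-only access.
    HANDLE hFile = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    // Create a mapping object covering the whole file.
    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMap == NULL) { CloseHandle(hFile); return 1; }

    // Map a view: the pointer behaves like an ordinary array, and the
    // OS pages data in from disk on first touch (and caches it after).
    const DWORD* fmdata = (const DWORD*) MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    if (fmdata != NULL) {
        printf("first element: %lu\n", fmdata[0]);  // page fault loads this page
        UnmapViewOfFile(fmdata);
    }
    CloseHandle(hMap);
    CloseHandle(hFile);
    return 0;
}

The point of contention in the thread is exactly this: once the pages are resident, reads from a mapped view are reads from RAM (the system file cache), not from disk.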
From: Hector Santos on 22 Mar 2010 13:27
On Mar 22, 11:02 am, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
> (2) When a process requires essentially random (mostly
> unpredictable) access to far more memory than can possibly
> fit into the largest cache, then actual memory access time
> becomes a much more significant factor in determining actual
> response time.

As a follow-up, here is the simulator's ProcessData() function:

void ProcessData()
{
    KIND num;
    for (DWORD r = 0; r < nRepeat; r++) {
        Sleep(1);
        for (DWORD i = 0; i < size; i++) {
            //num = data[i];    // heap array
            num = fmdata[i];    // file-mapping array view
        }
    }
}

This is serialized access to the data. It's not random. When you have multiple threads, you approach an empirical boundary condition where multiple accessors are requesting the same memory. So on the one hand (the Peter viewpoint), you have contention issues, hence slowdowns. On the other hand, you have a CACHING effect, where the reading done by one thread benefits all the others.

Now we can alter ProcessData() by adding random-access logic:

void ProcessData()
{
    KIND num;
    for (DWORD r = 0; r < nRepeat; r++) {
        Sleep(1);
        for (DWORD i = 0; i < size; i++) {
            DWORD j = (rand() % size);  // random index (but see the
                                        // RAND_MAX note below)
            //num = data[j];    // heap array
            num = fmdata[j];    // file-mapping array view
        }
    }
}

One would suspect higher pressure to move virtual memory into the process working set in random fashion. But in reality, that randomness may not apply as much pressure as you expect.

Let's test this randomness. First, a test with serialized access, two threads, and a 1.5 GB file map:

V:\wc5beta>testpeter3t /r:2 /s:3000000 /t:2
- size         : 3000000
- memory       : 1536000000 (1500000K)
- repeat       : 2
- Memory Load  : 22%
- Allocating Data .... 0
* Starting threads
- Creating thread 0
- Creating thread 1
* Resuming threads
- Resuming thread# 0 in 743 msecs.
- Resuming thread# 1 in 868 msecs.
* Wait For Thread Completion
- Memory Load: 95%
* Done
---------------------------------------
0 | Time: 5734 | Elapsed: 0
1 | Time: 4906 | Elapsed: 0
---------------------------------------
Total Time: 10640

Notice the MEMORY LOAD climbed to 95%; that's because the entire spectrum of the data was read in. Now let's try unpredictable random access. I added a /j switch to enable the random indexing:

V:\wc5beta>testpeter3t /r:2 /s:3000000 /t:2 /j
- size         : 3000000
- memory       : 1536000000 (1500000K)
- repeat       : 2
- Memory Load  : 22%
- Allocating Data .... 0
* Starting threads
- Creating thread 0
- Creating thread 1
* Resuming threads
- Resuming thread# 0 in 116 msecs.
- Resuming thread# 1 in 522 msecs.
* Wait For Thread Completion
- Memory Load: 23%
* Done
---------------------------------------
0 | Time: 4250 | Elapsed: 0
1 | Time: 4078 | Elapsed: 0
---------------------------------------
Total Time: 8328

BEHOLD, it is even faster with the randomness. The memory load didn't climb because the entire 1.5 GB never needed to be loaded into the process working set.

So once again, your engineering philosophy (or lack thereof) is completely off base. You are under-utilizing the power of your machine.

--
HLS
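One caveat on the /j run: in the Microsoft C runtime, RAND_MAX is 32767, so rand() % size with size = 3000000 only ever produces indices into the first ~32K elements, which is consistent with the Memory Load staying at 23%. Below is a sketch of a full-range index that would sample the whole view (an illustration, not part of Hector's program):

#include <windows.h>
#include <cstdio>
#include <cstdlib>

// Combine two rand() calls into 30 bits so that indices beyond
// RAND_MAX (32767 in the MS C runtime) become reachable. The
// small modulo bias is ignored here.
DWORD RandomIndex(DWORD size)
{
    DWORD r = ((DWORD)rand() << 15) | (DWORD)rand();
    return r % size;
}

int main()
{
    DWORD j = RandomIndex(3000000);  // what ProcessData() would call
    printf("index: %lu\n", j);       // in place of rand() % size
    return 0;
}

With full-range indices, the working set would grow toward the 95% seen in the serialized run, so the timing comparison might well look different.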