From: robin on 2 Jun 2010 09:20

"helvio" <helvio.vairinhos(a)googlemail.com> wrote in message
news:0a10cc09-880c-4c42-9b9d-21a499ec18ff(a)y12g2000vbr.googlegroups.com...
| Hi all,
|
| I have a huge F90 code composed of several modules, with several
| module procedures each, and a main program. No external procedures are
| used. I've come across this situation: some modules have very large
| arrays declared in the global scope (with sizes of order ~50000 each),
| but some of these arrays are only used conditionally. They might be
| used elsewhere, but there's also the possibility that they might not.
| The situation is the following, schematically:

If U is used conditionally, ALLOCATABLE is fine. That's the kind of
thing it is intended for.

I notice that N is 50,000. Is that some maximum value, or is it just a
value that is larger than anything you expect? For instance, could N
be read in?
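[What robin suggests might look like the following minimal sketch. The
module and procedure names here are illustrative, not taken from the
original code; it shows a module array that is only allocated when it
is actually needed, with N read in at run time rather than fixed at
50000.]

    module big_arrays
      implicit none
      real, allocatable :: U(:,:)   ! only exists after an explicit ALLOCATE
    contains
      subroutine setup_U(n)
        integer, intent(in) :: n
        if (.not. allocated(U)) allocate (U(n,n))
      end subroutine setup_U
      subroutine kill_U()
        if (allocated(U)) deallocate (U)
      end subroutine kill_U
    end module big_arrays

    program demo
      use big_arrays
      implicit none
      integer :: n
      logical :: need_U
      read *, n, need_U          ! size and whether U is needed, decided at run time
      if (need_U) call setup_U(n)
      ! ... work that may use U ...
      if (need_U) call kill_U()
    end program demo

Until setup_U is called, U occupies no array storage, so the
conditionally-used arrays cost nothing in the runs that never touch
them.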
From: Richard Maine on 2 Jun 2010 11:35

helvio <helvio.vairinhos(a)googlemail.com> wrote:

> In sum, I think my doubts reduce to the question of whether the
> efficiency of accessing the physical memory depends on the size of
> the allocated memory, or if it is independent of it.

It should be independent of it, or anyway close enough that you won't
be able to measure the difference.

> I also have the following related question: can the ALLOCATE /
> DEALLOCATE statements slow down the program if they are called
> multiple times, as compared with a single static declaration of "U"?

Yes, definitely. Of course, as with many things, that's only going to
matter if it is in an inner enough loop to be called lots of times.

It would seem that some of the classic advice on performance
optimization is in order. I don't feel like digging up the exact
quotes; there are some pretty well known ones. But roughly...

1. Worry about making the code right before you worry about making it
fast.

2. Improvements in algorithm are worth far more than code tweaks.

3. Even major improvements in code performance aren't going to matter
unless they are in time-critical portions of the program in the first
place. That one applies to your question above. Allocation and
deallocation do take time, but if much computation happens between the
allocation and deallocation, then their time usage is not likely to
matter relative to the computation.

4. When you do get to trying to tweak code to improve performance,
*MEASURE* the effects with your own code. Even experts can and
regularly do get surprised, and things do vary from code to code. That
means you should not just accept performance judgements that people
might give you here. Yes, "people" includes me.

-- 
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
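[Point 4 can be applied directly to the allocation question. A minimal
measurement sketch using the standard SYSTEM_CLOCK intrinsic -- the
program name, sizes, and repeat counts are illustrative only, and the
second loop is a stand-in for whatever computation really happens
between ALLOCATE and DEALLOCATE:]

    program time_alloc
      implicit none
      integer, parameter :: n = 50000
      integer :: i, t0, t1, rate
      real, allocatable :: u(:)

      call system_clock (count_rate=rate)

      ! cost of allocate/deallocate alone
      call system_clock (t0)
      do i = 1, 1000
        allocate (u(n))
        deallocate (u)
      end do
      call system_clock (t1)
      print *, 'allocate/deallocate x1000: ', real(t1-t0)/rate, ' s'

      ! cost of a stand-in computation of the same size
      allocate (u(n))
      u = 0.0
      call system_clock (t0)
      do i = 1, 1000
        u = u + 1.0
      end do
      call system_clock (t1)
      print *, 'computation x1000:         ', real(t1-t0)/rate, ' s'
      deallocate (u)
    end program time_alloc

Comparing the two printed times on your own machine and compiler is
exactly the kind of measurement point 4 asks for.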
From: helvio on 2 Jun 2010 12:14

On Jun 2, 4:35 pm, nos...(a)see.signature (Richard Maine) wrote:
> helvio <helvio.vairin...(a)googlemail.com> wrote:
> > In sum, I think my doubts reduce to the question of whether the
> > efficiency of accessing the physical memory depends on the size of
> > the allocated memory, or if it is independent of it.
>
> It should be independent of it, or anyway close enough that you won't
> be able to measure the difference.

Thank you! :)

> > I also have the following related question: can the ALLOCATE /
> > DEALLOCATE statements slow down the program if they are called
> > multiple times, as compared with a single static declaration of "U"?
>
> Yes, definitely. Of course, as with many things, that's only going to
> matter if it is in an inner enough loop to be called lots of times.

Yup! These statements are indeed called many times, in one of the most
time-consuming areas of my code. But the amount of matrix
multiplication among rank-2 subarrays of the U's and V's between
ALLOCATE and DEALLOCATE will definitely overshadow the time taken to
allocate them.

My main worry was that the program could slow down if memory access
depended significantly on the size of the total allocated memory
(because I will have to access the memory locations of the V's many
times for matrix-multiplying their rank-2 subarrays). I always had
this idea in my head that physical memory access is one of the slowest
elementary operations; I just don't have a feeling for how slow it is.
But since you say that memory access is essentially independent of the
size of allocated memory, all I have to worry about is not exceeding
the available physical memory (N = 50000 was just an example). ;)

> It would seem that some of the classic advice on performance
> optimization is in order. I don't feel like digging up the exact
> quotes; there are some pretty well known ones.

Yup! This doesn't stop me at all from writing my code! I asked it
mostly as an academic question, to learn a little bit more.

> 1. Worry about making the code right before you worry about making it
> fast.

*thumbs up*

> 2. Improvements in algorithm are worth far more than code tweaks.

*thumbs up*

> 3. Even major improvements in code performance aren't going to matter
> unless they are in time-critical portions of the program in the first
> place. That one applies to your question above. Allocation and
> deallocation do take time, but if much computation happens between
> the allocation and deallocation, then their time usage is not likely
> to matter relative to the computation.

Yup! Not a problem.

> 4. When you do get to trying to tweak code to improve performance,
> *MEASURE* the effects with your own code. Even experts can and
> regularly do get surprised and things do vary from code to code. That
> means you should not just accept performance judgements that people
> might give you here. Yes, "people" includes me.

It's not a major tweak; it's just about choosing between two
straightforward ways of declaring arrays, both of which work. I'll
stick to one of them until it's time to test my code. Only then will I
measure the difference between the two options. And if I see any
significant effects, I might come back to this post and comment on it.

Thanks a lot to all! You're always very helpful!

--helvio
From: glen herrmannsfeldt on 2 Jun 2010 17:16

helvio <helvio.vairinhos(a)googlemail.com> wrote:
> On Jun 2, 4:35 pm, nos...(a)see.signature (Richard Maine) wrote:
>> helvio <helvio.vairin...(a)googlemail.com> wrote:
>> > In sum, I think my doubts reduce to the question of whether the
>> > efficiency of accessing the physical memory depends on the size of
>> > the allocated memory, or if it is independent of it.
>>
>> It should be independent of it, or anyway close enough that
>> you won't be able to measure the difference.

In some theoretical calculations log(n) is used, and as an
approximation that probably isn't so bad.

>> > I also have the following related question: can the ALLOCATE /
>> > DEALLOCATE statements slow down the program if they are called
>> > multiple times, as compared with a single static declaration of
>> > "U"?
>>
>> Yes, definitely. Of course, as with many things, that's only going
>> to matter if it is in an inner enough loop to be called lots of
>> times.
>
> Yup! These statements are indeed called many times, in one of the
> most time-consuming areas of my code. But the amount of matrix
> multiplication among rank-2 subarrays of the U's and V's between
> ALLOCATE and DEALLOCATE will definitely overshadow the time taken to
> allocate them.

It is mostly a problem in object-oriented programming. Objects have to
be allocated and deallocated, often many times. A matrix usually won't
be allocated in the inner loop, but two loops out (for the two
dimensions of the matrix).

> My main worry was that the program could slow down if memory access
> depended significantly on the size of the total allocated memory
> (because I will have to access the memory locations of the V's many
> times for matrix-multiplying their rank-2 subarrays). I always had
> this idea in my head that physical memory access is one of the
> slowest elementary processes, I just don't have a feeling for how
> significantly slow it is.

Well, it is, but the rules are more complicated. Consider the two:

      DO I=1,N
        DO J=1,N
          A(I,J)=B(I,J)+C(I,J)
        ENDDO
      ENDDO

      DO J=1,N
        DO I=1,N
          A(I,J)=B(I,J)+C(I,J)
        ENDDO
      ENDDO

The number of memory accesses is the same for both, but the times
might be very different. (Fortran stores arrays in column-major order,
so the second form walks through memory contiguously and is much
friendlier to the cache.)

> But since you say that memory access is essentially independent of
> the size of allocated memory, all I have to worry about is not to
> exceed the available physical memory (N = 50000 was just an
> example). ;)

Probably you should stay below about half of physical memory. The OS
may be using some, and that can make a big difference.

>> It would seem that some of the classic advice on performance
>> optimization is in order. I don't feel like digging up the
>> exact quotes; there are some pretty well known ones.
>
> Yup! This doesn't stop me at all from writing my code! I asked it
> mostly as an academic question, to learn a little bit more.
>
>> 1. Worry about making the code right before you worry
>> about making it fast.
>
> *thumbs up*
>
>> 2. Improvements in algorithm are worth far more than code tweaks.
>
> *thumbs up*
>
>> 3. Even major improvements in code performance aren't going to
>> matter unless they are in time-critical portions of the program in
>> the first place. That one applies to your question above. Allocation
>> and deallocation do take time, but if much computation happens
>> between the allocation and deallocation, then their time usage is
>> not likely to matter relative to the computation.

Well, if you add a bunch of 2x2 matrices you might notice...

> Yup! Not a problem.
>
>> 4. When you do get to trying to tweak code to improve performance,
>> *MEASURE* the effects with your own code. Even experts can and
>> regularly do get surprised and things do vary from code to code.
>> That means you should not just accept performance judgements that
>> people might give you here. Yes, "people" includes me.

Code to code, compiler to compiler, system to system. Way too many
ways to keep track of.

> It's not a major tweak, it's just about choosing between two
> straightforward ways of declaring arrays, both of which work. I'll
> stick to one of them until it's time to test my code. Only then I
> will measure the difference between the two options. And if I witness
> any significant effects, then I might come back to this post and make
> a comment about it.

For smaller arrays, one guess is that it takes one more memory access
for automatic over static, and one more for allocatable over
automatic. That can be less true as they get larger, though.

-- glen
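[The three declaration styles being compared can be sketched as
follows; this is schematic, not the poster's actual code. A static
(SAVEd) array has its size fixed at compile time, an automatic array
takes its extent from a dummy argument on each entry, and an
allocatable array is sized by an explicit ALLOCATE:]

    subroutine styles(n)
      implicit none
      integer, intent(in) :: n
      real, save        :: s(50000)  ! static: size fixed at compile time
      real              :: a(n)      ! automatic: sized on entry, typically stack
      real, allocatable :: b(:)      ! allocatable: sized by explicit ALLOCATE

      allocate (b(n))
      ! ... use s, a, and b ...
      deallocate (b)   ! an unsaved allocatable is also freed automatically
                       ! on return, per the standard
    end subroutine styles

The extra indirection glen guesses at comes from the descriptor that
automatic and allocatable arrays may carry; whether it is measurable
in practice is, again, something to test with your own compiler.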
From: robin on 2 Jun 2010 21:57

"helvio" <helvio.vairinhos(a)googlemail.com> wrote in message
news:5a8433aa-8b46-4485-98b0-dab0b822b1d7(a)f14g2000vbn.googlegroups.com...
| I also have the following related question: can the ALLOCATE /
| DEALLOCATE statements slow down the program if they are called
| multiple times, as compared with a single static declaration of "U"?
| e.g. by introducing a loop in my example above:
|
|   do i=1,M
|     call using_UV   ! U is allocated here
|     call kill_U     ! U is deallocated here
|   end do

ALLOCATE and DEALLOCATE do take extra time, but it is unlikely you
could notice the time, let alone measure it. CALLing a subroutine will
take more time than ALLOCATE does. You would have to ALLOCATE /
DEALLOCATE a million times before the time becomes significant, and
even then, the time taken by the remainder of the loop will be far,
far greater than the time taken by ALLOCATE / DEALLOCATE.