From: rs1n on 8 Dec 2009 23:17

I am currently rewriting some code that writes data to the display.
For anyone who has experience in timings of loops:

How much overhead (timewise) does a loop cost versus unrolling the
loop if the loop is fairly short? (I am trying to optimize some code
for speed and size.)

Also, is there any other method for comparing run times short of
wrapping snippets of code within an even larger loop that reruns the
code snippet enough times to actually make a valid comparison?

Take for example the following code, which renders one character on
the screen (first done as straight code, then as a loop):

Entry: D1 -> current screen position
       A[6] = 6 nibbles, each nibble containing row data of a char
              (FNT1 data)

        P= 1-1
        DAT1=A P
        D1=D1+ 16
        D1=D1+ 16
        D1=D1+ 2
        P= 2-1
        DAT1=A P
        D1=D1+ 16
        D1=D1+ 16
        D1=D1+ 2
        .
        .
        .
        P= 6-1
        DAT1=A P
        P= 0

versus

        P= 15-6
-       DAT1=A 0
        D1=D1+ 16
        D1=D1+ 16
        D1=D1+ 2
        ASR W
        P=P+1
        GONC -

From: Raymond Del Tondo on 9 Dec 2009 00:10

Hi Han,

you could also simply add up cycle times,
which can be found in SASM.DOC ;-)

HTH

Raymond
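
Adding up cycle times, as suggested above, amounts to counting how often each instruction executes and multiplying by its documented cost. The sketch below does that bookkeeping in Python for the two listings in the first post. The cost table is a placeholder with made-up numbers (it is not copied from SASM.DOC), the names `PLACEHOLDER_COST`, `UNROLLED_COUNTS`, `LOOP_COUNTS` and `total_cycles` are hypothetical, and the loop is assumed to run its body six times.

```python
# Placeholder costs per instruction, in cycles. These are NOT the SASM.DOC
# figures; replace them with the documented values before drawing conclusions.
PLACEHOLDER_COST = {
    "P= n": 2,
    "DAT1=A": 15,
    "D1=D1+ n": 7,
    "ASR W": 20,
    "P=P+1": 3,
    "GONC taken": 10,
    "GONC not taken": 3,
}

# Execution counts for the straight-line listing: six P= / DAT1=A pairs,
# three D1 steps between each of the six rows (5 gaps), and the final P= 0.
UNROLLED_COUNTS = {"P= n": 7, "DAT1=A": 6, "D1=D1+ n": 15}

# Execution counts for the loop listing, ASSUMING the body runs six times:
# one P= before the loop, then DAT1=A, three D1 steps, ASR W and P=P+1 per
# pass, with GONC taken on every pass except the last.
LOOP_COUNTS = {
    "P= n": 1,
    "DAT1=A": 6,
    "D1=D1+ n": 18,
    "ASR W": 6,
    "P=P+1": 6,
    "GONC taken": 5,
    "GONC not taken": 1,
}


def total_cycles(counts, cost):
    """Sum (times executed) * (cycles per execution) over all instructions."""
    return sum(n * cost[name] for name, n in counts.items())


if __name__ == "__main__":
    unrolled = total_cycles(UNROLLED_COUNTS, PLACEHOLDER_COST)
    looped = total_cycles(LOOP_COUNTS, PLACEHOLDER_COST)
    print("unrolled:", unrolled, "loop:", looped, "difference:", looped - unrolled)
```

With real figures in the table, the difference between the two totals is the loop overhead being asked about. On emulated hardware the totals may only be a rough guide to actual run time.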

From: Han on 9 Dec 2009 09:38

Hmm, I'll look into that.

On Dec 8, 11:10 pm, "Raymond Del Tondo" <Ih8...(a)nowhere.com> wrote:
> Hi Han,
>
> you could also simply add up cycle times,
> which can be found in SASM.DOC;-)
>
> HTH
>
> Raymond

From: Dave Hayden on 9 Dec 2009 10:27

On Dec 8, 11:17 pm, rs1n <handuongs...(a)gmail.com> wrote:
> I am currently rewriting some code that writes data to the display.
> For anyone who has experience in timings of loops:
>

I'm not sure that Raymond's suggestion to add up cycle times will
work. After all, the Saturn is emulated on an ARM processor now, so
the cycle times are probably no longer accurate.

This might be a case where you'd be better off dropping into ARM
assembly code. There is an example of how to do this in the 50G
Advanced Users' Reference. It's a little tricky because you usually
have to move the ARM code onto a 4-byte boundary before executing it.
Also, although the example in the AUR doesn't show it, I worry that
you may need to flush the cache after moving the code.

Good luck with it!

Dave

From: Yann on 9 Dec 2009 14:54

In your case, I would use a loop without hesitation, for both
compactness and maintenance reasons (easier modifications later if
needed).

Operations on P are very fast, unnoticeable. Tests on carry are also
cheap, and almost free when no jump occurs.

The only minor change I would suggest to your proposal is to get rid
of the operation on the full W register, which takes substantially
longer. If you could present your data in A[6] in reverse order, this
would be more straightforward:

        P= 6-1
-       DAT1=A P
        D1=D1+ 16
        D1=D1+ 16
        D1=D1+ 2
        P=P-1
        GONC -
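
To see why the data has to be stored in reverse order for the count-down loop above, and how many passes it makes, here is a small Python model of the P register. It is only a sketch under two assumptions: that P=P-1 sets carry only when P wraps from 0 to F, and that P=P+1 sets carry only when P wraps from F to 0, with GONC branching while carry is clear. The function names are hypothetical.

```python
def count_down_p_values(start):
    """P values seen by the loop body when it ends with P=P-1 / GONC.

    Returns (list of P values in execution order, P after the loop exits).
    """
    p, seen = start, []
    while True:
        seen.append(p)        # body runs: DAT1=A P writes nibble P of A
        carry = (p == 0)      # assumed: borrow happens only when P is 0
        p = (p - 1) & 0xF     # P is a 4-bit register
        if carry:             # GONC falls through once carry is set
            return seen, p


def count_up_passes(start):
    """Number of passes of a body that ends with P=P+1 / GONC."""
    p, passes = start, 0
    while True:
        passes += 1
        carry = (p == 0xF)    # assumed: overflow happens only when P is F
        p = (p + 1) & 0xF
        if carry:
            return passes, p


if __name__ == "__main__":
    print(count_down_p_values(6 - 1))
    print(count_up_passes(15 - 6), count_up_passes(16 - 6))
```

In this model, starting at 6-1 gives six passes that write nibbles 5, 4, 3, 2, 1, 0 in that order, and P is left at F on exit, so a P= 0 may still be wanted afterwards, as at the end of the straight-line listing. The same model gives seven passes for a count-up loop started at 15-6 and six for 16-6, which may be worth checking against the first listing in the thread.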