From: Wolfgang Kern on 29 Mar 2007 09:57

Hi Guga,

> " SHL al,4 ;mul by 16 (entry size) and get rid of 030
> "
>
> I suppose that using:
> shl al 4
> shr al 4
>
> is faster than
>
> sub al 030
>
> right?

Not at all. It is just a coincidence that we can save the sub al 030 here (AND al 0F would do it as well), because the shift pushes the four upper bits into nowhere :)
__
wolfgang
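
A minimal sketch of the point above, written in C rather than assembly so it can be compiled anywhere. The function names are hypothetical and only model the 8-bit AL register: for the ASCII digits '0'..'9' (030..039 hex), shifting left by 4 inside one byte drops the upper nibble, so the result equals "subtract 030, then multiply by 16".

```c
#include <assert.h>
#include <stdint.h>

/* Models "SHL al,4": the high nibble (the 3 of 0x30..0x39)
   is shifted out of the 8-bit register and lost. */
static uint8_t digit_times_16_shift(uint8_t c)
{
    assert(c >= '0' && c <= '9');   /* only valid for ASCII digits */
    return (uint8_t)(c << 4);
}

/* Models "SUB al,030" followed by a multiply by 16. */
static uint8_t digit_times_16_sub(uint8_t c)
{
    assert(c >= '0' && c <= '9');
    return (uint8_t)((c - 0x30) * 16);
}

/* Models "AND al,0F" followed by a multiply by 16. */
static uint8_t digit_times_16_and(uint8_t c)
{
    assert(c >= '0' && c <= '9');
    return (uint8_t)((c & 0x0F) * 16);
}
```

All three agree for every digit, which is why the separate subtraction can be dropped when the multiply by 16 is needed anyway. It says nothing about speed.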
From: Wolfgang Kern on 29 Mar 2007 10:09

Wannabee wrote:

>> But.. how do i multiply a value by 10 without using mul ?
>>
>> Is there a way to use lea to multiply a number by 10 ? The usage of
>> lea is only to speed up a little.

> lea ecx D$ecx*4+ecx
> lea ecx D$ecx*2

Yes, this keeps all flags alive. But Intel CPUs use the slow shift mechanism for SIB factorising. I'd replace the *2 with LEA ecx D$ecx+ecx, but you won't see much of a difference anyway, due to address (LEA) and register (dependency) stall penalties.
__
wolfgang
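
For readers who want to check the arithmetic of the two-LEA sequence, here is a small C sketch. The function name is made up for illustration; it only mirrors the two steps (x*4+x, then doubling by addition) and wraps modulo 2^32 like a 32-bit register would. It does not model flags or timing.

```c
#include <stdint.h>

/* Multiply by 10 without a multiply instruction.
   Step 1 mirrors "lea ecx D$ecx*4+ecx"  (x * 5).
   Step 2 mirrors "lea ecx D$ecx+ecx"    (x5 * 2).
   Unsigned arithmetic wraps modulo 2^32, as the register does. */
static uint32_t mul10_lea_style(uint32_t x)
{
    uint32_t x5 = (x << 2) + x;   /* x*4 + x = x*5 */
    return x5 + x5;               /* x*5 + x*5 = x*10 */
}
```

For example, mul10_lea_style(7) yields 70, and an input of 429496730 wraps around to 4 because 4294967300 does not fit in 32 bits.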
From: /o//annabee on 29 Mar 2007 19:12

On Thu, 29 Mar 2007 16:09:38 +0200, Wolfgang Kern <nowhere(a)never.at> wrote:

> Wannabee wrote:
>
>>> But.. how do i multiply a value by 10 without using mul ?
>>>
>>> Is there a way to use lea to multiply a number by 10 ? The usage of
>>> lea is only to speed up a little.
>
>> lea ecx D$ecx*4+ecx
>> lea ecx D$ecx*2
>
> Yes, this keeps all flags alive,

I understood that.

> But Intel-CPUs use the slow shift mechanism for SIB-factorising.

OK, I understood (slow) and (Intel).

> I'd replace the *2 with LEA ecx D$ecx+ecx,
> but you wont see much a difference anyway
> due to address(LEA) and register(dependecy) stall penalties.

I saw you used "add ecx ecx" for it above. It did not occur to me that there would be a difference between

LEA ecx D$ecx+ecx

and

lea ecx D$ecx*2

but I see they end up using different parts of the CPU logic, controller, gates? (words I never use much)

I would have guessed they ended up the same internally, since I thought multiplication was always performed via addition.

> __
> wolfgang
From: Wolfgang Kern on 30 Mar 2007 07:07

Wannabee wrote:
....
>> But Intel-CPUs use the slow shift mechanism for SIB-factorising.

> ok. i understood (slow) and (intel)

AMD's got faster shift hardware.

> > I'd replace the *2 with LEA ecx D$ecx+ecx,
> > but you wont see much a difference anyway
> > due to address(LEA) and register(dependecy) stall penalties.
>
> I saw you used "add ecx ecx" for it above.
> It did not occured to me that there
> would be a diffrence between
>
> LEA ecx D$ecx+ecx
> and
> lea ecx D$ecx*2
>
> but I see they end up using diffrent
> parts of the CPU logic, controller, gates ? (words i never use much)

These CPU-internal job queues can work almost in parallel if they can use different 'PIPES' (without dependencies, of course).

> I would have guess they ended up the same, internally.
> since I thought multiplication was performed
> allways via addition.

Yes, integer MUL works by shift-and-add, but the SIB scale (*2^0..3) uses just a shift.
__
wolfgang
From: Wolfgang Kern on 31 Mar 2007 12:50

Hello Guga,

< For example, if i have a decimal string as:
< Decimal:
< 48148617815154186478618618618258218 ....
< 687878489746514564564897848745174861831717821729
< The correct values i found are:
< [Value:
< Value.Conv32Bit: D$ 0A9549D21
< Value.Conv64Bit: D$ 056372845
< Value.Conv96Bit: D$ 0334CD7E6
< Value.Conv128Bit: D$ 0F2D05D75]

???
A 128-bit binary value is limited to:
integer(log10(2) * MSbitNr)
0,301029 * 128 = 38,53 -> 38 decimal digits
2^128 = 3,4028236692093846346337460743177... e+38

So you should limit the input to 38 digits and to be <2^128.

I didn't count how many digits you posted here, but for this ASCII number you might need a 'very large' result buffer, i.e. you need a 1024-bit result for 308 decimal digits.
__
wolfgang
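
The size limit can be illustrated with a short C sketch that converts a decimal string into a fixed number of 32-bit limbs and reports when the value does not fit. This is not Guga's routine (which is not shown in the thread); the names dec_to_limbs and mul10_add and the least-significant-limb-first layout are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply the number in limbs[0..n-1] (least significant limb first)
   by 10 and add the digit d. Returns the carry out of the top limb;
   a nonzero carry means the buffer is too small for the value. */
static uint32_t mul10_add(uint32_t *limbs, size_t n, uint32_t d)
{
    uint64_t carry = d;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)limbs[i] * 10u + carry;
        limbs[i] = (uint32_t)t;     /* low 32 bits stay in this limb */
        carry = t >> 32;            /* high bits move to the next limb */
    }
    return (uint32_t)carry;
}

/* Convert a decimal ASCII string into n 32-bit limbs.
   Returns 1 on success, 0 if the string is empty, contains a
   non-digit, or the value needs more than n*32 bits. */
static int dec_to_limbs(const char *s, uint32_t *limbs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        limbs[i] = 0;
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return 0;
        if (mul10_add(limbs, n, (uint32_t)(*s & 0x0F)) != 0)
            return 0;               /* overflow: value >= 2^(32*n) */
    }
    return 1;
}
```

With four limbs (128 bits), any input of up to 38 digits converts successfully; a 39-digit input fits only if it is below 2^128 = 340282366920938463463374607431768211456. Longer inputs need a proportionally larger limb array, along the lines of the 1024 bits for 308 digits mentioned above.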