From: Guga on 28 Mar 2007 13:55

On Mar 28, 9:02 am, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> Hello Guga,
> [..]
>
> <quote Guga..>
> Hi wolfgang.. tks for the reply.
>
> Have a good party :):):)
>
> I can remove the usage of stack frames. I'm just used to them for
> readability purposes mainly. I distinguish a function better from
> another when i see the 1st "Proc" at the beginning of a line.
>
> About the checks for errors and limits on the ascii string.. yes..
> they are done in another routine. The example i provided is only
> used when the string was already checked.
>
> "80 bit conversion could be done in three registers, but for 128 bit
> I'm afraid you either need a few LOCALs or use SSE to speed it up. "
>
> Yes.. this is what i was thinking. Using 80 bit in 3 registers, and on
> 128 bit, using locals to compute the data, and returning them in 4
> registers (or returning them inside global data - like a structure)
>
> "I can extract the method for fix-sized conversion and convert it into
> readable ASM."
>
> Tks.. i'll appreciate it :)
>
> But.. if you succeed in doing it for 128 bit.. is SSE really needed ?
> </quote>
>
> The party ended somehow heavy this morning :)
>
> I found my old (KESYS1998) 80-bit conversion which is short but
> slow (later versions don't support this 'odd' IEEE-754 format anymore).
> It first compressed the string to BCD and used FBLD followed by
> FISTP and needed some rounding overhead,
> but you asked for an 80-bit integer and not the 80-bit FPU format ?
> btw: where is this required ?
>
> For speed reasons I now use my tiny calculator routines which
> work with a DEC<->2^n LUT on 256 bit variables.
> This table is quite long (78*9 entries 32 byte each ~22KB),
> [maximal 77.1 decimal digits can be represented with 256 unsigned bits,
> only nine entries per decade in the table (a partial log-LUT)]
> and so it's also usable for many other calculations.
>
> For the rarely used 512-bit values I use a shorter table
> which contains just every 10th digit value but needs one line multiply.
>
> In your 128/80-bit case the LUT would need 39*9*16 bytes [~5.5 KB]
> and you can use it for 64-bit conversion as well.
>
> this then would work like (somehow fast):
> ___________________
> LUT_CONV_ASCII2BIN:
>
>  XOR esi,esi            ;result go to three regs for 80/96 bit
>  MOV ebx,esi            ;
>  MOV edx,esi            ;
>
>  MOV ecx str_len -1     ;this is power10 of 1st digit (MSD)
> L1:
>  MOVZX eax B$strptr+ecx ;we start with MSD, just for fun?
>  SHL al,4               ;mul by 16 (entry size) and get rid of 030
>  JZ L2>                 ;skip if 0
>
> ; LEA edi,D$ecx+ecx*8+table_ptr ;not sure if RosAsm accept this ?
> ; so you might need to split it into two lines:
>
>  LEA edi D$ecx+ecx*8    ;mul digits power by 9
>  ADD edi table_ptr      ;table offset for power
>
>  ADD edi,eax            ;table offset for digit
> ; as above, LEA could combine the two ADD lines
>
> ;now just add the table entry to your destination:
> ;ie 80 bit:
>  ADD ebx D$edi
>  ADC edx D$edi+4
>  ADC si W$edi+8
> L2:
>  DEC ecx | JNS L1<      ;next digit, and we include "+0" .
> done:
>  RET
> _____________
>
> For 128-bit the story can be similar, and if you could avoid
> the stack-frames then ebp could be the 'missing forth' register. ;)
>
> If you need the code for table creation or just the table
> I can mail it to you.
> But it just contains binary expressed decimals starting
> with 1..9,10..90, and so on. So I'm sure you can do it as well.
>
> And No, SSE is not really required, even it may be faster than
> a plain register/buffer line MUL solution.
>
> __
> wolfgang

Oops.. i forgot... i just answered to you...

But.. how do i multiply a value by 10 without using mul ?

Is there a way to use lea to multiply a number by 10 ? The usage of
lea is only to speed up a little.

Best Regards

Guga

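For readers who want to try the table idea outside of RosAsm, here is a rough C sketch of the LUT conversion described in the quoted post. It is not Wolfgang's code: the names (`lut`, `add128`, `build_lut`, `lut_ascii2bin`), the table layout `lut[power][digit-1]` and the choice of four little-endian 32-bit limbs are all assumptions made for illustration. It assumes the string was already validated (digits only, at most 39 of them, value below 2^128), as in Guga's setup. Table entries above 2^128 (digits 4..9 at 10^38) simply wrap and must never be reached by valid input.

```c
#include <stdint.h>
#include <string.h>

#define LIMBS   4      /* 4 x 32 bits = 128 bits                    */
#define POWERS  39     /* 10^0 .. 10^38, 39*9*16 bytes = 5616 bytes */

/* lut[p][d-1] holds d * 10^p as four little-endian 32-bit limbs. */
static uint32_t lut[POWERS][9][LIMBS];

/* dst += src, carry propagated limb by limb (the ADD/ADC chain). */
static void add128(uint32_t *dst, const uint32_t *src)
{
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t sum = (uint64_t)dst[i] + src[i] + carry;
        dst[i] = (uint32_t)sum;
        carry  = sum >> 32;
    }
}

/* Build the table with additions only: d*10^p = (d-1)*10^p + 10^p. */
static void build_lut(void)
{
    uint32_t pow10[LIMBS] = {1, 0, 0, 0};
    for (int p = 0; p < POWERS; p++) {
        uint32_t acc[LIMBS] = {0, 0, 0, 0};
        for (int d = 1; d <= 9; d++) {
            add128(acc, pow10);
            memcpy(lut[p][d - 1], acc, sizeof acc);
        }
        add128(acc, pow10);              /* acc = 10 * 10^p = 10^(p+1) */
        memcpy(pow10, acc, sizeof acc);
    }
}

/* Convert a pre-validated decimal string: one table add per non-zero digit. */
static void lut_ascii2bin(const char *s, uint32_t out[LIMBS])
{
    size_t len = strlen(s);
    memset(out, 0, LIMBS * sizeof out[0]);
    for (size_t i = 0; i < len; i++) {
        unsigned digit = (unsigned)(s[i] & 0x0F);  /* drops the 0x30     */
        size_t   power = len - 1 - i;              /* MSD = highest power */
        if (digit != 0)
            add128(out, lut[power][digit - 1]);
    }
}
```

Call `build_lut()` once, then `lut_ascii2bin("4294967296", r)` should leave `r` as `{0, 1, 0, 0}`. No multiply appears anywhere in the conversion loop, which is the point of the table.
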
From: Wolfgang Kern on 28 Mar 2007 14:56

Hello Guga,
[..]
<>
Oops.. i forgot... i just answered to you...

But.. how do i multiply a value by 10 without using mul ?

Is there a way to use lea to multiply a number by 10 ? The usage of
lea is only to speed up a little.
</>

for 32 bits it will only work until 0_1999_9999*0A:

 LEA eax D$eax+eax*4   ;*5
 ADD eax,eax           ;*2

but you got larger figures to multiply,
so you could either use (similar to what Randy posted)
a bitwise shift-add which takes its time or
you can do it 32-bit wise (faster) instead.

Both variants loop a lot and need much time,
so I finally used the LUT solution.

__
wolfgang

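The LEA/ADD pair and its 32-bit limit can be checked with a few lines of C. This is only a model of the two instructions, with `mul10_lea` as a made-up name; unsigned arithmetic in C wraps modulo 2^32 just like the register does.

```c
#include <stdint.h>

/* x*10 computed as (x + x*4) * 2, mirroring
   LEA eax D$eax+eax*4  followed by  ADD eax,eax.
   The result is truncated to 32 bits, as in the register. */
uint32_t mul10_lea(uint32_t x)
{
    uint32_t x5 = x + (x << 2);   /* LEA: x*5 */
    return x5 + x5;               /* ADD: *2  */
}
```

With this model, `mul10_lea(0x19999999)` gives `0xFFFFFFFA`, the last product that still fits, while `mul10_lea(0x1999999A)` wraps around to 4.
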
From: Guga on 28 Mar 2007 16:14

On Mar 28, 10:56 am, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> Hello Guga,
> [..]
> <>
> Oops.. i forgot... i just answered to you...
>
> But.. how do i multiply a value by 10 without using mul ?
>
> Is there a way to use lea to multiply a number by 10 ? The usage of
> lea is only to speed up a little.
> </>
>
> for 32 bits it will only work until 0_1999_9999*0A:
>
>  LEA eax D$eax+eax*4   ;*5
>  ADD eax,eax           ;*2
>
> but you got larger figures to multiply,
> so you could either use (similar to what Randy posted)
> a bitwise shift-add which takes its time or
> you can do it 32-bit wise (faster) instead.
>
> Both variants loop a lot and need much time,
> so I finally used the LUT solution.
>
> __
> wolfgang

Ok.. but in case of an overflow, will it place the overflowed
value in edx ?

Best Regards,

Guga

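On the edx question: LEA and ADD only write their destination register (ADD also sets the carry flag on overflow, LEA touches no flags), so nothing lands in edx with that pair. Only MUL leaves the upper half of the product in EDX, with the lower half in EAX. A small C model of that behaviour, with `wide32` and `mul_wide` as illustrative names:

```c
#include <stdint.h>

/* lo plays the role of EAX, hi the role of EDX after MUL r32. */
typedef struct { uint32_t lo, hi; } wide32;

/* 32 x 32 -> 64-bit unsigned product, split into two halves. */
wide32 mul_wide(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    wide32 r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}
```

For example `mul_wide(0x1999999A, 10)` yields `lo = 4` and `hi = 1`, which is the part the LEA/ADD version silently loses.
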
From: Guga on 28 Mar 2007 17:03

On Mar 28, 12:14 pm, "Guga" <Guga...(a)gmail.com> wrote:
> On Mar 28, 10:56 am, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> > Hello Guga,
> > [..]
> > <>
> > Oops.. i forgot... i just answered to you...
> >
> > But.. how do i multiply a value by 10 without using mul ?
> >
> > Is there a way to use lea to multiply a number by 10 ? The usage of
> > lea is only to speed up a little.
> > </>
> >
> > for 32 bits it will only work until 0_1999_9999*0A:
> >
> >  LEA eax D$eax+eax*4   ;*5
> >  ADD eax,eax           ;*2
> >
> > but you got larger figures to multiply,
> > so you could either use (similar to what Randy posted)
> > a bitwise shift-add which takes its time or
> > you can do it 32-bit wise (faster) instead.
> >
> > Both variants loop a lot and need much time,
> > so I finally used the LUT solution.
> >
> > __
> > wolfgang
>
> Ok.. but in case of an overflow, will it place the overflowed
> value in edx ?
>
> Best Regards,
>
> Guga

Hi wolfgang

This is the preliminary version. It can work with literally _any_ bit size.
The example code i'm posting converts a decimal ascii string to 128 bit.

I'm now reviewing the code to see if i can improve its speed, and to
insert comments on how to use it for other bit sizes (512, for example :) :)

It seems to be precise.

Here is the code.. it is subject to changes.. so i'm posting it here
mainly for testing.

[Value:
 Value.Conv32Bit: D$ 0
 Value.Conv64Bit: D$ 0
 Value.Conv96Bit: D$ 0
 Value.Conv128Bit: D$ 0]

[Value.Conv32BitDis 0
 Value.Conv64BitDis 4
 Value.Conv96BitDis 8
 Value.Conv128BitDis 12]

[MUL_32BIT 1]
[MUL_64BIT 2]
[MUL_96BIT 3]
[MUL_128BIT 4]

Proc AtoiAnyBit:
    Arguments @String

    mov edi D@String
    mov esi Value

    While B$edi <> 0
        call alldecmulGuga                  ; Value = Value * 10
        push edi
        movsx eax B$edi
        sub eax '0'
        cdq                                 ; edx = 0 for a digit 0..9
        add D$esi+Value.Conv32BitDis eax    ; add the digit to the lowest dword
        adc D$esi+Value.Conv64BitDis edx    ; edx is 0, so this only propagates the carry
        adc D$esi+Value.Conv96BitDis edx    ; same for the third dword
        adc D$esi+Value.Conv128BitDis edx   ; same for the highest dword
        pop edi
        inc edi
    End_While

EndP

[OverflowData: D$ 0]

alldecmulGuga:
    xor eax eax                     ; always initialize eax to 0
    mov ecx 10
    push edi
    mov edi MUL_128BIT              ; 4 dwords to analyse
    mov ebx Value.Conv128BitDis     ; we always start from the last member and keep decreasing it

L1:
    mov D$OverflowData eax          ; low part of the previous (higher) dword, added to the high part below
    mov eax D$esi+ebx               ; the targeted value
    mul ecx
    mov D$esi+ebx eax               ; copy the result back to the targeted value
    add edx D$OverflowData
    cmp edi MUL_128BIT | je L0>     ; avoid copying it outside the limits of the structure
    mov D$esi+ebx+4 edx             ; copy the high part to the next dword of the structure
L0:
    dec ebx                         ; subtract 4 from ebx, so it points to the previous member of the structure
    dec ebx                         ; Also we don't want to affect the carry flag. This is why dec is used instead of sub ebx 4
    dec ebx
    dec ebx
    dec edi | jne L1<
    pop edi
    ret

Best Regards,

Guga

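For comparison, here is a hedged C sketch of the same multiply-by-10-and-add-digit loop, written so it can be tested against known values. It is not a translation of the routine above. As far as can be told from the listing, the posted loop walks from the highest dword down and merges the high half with a plain ADD, so a carry out of that addition does not seem to reach the next dword. The sketch walks from the lowest dword up instead, so each carry is consumed by the following step. The names `mul10_add` and `atoi_anybit`, the limb order and the return codes are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define LIMBS 4   /* 4 x 32 bits = 128 bits; change for other sizes */

/* v = v * 10 + digit, limbs little-endian.
   Returns the carry out of the top limb (non-zero means overflow). */
uint32_t mul10_add(uint32_t v[LIMBS], uint32_t digit)
{
    uint64_t carry = digit;
    for (int i = 0; i < LIMBS; i++) {
        /* 0xFFFFFFFF * 10 + 9 still fits in 64 bits. */
        uint64_t t = (uint64_t)v[i] * 10u + carry;
        v[i]  = (uint32_t)t;     /* low half stays in this limb  */
        carry = t >> 32;         /* high half goes to the next   */
    }
    return (uint32_t)carry;
}

/* Returns 0 on success, -1 on a non-digit character or on overflow. */
int atoi_anybit(const char *s, uint32_t v[LIMBS])
{
    memset(v, 0, LIMBS * sizeof v[0]);
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        if (mul10_add(v, (uint32_t)(*s - '0')) != 0)
            return -1;
    }
    return 0;
}
```

A useful test pair is 340282366920938463463374607431768211455 (2^128 - 1, all four limbs `0xFFFFFFFF`) and the same number plus one, which has to be reported as overflow.
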
From: Guga on 28 Mar 2007 17:27
Hi wolfgang

tks for the tip of
" SHL al,4 ;mul by 16 (entry size) and get rid of 030 "

i suppose that using:

    shl al 4
    shr al 4

is faster than

    sub al 030

right ?
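Leaving speed aside (nothing here measures it), the three ways of stripping the 030 can at least be compared for correctness. The C helpers below are illustrative models with made-up names. Note that the shift pair is two instructions where `sub` or `and` is one, and that in the quoted LUT routine the `SHL al,4` stands alone because the digit times 16 is wanted as a table offset.

```c
#include <stdint.h>

/* sub al 030 */
uint8_t digit_sub(uint8_t c)   { return (uint8_t)(c - 0x30); }

/* shl al 4 / shr al 4: the upper nibble is shifted out of the byte */
uint8_t digit_shift(uint8_t c) { return (uint8_t)((uint8_t)(c << 4) >> 4); }

/* and al 0F */
uint8_t digit_mask(uint8_t c)  { return (uint8_t)(c & 0x0F); }
```

All three agree for '0'..'9'. They differ on bad input: `digit_sub('A')` gives 0x11, which a single compare against 9 can reject, while the shift and mask versions return 1 and hide the error.
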