comment on my, code, Im little bit rusty [ASM]

Prev: Was NOP deprecated for AA64? NASM not disassembling...
Next: The project BIEW was renamed into BEYE

From: Branimir Maksimovic on 1 Mar 2010 14:35

Bobbias wrote:
>
> It would have been a better idea to upload the source to something
> like http://pastebin.com/ and link to the page there, giving those who
> are interested, the ability to read the source, and those who aren't
> interested, the chance to avoid scrolling for 5 minutes to get to
> another post, lol.

Well, when I started to use usenet downloading off line
mail/news from ISP was 3 time cheaper then conneting to internet,
so putting internet links to posts wasn;t preferable method ;0)
You can;t satisfy all ;)

All in all, I got 5 seconds sped up by just using xlat
to convert ACGT -> 0,1,2,3, and reducing
size of table by two, and using hash=hash*181 + str[i]
instead of 31. ;)
Seems that c++ hash table implementation has much more
cache hits than this one I use, table[(hash&0x1ffff)<<5] (4 str entries
per raw)
and probably no collisions.
But I can make memory usage 4 times smaller since there
is possibility to use just 2 bits to encode string (only 4
possible chars) , instead of 8 ;)

Greets

From: Trifle Menot on 1 Mar 2010 14:49

On Mon, 01 Mar 2010 20:35:35 +0100, Branimir Maksimovic
<bmaxa(a)hotmail.com> wrote:

>> It would have been a better idea to upload the source to something
>> like http://pastebin.com/ and link to the page there, giving those who
>> are interested, the ability to read the source, and those who aren't
>> interested, the chance to avoid scrolling for 5 minutes to get to
>> another post, lol.

>Well, when I started to use usenet downloading off line
>mail/news from ISP was 3 time cheaper then conneting to internet,
>so putting internet links to posts wasn;t preferable method ;0)
>You can;t satisfy all ;)

Nowdays, so many usenet text groups are dead, or near death, anything
that comes across the wire is amazing to see.

--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php

From: Branimir Maksimovic on 7 Mar 2010 15:34

I got speed boost now significantly,
with this approach:

Hash is now packed A,C,G,T represented as 0,1,2,3
(2 bits) (string is hash itself), I just check 2 32 bit words instead
of whole string
Limited to just two 32 bit words as benchmark programs maximum
string length is 18 bytes , and you can pack 32 bytes in those words.

macro hash str,size
{
local l1,l2,l3
mov ecx,size
mov ebx,str
xor eax,eax
mov esi,16
mov edi,2
l1:
shl eax,2
movzx edx,byte [ebx]
or eax,edx
inc ebx
dec esi
jnz l2
push eax
xor eax,eax
mov esi,16
dec edi
l2:
dec ecx
jnz l1
l3:
push eax
dec edi
jnz l3
}

hash find is slightly complicated as now table of pointers
to rows is allocated, (rather then fixed slot entries
in one alloc), it saves memory at cost of another level of indirection.

macro hashfind data,elements,block,srchstr,srchlen
{
push srchstr
hash srchstr,srchlen
mov ebx,data
strfind elements,block
}

macro strfind elements,block
{
local l1,l2,l3,l4,l5,e1
pop edx eax ecx
push eax
xor esi,esi
and eax,0x3ffff
shl eax,2
cmp dword[ebx+eax],0
push eax ebx
jne l3
l1:
; allocate
add esi,20
pop ebx eax
push eax ebx ecx edx
ccall realloc,dword[ebx+eax],esi
mov esi,eax
cmp esi,0
jz e2
pop edx ecx ebx eax
cmp dword[ebx+eax],0
mov dword[ebx+eax],esi
jne l2
mov esi, dword[ebx+eax]
mov dword[esi],0
l2:
mov ebx,dword[ebx+eax]
add ebx,4
mov eax,dword[ebx-4]
imul eax,16
mov dword[ebx+eax+12],edx
mov esi,dword[esp]
mov dword[ebx+eax+8],esi
mov dword[ebx+eax+4],ecx
mov dword[ebx+eax],0
inc dword[elements]
inc dword[ebx-4]
push esi esi
jmp e1
;search
l3:
mov ebx,dword[ebx+eax]
add ebx,4
xor eax,eax

l4:
mov esi,dword[ebx-4]
imul esi,16
cmp eax,esi
jge l1 ; we need to reallocate
mov esi,dword[esp+8]
cmp dword[ebx+eax+8],esi ; check first half of packed string
jne l5
cmp dword[ebx+eax+12],edx ; check second half of packed string
je e1

l5:
add eax,16
jmp l4
e1:
pop esi esi esi
lea eax,[ebx+eax]
}

All in all, now I have just one second behind c++ program.
18 secs my proggy, 17 sec c++ (single treaded), but memory usage
is maximally compacted.
C++ hash table implementation is extremely cache friendly.
It needs 11 sec for hash lookup, I need 4 more because
of dynamically allocated rows (more cache misses)!
But I got it with packed string ;), while c++ program
checks whole string.
Now lets see how it will perform multi-threaded.

Greets

--
http://maxa.homedns.org/

Sometimes online sometimes not

First | Prev |
Pages: 1 2
Prev: Was NOP deprecated for AA64? NASM not disassembling...
Next: The project BIEW was renamed into BEYE