Prev: Was NOP deprecated for AA64? NASM not disassembling...
Next: The project BIEW was renamed into BEYE
From: Branimir Maksimovic on 1 Mar 2010 14:35 Bobbias wrote: > > It would have been a better idea to upload the source to something > like http://pastebin.com/ and link to the page there, giving those who > are interested, the ability to read the source, and those who aren't > interested, the chance to avoid scrolling for 5 minutes to get to > another post, lol. Well, when I started to use usenet downloading off line mail/news from ISP was 3 time cheaper then conneting to internet, so putting internet links to posts wasn;t preferable method ;0) You can;t satisfy all ;) All in all, I got 5 seconds sped up by just using xlat to convert ACGT -> 0,1,2,3, and reducing size of table by two, and using hash=hash*181 + str[i] instead of 31. ;) Seems that c++ hash table implementation has much more cache hits than this one I use, table[(hash&0x1ffff)<<5] (4 str entries per raw) and probably no collisions. But I can make memory usage 4 times smaller since there is possibility to use just 2 bits to encode string (only 4 possible chars) , instead of 8 ;) Greets
From: Trifle Menot on 1 Mar 2010 14:49 On Mon, 01 Mar 2010 20:35:35 +0100, Branimir Maksimovic <bmaxa(a)hotmail.com> wrote: >> It would have been a better idea to upload the source to something >> like http://pastebin.com/ and link to the page there, giving those who >> are interested, the ability to read the source, and those who aren't >> interested, the chance to avoid scrolling for 5 minutes to get to >> another post, lol. >Well, when I started to use usenet downloading off line >mail/news from ISP was 3 time cheaper then conneting to internet, >so putting internet links to posts wasn;t preferable method ;0) >You can;t satisfy all ;) Nowdays, so many usenet text groups are dead, or near death, anything that comes across the wire is amazing to see. -- Web mail, POP3, and SMTP http://www.beewyz.com/freeaccounts.php
From: Branimir Maksimovic on 7 Mar 2010 15:34 I got speed boost now significantly, with this approach: Hash is now packed A,C,G,T represented as 0,1,2,3 (2 bits) (string is hash itself), I just check 2 32 bit words instead of whole string Limited to just two 32 bit words as benchmark programs maximum string length is 18 bytes , and you can pack 32 bytes in those words. macro hash str,size { local l1,l2,l3 mov ecx,size mov ebx,str xor eax,eax mov esi,16 mov edi,2 l1: shl eax,2 movzx edx,byte [ebx] or eax,edx inc ebx dec esi jnz l2 push eax xor eax,eax mov esi,16 dec edi l2: dec ecx jnz l1 l3: push eax dec edi jnz l3 } hash find is slightly complicated as now table of pointers to rows is allocated, (rather then fixed slot entries in one alloc), it saves memory at cost of another level of indirection. macro hashfind data,elements,block,srchstr,srchlen { push srchstr hash srchstr,srchlen mov ebx,data strfind elements,block } macro strfind elements,block { local l1,l2,l3,l4,l5,e1 pop edx eax ecx push eax xor esi,esi and eax,0x3ffff shl eax,2 cmp dword[ebx+eax],0 push eax ebx jne l3 l1: ; allocate add esi,20 pop ebx eax push eax ebx ecx edx ccall realloc,dword[ebx+eax],esi mov esi,eax cmp esi,0 jz e2 pop edx ecx ebx eax cmp dword[ebx+eax],0 mov dword[ebx+eax],esi jne l2 mov esi, dword[ebx+eax] mov dword[esi],0 l2: mov ebx,dword[ebx+eax] add ebx,4 mov eax,dword[ebx-4] imul eax,16 mov dword[ebx+eax+12],edx mov esi,dword[esp] mov dword[ebx+eax+8],esi mov dword[ebx+eax+4],ecx mov dword[ebx+eax],0 inc dword[elements] inc dword[ebx-4] push esi esi jmp e1 ;search l3: mov ebx,dword[ebx+eax] add ebx,4 xor eax,eax l4: mov esi,dword[ebx-4] imul esi,16 cmp eax,esi jge l1 ; we need to reallocate mov esi,dword[esp+8] cmp dword[ebx+eax+8],esi ; check first half of packed string jne l5 cmp dword[ebx+eax+12],edx ; check second half of packed string je e1 l5: add eax,16 jmp l4 e1: pop esi esi esi lea eax,[ebx+eax] } All in all, now I have just one second behind c++ program. 18 secs my proggy, 17 sec c++ (single treaded), but memory usage is maximally compacted. C++ hash table implementation is extremely cache friendly. It needs 11 sec for hash lookup, I need 4 more because of dynamically allocated rows (more cache misses)! But I got it with packed string ;), while c++ program checks whole string. Now lets see how it will perform multi-threaded. Greets -- http://maxa.homedns.org/ Sometimes online sometimes not
First
|
Prev
|
Pages: 1 2 Prev: Was NOP deprecated for AA64? NASM not disassembling... Next: The project BIEW was renamed into BEYE |