From: Chris Morley on 10 Jun 2010 10:11

>>   movl 8(%ebp), %edx
>>   movl 12(%ebp), %ecx
>>   movb 16(%ebp), %al
>>   movl %edx, %edi
>>   rep stosb
>>
>>sets up the byte pattern in AL, the destination address in EDX and the
>>count in ECX. "rep stosb" triggers the loop which implements:
>>
>>   while (--ECX >= 0)
>>     *EDX++ = AL;
>
> Whoops, "EDX" should have been "EDI" in the above description.

Going back to the OP's question about whether you can do better with a hand loop: typically yes. There are optimisations which can be made in C/C++ source (or assembler) which involve better use of bus width & cache. Some are general, others are processor/platform specific & involved.

e.g. on the 386DX(!) it was significantly quicker to use movsd/stosd vs movsb/stosb, as you push 32 bits per access (still is now!)
e.g. unrolling movsd vs rep movsd
e.g. on the Pentium it was worth pushing doubles around (regardless of actual data type) to move 64 bits (extend for MMX, 3DNow!, then SSE(n) etc.)
e.g. cache block prefetching

This is worth a read; while it is from 2002 & AMD specific, it still has relevance (e.g. page 174+):
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

Also mentioned in this post:
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/79dfd15c698a7187/bb52552bab788aba?lnk=gst&q=22007#bb52552bab788aba

There are memcopy examples: the p75 version has a bandwidth of ~1630 Mbytes/sec vs. ~570 Mbytes/sec for the p67 rep movsb version. Likewise a memset with movntq, for example, for blocks >512 bytes. You will however need intrinsics/assembler to do this, which stops being "C++" quite quickly!!

Before people invoke 'portable': if you are targeting a specific platform you can optimise for that platform and still default to something else for other builds...

You _can_ beat the memxxx libraries & std::x if you want/need to, but it is probably not worth the time/effort unless you are actually bandwidth limited.
You would also sacrifice the generality & safety of the std:: funcs, as other posters point out.

Regards,
Chris

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]
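
To make the "optimise for one platform, default to something else for other builds" point concrete, here is a rough sketch of a large-block fill using non-temporal stores. It is not taken from the post or from the AMD guide. The names fill_bytes and kStreamThreshold are made up for illustration, the 512-byte cut-off is simply borrowed from the ">512 bytes" remark above (the right value is machine specific and should be measured), and it uses the SSE2 intrinsic _mm_stream_si128 (movntdq) instead of the MMX movntq named in the post. Builds without SSE2 just call std::memset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence
#define FILL_HAVE_SSE2 1
#else
#define FILL_HAVE_SSE2 0
#endif

// Assumption: cut-off borrowed from the ">512 bytes" remark; tune by measurement.
constexpr std::size_t kStreamThreshold = 512;

// Hypothetical helper: fills n bytes at dst with value.
inline void fill_bytes(void* dst, unsigned char value, std::size_t n)
{
    assert(dst != nullptr || n == 0);
#if FILL_HAVE_SSE2
    if (n > kStreamThreshold) {
        unsigned char* p = static_cast<unsigned char*>(dst);

        // Head: ordinary stores until p is 16-byte aligned (at most 15 bytes).
        const std::size_t misalign = reinterpret_cast<std::uintptr_t>(p) & 15u;
        const std::size_t head = misalign ? 16u - misalign : 0u;
        std::memset(p, value, head);
        p += head;
        n -= head;

        // Body: 16 bytes per store, written around the cache.
        const __m128i pattern = _mm_set1_epi8(static_cast<char>(value));
        for (; n >= 16; p += 16, n -= 16)
            _mm_stream_si128(reinterpret_cast<__m128i*>(p), pattern);
        _mm_sfence();  // order the streaming stores before later stores

        // Tail: whatever is left (fewer than 16 bytes).
        std::memset(p, value, n);
        return;
    }
#endif
    // Small blocks, and every non-SSE2 build, use the library routine.
    std::memset(dst, value, n);
}
```

A call such as fill_bytes(buf, 0, size) behaves like std::memset(buf, 0, size). Whether it is actually faster depends on the block size, the CPU and whether the data is read again soon afterwards, so time it against the library version before relying on it.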