From: Vincent Fatica on
On Mon, 7 Sep 2009 10:58:23 +0800, "xiaosi" <xiaosi(a)cn99.com> wrote:

|The MSDN titles are somewhat misleading, while intrin.h is more truthful:
|** __MACHINEI : Intel (32 bit x86) and X64
|__MACHINEI(void __stosb(unsigned char *, unsigned char, size_t))

memset() is in there too (__MACHINE, for all compilers). But I can find **no**
evidence that it's implemented as an intrinsic by VC9. No matter what I do (/Oi,
#pragma intrinsic(memset), ...) I always get push, push, push, call _memset.
--
- Vince
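
A minimal sketch of one way around the _memset call, using the __stosb intrinsic quoted above from intrin.h. The helper name zero_bytes is hypothetical, and the non-MSVC branch is an assumption added so the sketch builds elsewhere: it relies on volatile stores, which an optimizer should not fold into a library call.

```c
#include <stddef.h>

#ifdef _MSC_VER
#include <intrin.h>            /* declares __stosb for x86/x64 MSVC */
#pragma intrinsic(__stosb)
#endif

/* zero_bytes is a hypothetical helper name, not something from the thread. */
static void zero_bytes(void *dst, size_t count)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    /* Intended to expand inline to "rep stosb" rather than a CRT call. */
    __stosb((unsigned char *)dst, 0, count);
#else
    /* Fallback: each volatile store has to be performed individually,
       so the loop should not be rewritten into a memset call. */
    volatile unsigned char *p = (volatile unsigned char *)dst;
    while (count--)
        *p++ = 0;
#endif
}
```

With a STARTUPINFO this would be used as zero_bytes(&si, sizeof si); followed by si.cb = sizeof si;. Whether VC9 emits exactly "rep stosb" here has not been checked against a listing.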
From: xiaosi on
#pragma intrinsic(memset) does not force memset to be inline:

"The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the
function call with inline instructions, if it will result in better performance."[1]

"I tried to look at the library implementation of memset, to understand why the compiler is refusing to generate an intrinsic
version memset (even if I use #pragma intrinsic(memset)!). Turns out the library memset checks to see if your CPU supports SSE2, and
if it does, it calls _VEC_memzero to perform this huge unrolled loop, doing eight 128-bit aligned stores in each iteration. I guess
they profiled this code and found it's faster than using old-fashioned "rep stosd", but since it's too complicated to inline several
different variations, they decided it's better to call the library function."[2]

If you have the DDK[3] or WDK[4] installed, you can link against the memset exported by ntdll.lib instead of the one in the CRT library:
#pragma comment(lib, "F:\\WINDDK\\3790.1830\\lib\\wxp\\i386\\ntdll.lib")
STARTUPINFO si = {sizeof(si)};

00401052 6a40 push 40h
00401054 8d4c2424 lea ecx,[esp+24h]
00401058 6a00 push 0
0040105a 51 push ecx
0040105b c744242844000000 mov dword ptr [esp+28h],44h
00401063 e834000000 call test!memset (0040109c)
test!memset:
0040109c ff2518204000 jmp dword ptr [test!_imp__memset (00402018)] ds:0023:00402018={ntdll!memset (7c922435)}
ntdll!memset:
7c922435 8b54240c mov edx,dword ptr [esp+0Ch] ss:0023:0012ff60=00000040

[1] http://msdn.microsoft.com/en-us/library/tzkfha43.aspx
[2] http://www.codeguru.com/forum/showthread.php?t=371491&page=2
[3] http://www.microsoft.com/whdc/devtools/ddk/default.mspx
[4] http://www.microsoft.com/whdc/DevTools/WDK/WDKpkg.mspx

From: xiaosi on
"Vincent Fatica" <vince(a)blackholespam.net> wrote:
> while ( *pCmdLine && *pCmdLine != L' ' )
> pCmdLine += 1;
> pCmdLine += 1;

The code above works only when exactly one space separates argv[0] from argv[1].
If there may be several spaces or tabs, replace the final "pCmdLine += 1;" with a loop that skips all of them:
while ( *pCmdLine && *pCmdLine <= L' ' )
    pCmdLine += 1;
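
Put together, the two loops can be sketched as one helper. The name skip_first_arg is hypothetical, and the sketch assumes the program name is unquoted and contains no spaces. Unlike the quoted first loop, it also lets a tab end argv[0].

```c
#include <wchar.h>

/* skip_first_arg is a hypothetical helper name. It assumes argv[0] is not
   quoted and contains no embedded spaces. */
static const wchar_t *skip_first_arg(const wchar_t *p)
{
    /* Step over argv[0]: everything up to the first blank or control char. */
    while (*p && *p > L' ')
        ++p;
    /* Step over the separator run: spaces, tabs, anything <= L' '. */
    while (*p && *p <= L' ')
        ++p;
    return p;   /* start of argv[1], or the terminating NUL */
}
```

Both loops test for the terminating NUL first, so a command line with no arguments returns a pointer to the empty string rather than running past the end.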
From: Ulrich Eckhardt on
Vincent Fatica wrote:
> (VC9) I am trying to avoid the runtime library in a tiny app (something I do
> regularly). When I try to zero-fill a STARTUPINFO struct with a for-loop,
> the compiler turns my for-loop into a call to _memset.
>
> ; 13 : STARTUPINFO si;
> ; 14 : si.cb = sizeof(si);
> ; 15 : for (BYTE *p = (BYTE*) &si + sizeof(si.cb); p < (BYTE*) &si + sizeof(si); p++)
> ; 16 : *p=0;

I'd call this pretty asinine, how about a portable (yeah, as if it mattered
to win32 code...) and straight-forward

STARTUPINFO si = {0};

and leaving the initialisation to the compiler then?

> push 64 ; 00000040H
> lea edx, DWORD PTR _si$[esp+104]
> push 0
> push edx
> add esi, 2
> mov DWORD PTR _si$[esp+108], 68 ; 00000044H
> call _memset
> add esp, 12 ; 0000000cH
>
> How do I avoid that (elegantly)? Is it some kind of optimization I can simply
> turn off? I can trick the compiler with the likes of
>
> ; 16 : *p = p ? 0 : 1; // in the loop
>
> That avoids the _memset, but seems particularly kludgy.

How about this:

char simem[sizeof (STARTUPINFO)] = {0};
STARTUPINFO *si = (STARTUPINFO*)simem; /* pointer into the zeroed buffer */

or maybe even a union?
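
A rough sketch of the union idea. To keep it buildable without <windows.h>, it uses a stand-in struct; struct startup_like, its fields, and all_zero are hypothetical names. Note that this only moves the zero-fill into an initializer. The listing earlier in the thread shows VC9 still calling memset for "STARTUPINFO si = {sizeof(si)};", so the same may happen here.

```c
#include <stddef.h>

/* Stand-in for STARTUPINFO so the sketch builds without <windows.h>;
   the field names are hypothetical. */
struct startup_like {
    unsigned long cb;
    void *reserved;
    unsigned long flags;
};

/* The byte array is the FIRST member, so "= {0}" initializes it and
   thereby zero-fills every byte the struct occupies. */
union startup_storage {
    unsigned char bytes[sizeof(struct startup_like)];
    struct startup_like si;
};

/* Returns nonzero if every byte of the storage is zero. */
static int all_zero(const union startup_storage *u)
{
    size_t i;
    for (i = 0; i < sizeof u->bytes; ++i)
        if (u->bytes[i] != 0)
            return 0;
    return 1;
}
```

Usage would be "union startup_storage u = {0};" followed by "u.si.cb = sizeof u.si;". Unlike the plain char array, the union carries the struct's alignment, which the cast through a char buffer does not guarantee.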


Just wondering: Why do you care?

Uli

From: xiaosi on
Vincent wrote that he wants to "avoid the runtime library in a tiny app".

If you use the memset from the VC runtime library, the app pulls in a lot of unnecessary runtime code: it ends up larger than 40 KB with /MT, or 7 KB with /MD plus the 640 KB MSVCR90.DLL.

Without the VC runtime library, the app is only 3 KB.

