From: sam on
Our IIS worker processes are crashing in our Production environment
intermittently with a memory access violation exception.

Our Production Configuration:
- 2 VMWare ESX Server 3.5.0 instances
- Each VMWare instance is running 3 web servers (Windows Server 2003/IIS6)

The crash frequency appears random. Some days we get no crashes. Other days
we get 1 or 2 from different servers. And on other days we get more than
that, including multiple crashes from the same server. So far, the frequency
does not appear to depend on workload.

To get more information on the crashes, I attached adplus in crash mode to
our w3wp processes and collected information on the exception. I am pasting
2 shortened samples below from 2 different crash occurrences:

---------------------------------------------------------------------------------------------------------------------------
1st sample:
---------------------------------------------------------------------------------------------------------------------------

---
--- 2nd chance AccessViolation exception ----
---------------------------------------------------------------

Occurrence happened at:
Debug session time: Wed Jun 16 21:16:58.906 2010 (GMT-4)
System Uptime: 11 days 12:31:39.624
Process Uptime: 0 days 22:22:16.080
Kernel time: 0 days 0:20:20.921
User time: 0 days 0:27:50.171

Faulting stack below ---
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\USER32.dll -
# ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
00 0a6efd94 7739b6e3 04d20158 0000001e 00000000 <Unloaded_leup.dll>+0x23484107
01 0a6efdc0 7739b874 23640fef 04d20158 0000001e USER32!LoadCursorW+0x4cf5
02 0a6efe38 7739ba92 00000000 23640fef 04d20158 USER32!LoadCursorW+0x4e86
03 0a6efea0 7739bad0 08a3499c 00000000 0a6efecc USER32!TranslateMessageEx+0x10d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\comsvcs.dll -
04 0a6efeb0 4a77bde2 08a3499c 08a34998 08a34908 USER32!DispatchMessageW+0xf
05 0a6efecc 4a77bcf2 08a34928 08a34908 08a3498c comsvcs!DllUnregisterServer+0x270
06 0a6efee4 4a77c7de 08a34998 00000001 08a34908 comsvcs!DllUnregisterServer+0x180
07 0a6eff04 4a77cabf 00000000 01968628 01944b38 comsvcs!DllUnregisterServer+0xc6c
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\msvcrt.dll -
08 0a6eff84 77bcb530 08a34908 00000000 00000000 comsvcs!DllUnregisterServer+0xf4d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
09 0a6effb8 77e64829 01944b38 00000000 00000000 msvcrt!endthreadex+0xa3
0a 0a6effec 00000000 77bcb4bc 01944b38 00000000 kernel32!GetModuleHandleA+0xdf

Creating C:\Program Files\Debugging Tools for Windows (x86)\Crash_Mode__Date_06-15-2010__Time_22-56-35PM\PID-248336__W3WP.EXE_-DefaultAppPool-__2nd_chance_AccessViolation__full_cc90_2010-06-16_21-16-59-093_ca10.dmp - mini user dump
Dump successfully written

Executing custom commands
User Mode Time
Thread Time
89:3e074 0 days 0:00:22.406

-------------
(*) Note: Because I didn't have the symbols, my stack trace was a bit off.
I added the symbols when I opened the crash dump in VS2010, and here is the
corrected stack trace:

Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a
virtual address for which it does not have the appropriate access.

> 23484178()
user32.dll!_InternalCallWinProc@20() + 0x28 bytes
user32.dll!_UserCallWinProcCheckWow@32() + 0xa2 bytes
user32.dll!_DispatchMessageWorker@8() + 0xc8 bytes
user32.dll!_DispatchMessageW@4() + 0xf bytes
comsvcs.dll!CSTAQueueLessMessageWork::DoWork() + 0x4e bytes
comsvcs.dll!CSTAThread::DoWork() + 0x18 bytes
comsvcs.dll!CSTAThread::ProcessQueueWork() + 0x37 bytes
comsvcs.dll!CSTAThread::WorkerLoop() + 0x190 bytes
msvcrt.dll!__endthreadex() + 0xa3 bytes
kernel32.dll!_BaseThreadStart@8() + 0x34 bytes

---------------------------------------------------------------------------------------------------------------------------
2nd sample:
---------------------------------------------------------------------------------------------------------------------------

---
--- 2nd chance AccessViolation exception ----
---------------------------------------------------------------

Occurrence happened at:
Debug session time: Thu Jun 17 14:22:01.049 2010 (GMT-4)
System Uptime: 12 days 5:36:39.748
Process Uptime: 0 days 17:04:30.659
Kernel time: 0 days 0:16:12.078
User time: 0 days 0:21:33.515

Faulting stack below ---
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\USER32.dll -
# ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 0a19fd94 7739b6e3 00ad026e 0000001e 00000000 <Unloaded_ON~1.DLL>+0x1e714127
01 0a19fdc0 7739b874 14520fef 00ad026e 0000001e USER32!LoadCursorW+0x4cf5
02 0a19fe38 7739ba92 00000000 14520fef 00ad026e USER32!LoadCursorW+0x4e86
03 0a19fea0 7739bad0 0a98520c 00000000 0a19fecc USER32!TranslateMessageEx+0x10d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\comsvcs.dll -
04 0a19feb0 4a77bde2 0a98520c 0a985208 0a985178 USER32!DispatchMessageW+0xf
05 0a19fecc 4a77bcf2 0a985198 0a985178 0a9851fc comsvcs!DllUnregisterServer+0x270
06 0a19fee4 4a77c7de 0a985208 00000001 0a985178 comsvcs!DllUnregisterServer+0x180
07 0a19ff04 4a77cabf 00000000 01943518 0193de60 comsvcs!DllUnregisterServer+0xc6c
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\msvcrt.dll -
08 0a19ff84 77bcb530 0a985178 00000000 00000000 comsvcs!DllUnregisterServer+0xf4d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
09 0a19ffb8 77e64829 0193de60 00000000 00000000 msvcrt!endthreadex+0xa3
0a 0a19ffec 00000000 77bcb4bc 0193de60 00000000 kernel32!GetModuleHandleA+0xdf

Creating C:\Program Files\Debugging Tools for Windows (x86)\Crash_Mode__Date_06-17-2010__Time_00-36-36AM\PID-270104__W3WP.EXE_-DefaultAppPool-__2nd_chance_AccessViolation__full_22b0_2010-06-17_14-22-01-205_1f18.dmp - mini user dump
Dump successfully written

Executing custom commands
User Mode Time
Thread Time
7:40e88 0 days 0:00:00.000


--------------
(*) Note: Here is the corrected stack trace with the symbols:

Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a
virtual address for which it does not have the appropriate access.

> 1e714178()
user32.dll!_InternalCallWinProc@20() + 0x28 bytes
user32.dll!_UserCallWinProcCheckWow@32() + 0xa2 bytes
user32.dll!_DispatchMessageWorker@8() + 0xc8 bytes
user32.dll!_DispatchMessageW@4() + 0xf bytes
comsvcs.dll!CSTAQueueLessMessageWork::DoWork() + 0x4e bytes
comsvcs.dll!CSTAThread::DoWork() + 0x18 bytes
comsvcs.dll!CSTAThread::ProcessQueueWork() + 0x37 bytes
comsvcs.dll!CSTAThread::WorkerLoop() + 0x190 bytes
msvcrt.dll!__endthreadex() + 0xa3 bytes
kernel32.dll!_BaseThreadStart@8() + 0x34 bytes
---------------------------------------------------------------------------------------------------------------------------


What I noticed is that the crash always occurs while trying to execute an
instruction from an unloaded dll... <Unloaded_XXXX.DLL>...
According to other posts, this would indicate a bug in the dll, perhaps a
ref counting bug of some sort.
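
To make the ref-counting theory concrete, here is a toy Python model (not
real Windows code) of one hypothetical way such a bug could play out. It
assumes that the periodic unload pass frees any module that reports zero live
objects, and that a window created by the module keeps pointing at code
inside it without being counted. The class, the module name "vendor.dll" and
the window handle are all made up for illustration; I have not confirmed that
this is what happens in our process.

```python
class ToyProcess:
    """Toy model: loaded modules plus windows whose procedures live in them."""

    def __init__(self):
        self.loaded = {}   # module name -> number of live objects it reports
        self.windows = {}  # window handle -> module holding its window procedure

    def load(self, module):
        self.loaded.setdefault(module, 0)

    def create_object(self, module):
        self.loaded[module] += 1

    def release_object(self, module):
        self.loaded[module] -= 1

    def create_window(self, hwnd, module):
        # Assumption: the window is NOT counted as a reference on the module.
        self.windows[hwnd] = module

    def free_unused_libraries(self):
        # Stand-in for the periodic unload pass: drop modules with zero objects.
        for module in [m for m, refs in self.loaded.items() if refs == 0]:
            del self.loaded[module]

    def dispatch(self, hwnd):
        module = self.windows[hwnd]
        if module not in self.loaded:
            # In the real process this would be the access violation.
            raise RuntimeError(f"call into <Unloaded_{module}>")
        return f"handled by {module}"


def demo():
    proc = ToyProcess()
    proc.load("vendor.dll")              # hypothetical third party dll
    proc.create_object("vendor.dll")
    proc.create_window(0x1234, "vendor.dll")
    before = proc.dispatch(0x1234)       # works while the module is loaded
    proc.release_object("vendor.dll")    # last object goes away
    proc.free_unused_libraries()         # module is unloaded, window remains
    try:
        after = proc.dispatch(0x1234)
    except RuntimeError as exc:
        after = str(exc)
    return before, after
```

In this model the dll works fine for as long as a message never arrives
after the unload, which would at least be consistent with crashes that look
random.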

However, what worries me is that:
1) the crashes do not consistently happen on the same dll. Already, the 2
samples above are different. And I checked several others that crashed in
other dlls as well. Does that mean that all the different dlls (all from
different third party vendors) are buggy?

2) These crashes all started right after we updated our web application
build on June 5th 2010, which seems to indicate that something we did during
that update is causing (or exposing) all of this. Yet, none of the crashing
dlls were changed during this update. No Windows or IIS software or
configuration was modified. We did add a few third party libraries, and we
did add the VC++ 2005/2008 Runtimes, but I can't figure out what they have to
do with anything. So again, it's hard to point the finger at the "unloaded"
dlls, since they were already in place before the crashes started to occur.
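
To keep track of which unloaded dlls show up across crashes, the adplus logs
can be tallied with a small script. This is only a sketch: it assumes that
frame 00 of each faulting stack has the same layout as in the two samples
above (frame number, five 8-digit hex columns, then the
<Unloaded_...>+0x... symbol), and the function names are my own.

```python
import re
from collections import Counter

# Frame 00 when it lands in an unloaded module, e.g.
# 00 0a6efd94 7739b6e3 04d20158 0000001e 00000000 <Unloaded_leup.dll>+0x23484107
FRAME0 = re.compile(
    r"^00\s+(?:[0-9a-f]{8}\s+){5}<Unloaded_(?P<name>[^>]+)>\+0x(?P<offset>[0-9a-f]+)",
    re.IGNORECASE | re.MULTILINE,
)


def unloaded_modules(log_text):
    """Return (module name, offset) pairs for every matching frame 00."""
    return [
        (m.group("name"), int(m.group("offset"), 16))
        for m in FRAME0.finditer(log_text)
    ]


def tally(logs):
    """Count crashes per unloaded module name (case-insensitive)."""
    counts = Counter()
    for text in logs:
        for name, _offset in unloaded_modules(text):
            counts[name.lower()] += 1
    return counts


SAMPLE_1 = "00 0a6efd94 7739b6e3 04d20158 0000001e 00000000 <Unloaded_leup.dll>+0x23484107"
SAMPLE_2 = "00 0a19fd94 7739b6e3 00ad026e 0000001e 00000000 <Unloaded_ON~1.DLL>+0x1e714127"
```

Calling tally([SAMPLE_1, SAMPLE_2]) counts one crash each for "leup.dll" and
"on~1.dll". Feeding it the text of every log collected so far would show
whether a few dlls dominate or whether the crashes really are spread evenly.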

BTW, is it normal that the <Unloaded_DllNameGoesHere.DLL> does not include
the full name of the dll? For instance, the <Unloaded_leup.dll> seems to me
like it was referring to safilup.dll that we do use. I haven't figured out
which dll <Unloaded_ON~1.DLL> is referring to. In the other traces where I
have been able to determine the dll, the name is always incomplete; most of
the time, it is the full name with the first 4 letters chopped off. I just
hope that the data I'm reading is accurate, because I am basing my
assumptions on this...
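
For matching a truncated name against the dlls we actually deploy, a rough
heuristic is to rank candidates by how many trailing characters they share
with the truncated name. The sketch below does that. Only safilup.dll comes
from our system; the other candidate names are placeholders. Every name
ending in ".dll" scores at least 4, so only scores above 4 mean anything.
A name like ON~1.DLL looks like an 8.3 short file name, and this comparison
will not resolve those.

```python
def common_suffix_len(a, b):
    """Number of trailing characters shared by a and b, ignoring case."""
    n = 0
    for x, y in zip(reversed(a.lower()), reversed(b.lower())):
        if x != y:
            break
        n += 1
    return n


def rank_candidates(truncated, candidates):
    """Sort candidates by shared suffix length (longest first), then by name."""
    scored = [(common_suffix_len(truncated, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1].lower()))


# "vendor_a.dll" and "vendor_b.dll" are hypothetical placeholder names.
CANDIDATES = ["vendor_b.dll", "safilup.dll", "vendor_a.dll"]
RANKED = rank_candidates("leup.dll", CANDIDATES)
```

With these inputs safilup.dll ranks first because it shares the suffix
"up.dll" with "leup.dll", while the placeholders only share ".dll". That
match is still partial, so it supports my guess without confirming it.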

I read in other posts that IIS6 supposedly calls CoFreeUnusedLibraries every
5 minutes to unload unused modules. This would explain why some dlls get
unloaded. I did some testing, and in fact dlls do get unloaded and reloaded
as part of the normal production operation. It seems this has been the case
since well before the crashes started happening. Granted, it's not the best
performance-wise, but it did work up until recently without any glitches or
crashes.

Something that I did find a bit weird, and maybe someone can confirm if this
is normal IIS behavior, is that pretty much every IIS worker thread attempts
to unload the unused dlls about every 5 minutes. In PROD, that meant that I
had about 50 different threads that were all concurrently trying to unload
libraries. Is this normal? Could this lead to some sort of race condition?
Honestly, I was expecting a single thread to be running the unload routine
every 5 minutes.



Does anyone have any idea what might actually be going on in our Production
system that is causing these intermittent crashes and what we can do to
resolve the problem?
I am particularly interested in whether anyone has any idea what we
might have changed that suddenly caused the crashes to start happening...

Any help is much appreciated,
sam