From: sam on 25 Jun 2010 18:25

Our IIS worker processes are crashing in our Production environment intermittently with a memory access violation exception.

Our Production Configuration:
- 2 VMWare ESX Server 3.5.0 instances
- Each VMWare instance is running 3 web servers (Windows Server 2003/IIS6)

The crash frequency is pretty random. Some days we may get no crashes. Other days, we may get 1 or 2 from different servers. And other days, we may get more than that, and even multiple crashes from the same server. So far, it does not appear to happen more or less often depending on workload. It just seems pretty random.

To get more information on the crashes, I attached adplus in crash mode to our w3wp processes and collected information on the exception. I am pasting 2 shortened samples below from 2 different crash occurrences:

---------------------------------------------------------------------------
1st sample:
---------------------------------------------------------------------------
---
--- 2nd chance AccessViolation exception ----
---------------------------------------------------------------
Occurrence happened at:
Debug session time: Wed Jun 16 21:16:58.906 2010 (GMT-4)
System Uptime: 11 days 12:31:39.624
Process Uptime: 0 days 22:22:16.080
Kernel time: 0 days 0:20:20.921
User time: 0 days 0:27:50.171

Faulting stack below ---
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\USER32.dll -
 # ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
00 0a6efd94 7739b6e3 04d20158 0000001e 00000000 <Unloaded_leup.dll>+0x23484107
01 0a6efdc0 7739b874 23640fef 04d20158 0000001e USER32!LoadCursorW+0x4cf5
02 0a6efe38 7739ba92 00000000 23640fef 04d20158 USER32!LoadCursorW+0x4e86
03 0a6efea0 7739bad0 08a3499c 00000000 0a6efecc USER32!TranslateMessageEx+0x10d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\comsvcs.dll -
04 0a6efeb0 4a77bde2 08a3499c 08a34998 08a34908 USER32!DispatchMessageW+0xf
05 0a6efecc 4a77bcf2 08a34928 08a34908 08a3498c comsvcs!DllUnregisterServer+0x270
06 0a6efee4 4a77c7de 08a34998 00000001 08a34908 comsvcs!DllUnregisterServer+0x180
07 0a6eff04 4a77cabf 00000000 01968628 01944b38 comsvcs!DllUnregisterServer+0xc6c
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\msvcrt.dll -
08 0a6eff84 77bcb530 08a34908 00000000 00000000 comsvcs!DllUnregisterServer+0xf4d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
09 0a6effb8 77e64829 01944b38 00000000 00000000 msvcrt!endthreadex+0xa3
0a 0a6effec 00000000 77bcb4bc 01944b38 00000000 kernel32!GetModuleHandleA+0xdf

Creating C:\Program Files\Debugging Tools for Windows (x86)\Crash_Mode__Date_06-15-2010__Time_22-56-35PM\PID-248336__W3WP.EXE_-DefaultAppPool-__2nd_chance_AccessViolation__full_cc90_2010-06-16_21-16-59-093_ca10.dmp - mini user dump
Dump successfully written
Executing custom commands
User Mode Time
Thread Time
89:3e074 0 days 0:00:22.406
-------------

(*) Note: Because I didn't have the symbols, my stack trace was a bit off. I added the symbols when I opened the crash dump in VS2010, and here is the corrected stack trace:

Exception Code: 0xC0000005
Exception Information: The thread tried to read from or or write to a virtual address for which it does not have the appropriate access.
> 23484178()
user32.dll!_InternalCallWinProc@20() + 0x28 bytes
user32.dll!_UserCallWinProcCheckWow@32() + 0xa2 bytes
user32.dll!_DispatchMessageWorker@8() + 0xc8 bytes
user32.dll!_DispatchMessageW@4() + 0xf bytes
comsvcs.dll!CSTAQueueLessMessageWork::DoWork() + 0x4e bytes
comsvcs.dll!CSTAThread::DoWork() + 0x18 bytes
comsvcs.dll!CSTAThread::ProcessQueueWork() + 0x37 bytes
comsvcs.dll!CSTAThread::WorkerLoop() + 0x190 bytes
msvcrt.dll!__endthreadex() + 0xa3 bytes
kernel32.dll!_BaseThreadStart@8() + 0x34 bytes

---------------------------------------------------------------------------
2nd sample:
---------------------------------------------------------------------------
---
--- 2nd chance AccessViolation exception ----
---------------------------------------------------------------
Occurrence happened at:
Debug session time: Thu Jun 17 14:22:01.049 2010 (GMT-4)
System Uptime: 12 days 5:36:39.748
Process Uptime: 0 days 17:04:30.659
Kernel time: 0 days 0:16:12.078
User time: 0 days 0:21:33.515

Faulting stack below ---
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\USER32.dll -
 # ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 0a19fd94 7739b6e3 00ad026e 0000001e 00000000 <Unloaded_ON~1.DLL>+0x1e714127
01 0a19fdc0 7739b874 14520fef 00ad026e 0000001e USER32!LoadCursorW+0x4cf5
02 0a19fe38 7739ba92 00000000 14520fef 00ad026e USER32!LoadCursorW+0x4e86
03 0a19fea0 7739bad0 0a98520c 00000000 0a19fecc USER32!TranslateMessageEx+0x10d
*** ERROR: Symbol file could not be found.
Defaulted to export symbols for C:\WINDOWS\system32\comsvcs.dll -
04 0a19feb0 4a77bde2 0a98520c 0a985208 0a985178 USER32!DispatchMessageW+0xf
05 0a19fecc 4a77bcf2 0a985198 0a985178 0a9851fc comsvcs!DllUnregisterServer+0x270
06 0a19fee4 4a77c7de 0a985208 00000001 0a985178 comsvcs!DllUnregisterServer+0x180
07 0a19ff04 4a77cabf 00000000 01943518 0193de60 comsvcs!DllUnregisterServer+0xc6c
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\msvcrt.dll -
08 0a19ff84 77bcb530 0a985178 00000000 00000000 comsvcs!DllUnregisterServer+0xf4d
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
09 0a19ffb8 77e64829 0193de60 00000000 00000000 msvcrt!endthreadex+0xa3
0a 0a19ffec 00000000 77bcb4bc 0193de60 00000000 kernel32!GetModuleHandleA+0xdf

Creating C:\Program Files\Debugging Tools for Windows (x86)\Crash_Mode__Date_06-17-2010__Time_00-36-36AM\PID-270104__W3WP.EXE_-DefaultAppPool-__2nd_chance_AccessViolation__full_22b0_2010-06-17_14-22-01-205_1f18.dmp - mini user dump
Dump successfully written
Executing custom commands
User Mode Time
Thread Time
7:40e88 0 days 0:00:00.000
--------------

(*) Note: Here is the corrected stack trace with the symbols:

Exception Code: 0xC0000005
Exception Information: The thread tried to read from or or write to a virtual address for which it does not have the appropriate access.
> 1e714178()
user32.dll!_InternalCallWinProc@20() + 0x28 bytes
user32.dll!_UserCallWinProcCheckWow@32() + 0xa2 bytes
user32.dll!_DispatchMessageWorker@8() + 0xc8 bytes
user32.dll!_DispatchMessageW@4() + 0xf bytes
comsvcs.dll!CSTAQueueLessMessageWork::DoWork() + 0x4e bytes
comsvcs.dll!CSTAThread::DoWork() + 0x18 bytes
comsvcs.dll!CSTAThread::ProcessQueueWork() + 0x37 bytes
comsvcs.dll!CSTAThread::WorkerLoop() + 0x190 bytes
msvcrt.dll!__endthreadex() + 0xa3 bytes
kernel32.dll!_BaseThreadStart@8() + 0x34 bytes
---------------------------------------------------------------------------

What I noticed is that the crash always occurs while trying to execute an instruction from an unloaded dll... <Unloaded_XXXX.DLL>... According to other posts, this would indicate a bug in the dll, perhaps a ref counting bug of some sort. However, what worries me is that:

1) The crashes do not consistently happen on the same dll. Already, the 2 samples above are different. And I checked several others that crashed in other dlls as well. Does that mean that all the different dlls (all from different third party vendors) are buggy?

2) These crashes all started right after we updated our web application build on June 5th 2010, which seems to indicate that something we did during that update is causing (or exposing) all of this. Yet, none of the crashing dlls were changed during this update. No Windows or IIS software or configuration was modified. We did add a few third party libraries, and we did add the VC++ 2005/2008 Runtimes, but I can't figure out what they have to do with anything. So again, it's hard to point the finger at the "unloaded" dlls, since they were already in place before the crashes started to occur.

BTW, is it normal that the <Unloaded_DllNameGoesHere.DLL> does not include the full name of the dll? For instance, the <Unloaded_leup.dll> seems to me like it was referring to safilup.dll that we do use.
I haven't figured out which dll <Unloaded_ON~1.DLL> is referring to. In the other traces where I have been able to determine the dll, the label always shows only part of the dll name; most of the time, it is the full name with the first 4 letters chopped off. I just hope that the data I'm reading is accurate, because I am basing my assumptions on this...

I read in other posts that IIS6 supposedly calls CoFreeUnusedLibraries every 5 minutes to unload unused modules. This would explain why some dlls get unloaded. I did some testing, and in fact dlls do get unloaded and reloaded as part of the normal production operation. It seems it has been this way since well before the crashes started happening. Granted, it's not the best performance-wise, but it did work up until recently without any glitches or crashes.

Something that I did find a bit weird, and maybe someone can confirm if this is normal IIS behavior, is that pretty much every IIS worker thread attempts to unload the unused dlls about every 5 minutes. In PROD, that meant that I had about 50 different threads that were all concurrently trying to unload libraries. Is this normal? Could this lead to some sort of race condition? Honestly, I was expecting a single thread to be running the unload routine every 5 minutes.

Does anyone have any idea what might actually be going on in our Production system that is causing these intermittent crashes and what we can do to resolve the problem? What I am particularly interested in is whether anyone has any idea what we might have changed that suddenly caused the crashes to start happening...

Any help is much appreciated,
sam
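
One way to sanity-check the <Unloaded_...> labels is to work out the module base they imply. This is only a sketch under an assumption that may not hold: it treats the address at the top of each symbolized Visual Studio trace as the same instruction pointer that frame 00 of the adplus trace labels as <Unloaded_...>+offset. The helper names are hypothetical, made up for illustration. Windows maps DLL images on 64 KB boundaries, so an implied base that is not a non-zero multiple of 0x10000 would suggest the label is not trustworthy.

```python
# Faulting addresses from the symbolized (Visual Studio) traces, paired with
# the offsets printed after <Unloaded_...> in frame 00 of each adplus sample.
# ASSUMPTION: both values describe the same instruction pointer.
SAMPLES = {
    "sample 1 (<Unloaded_leup.dll>)": (0x23484178, 0x23484107),
    "sample 2 (<Unloaded_ON~1.DLL>)": (0x1E714178, 0x1E714127),
}

ALLOCATION_GRANULARITY = 0x10000  # DLL images are mapped on 64 KB boundaries


def implied_base(fault_address: int, offset: int) -> int:
    """Base address implied by the label: fault address minus offset."""
    return fault_address - offset


def is_plausible_module_base(base: int) -> bool:
    """A mapped image starts at a non-zero, 64 KB-aligned address."""
    return base != 0 and base % ALLOCATION_GRANULARITY == 0


def low_word(address: int) -> int:
    """Position of the address inside its 64 KB region."""
    return address % ALLOCATION_GRANULARITY


if __name__ == "__main__":
    for name, (fault, offset) in SAMPLES.items():
        base = implied_base(fault, offset)
        print(f"{name}: implied base {base:#x}, "
              f"plausible={is_plausible_module_base(base)}, "
              f"low word {low_word(fault):#06x}")
```

Under that assumption, neither implied base looks like a real image base, and both faulting addresses land at the same position within their 64 KB region. That could be a coincidence, but it would also be consistent with the same code having been loaded at two different bases, so the differing dll names in the labels may not mean that different dlls are involved.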