From: Dirk Zabel on 30 May 2007 06:25

Hi,
first of all, thanks to all who responded.

I played around with the disassembly command and found the following: the instruction '18 a0 ff 15 b0 6b' is indeed suspicious, as Ivan Brugiolo wrote. It sits at a000266b = LeaveCrit+0x4, i.e. not inside my binary. This address is not at the beginning but in the middle of the first instruction of LeaveCriticalSection.

If I disassemble LeaveCriticalSection, I see:

win32k!LeaveCrit:
a0002667 8b0de02318a0 mov ecx,dword ptr [win32k!gpresUser (a01823e0)]
a000266d ff15b06b17a0 call dword ptr [win32k!_imp_ExReleaseResourceLite (a0176bb0)]

This explains the MISALIGNED_IP line in the output of the !analyze command.

Unfortunately, the whole stack trace does not include instructions from user-mode address space, so I cannot see which user program eventually called KiThreadStartup.

The next lower stack frame reads

bfd6fb5c a00ad527 00000000 00000000 00000000 win32k!TimersProc+0x133

and disassembly of a00ad520 gives

a00ad520 75a6 jne win32k!RawInputThread+0x4ea (a00ad4c8)
a00ad522 e8ab53f5ff call win32k!TimersProc (a00028d2)
a00ad527 a1dc9e18a0 mov eax,dword ptr [win32k!gnRetryReadInput (a0189edc)]

Disassembly of TimersProc begins with

win32k!TimersProc:
a00028d2 55 push ebp
a00028d3 8bec mov ebp,esp
a00028d5 83ec0c sub esp,0Ch
a00028d8 53 push ebx
a00028d9 56 push esi
a00028da 57 push edi
a00028db e8b8fdffff call win32k!EnterCrit (a0002698)
a00028e0 ba0000fe7f mov edx,offset SharedUserData (7ffe0000)
a00028e5 8b02 mov eax,dword ptr [edx]
a00028e7 f76204 mul eax,dword ptr [edx+4]
a00028ea 0facd018 shrd eax,edx,18h
a00028ee 8b350c2518a0 mov esi,dword ptr [win32k!gptmrFirst (a018250c)]
a00028f4 8bf8 mov edi,eax

i.e. TimersProc calls EnterCrit(icalSection?). Some pages later I see

a00029db 99 cdq
a00029dc 68f0d8ffff push 0FFFFD8F0h
a00029e1 52 push edx
a00029e2 50 push eax
a00029e3 e8dcfcffff call win32k!_allmul (a00026c4)
a00029e8 6a00 push 0
a00029ea 52 push edx
a00029eb 50 push eax
a00029ec ff35042518a0 push dword ptr [win32k!gptmrMaster (a0182504)]
a00029f2 ff15ec6d17a0 call dword ptr [win32k!_imp__KeSetTimer (a0176dec)]
a00029f8 e86afcffff call win32k!LeaveCrit (a0002667)
a00029fd 5f pop edi
a00029fe 5e pop esi
a00029ff 5b pop ebx
a0002a00 c9 leave
a0002a01 c3 ret

Now I don't see how any corruption of the user-mode program, or corruption of the data which a user-mode program feeds to a system call, can result in an incorrect jump INSIDE some instruction of a kernel-mode procedure -- to me this looks more like a hardware quirk sending the CPU to LeaveCriticalSection+4 instead of LeaveCriticalSection. Or is this impression too far-fetched?

Another question, though: I could not get an overview of which processes were active when the fault occurred. I had thought the command View | Processes and Threads does this. The result is only a window showing (transcribed as ASCII art):

[-] 000:f0f0f0f0 ntoskrnl.exe
    +-000:1

This does not seem to be a problem of this particular dump, however, as I got the same result when I deliberately produced a dump from another W2k system running inside VirtualPC (using NotMyFault from Mark Russinovich), loaded the resulting full memory dump into windbg and tried the "Processes and Threads" command on it. I guess I am doing something simple wrong, but what? I did set up windbg to use the MS symbol server, and the symbol cache seems to be OK.

Thank you for any comments,

- Dirk
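The MISALIGNED_IP reading in the post above can be cross-checked by hand. The following is a minimal Python sketch, not part of the original post: it takes the twelve bytes of win32k!LeaveCrit exactly as they appear in the disassembly and slices out what the CPU would fetch when it lands four bytes into the function. The names `LEAVE_CRIT_BASE`, `LEAVE_CRIT_BYTES` and `bytes_at` are hypothetical helpers chosen for illustration.

```python
# Bytes of win32k!LeaveCrit as listed in the disassembly in the post:
#   a0002667 8b0de02318a0  mov  ecx,dword ptr [win32k!gpresUser]
#   a000266d ff15b06b17a0  call dword ptr [win32k!_imp_ExReleaseResourceLite]
LEAVE_CRIT_BASE = 0xA0002667
LEAVE_CRIT_BYTES = bytes.fromhex("8b0de02318a0" "ff15b06b17a0")


def bytes_at(address, count):
    """Return `count` bytes starting at `address` from the listing above."""
    offset = address - LEAVE_CRIT_BASE
    if offset < 0 or offset + count > len(LEAVE_CRIT_BYTES):
        raise ValueError("address range is outside the listed bytes")
    return LEAVE_CRIT_BYTES[offset:offset + count]


# Fetching at LeaveCrit+4 starts in the middle of the 6-byte mov instruction.
misaligned = bytes_at(LEAVE_CRIT_BASE + 4, 6)
print(" ".join(f"{b:02x}" for b in misaligned))  # 18 a0 ff 15 b0 6b
```

The printed bytes match the suspicious instruction stream reported by !analyze, which is consistent with execution having started at LeaveCrit+4 rather than with the bytes of LeaveCrit themselves being damaged.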
From: Ivan Brugiolo [MSFT] on 30 May 2007 13:11

> Now I don't see how any corruption of the user-mode program or
> corruption of the data which user-mode program feeds to a system call
> can result in an incorrect jump INSIDE some instruction of a
> kernel-mode procedure -- this looks more like a hardware quirk sending
> the cpu to LeaveCriticalSection+4 instead of LeaveCriticalSection to
> me. Or is this impression too far-fetched?

The byte code for `call win32k!LeaveCrit (a0002667)` is `e8 6a fc ff ff`. It is encoded with an @eip-relative offset. I'd bet that by flipping one bit at a time in the offset you can easily get the `+4` displacement. This looks like single-bit corruption of a code page. Unless you have ECC memory with MCA events, it's hard to make further progress.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
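The bet in the reply above is easy to check mechanically. Below is a minimal Python sketch, not from the thread, that decodes the `e8` near call using the addresses and bytes quoted in the posts and then tries every single-bit flip of the 32-bit displacement. The helper names `call_target` and `single_bit_flips` are made up for this illustration, and the sketch assumes the standard 5-byte `E8 rel32` encoding, where the target is the address of the next instruction plus the signed little-endian displacement.

```python
import struct

CALL_ADDR = 0xA00029F8                    # address of the call in TimersProc
CALL_BYTES = bytes.fromhex("e86afcffff")  # call win32k!LeaveCrit
INTENDED = 0xA0002667                     # win32k!LeaveCrit


def call_target(addr, insn):
    """Decode a 5-byte E8 rel32 call located at `addr` and return its target."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (disp,) = struct.unpack("<i", insn[1:5])  # signed little-endian rel32
    return (addr + 5 + disp) & 0xFFFFFFFF     # relative to the next instruction


def single_bit_flips(addr, insn):
    """Yield (bit, corrupted_bytes, target) for each flipped displacement bit."""
    for bit in range(32):
        byte_index, bit_in_byte = divmod(bit, 8)
        corrupted = bytearray(insn)
        corrupted[1 + byte_index] ^= 1 << bit_in_byte
        yield bit, bytes(corrupted), call_target(addr, bytes(corrupted))


print(f"intact call -> {call_target(CALL_ADDR, CALL_BYTES):08x}")
for bit, insn, target in single_bit_flips(CALL_ADDR, CALL_BYTES):
    if target == INTENDED + 4:
        print(f"bit {bit}: {insn.hex()} -> {target:08x}")
# intact call -> a0002667
# bit 2: e86efcffff -> a000266b
```

Exactly one single-bit flip lands on LeaveCrit+4: bit 2 of the lowest displacement byte, turning `6a` into `6e`. That supports the single-bit corruption reading, though it cannot say whether the flip happened in RAM, in a cache, or on the way to the CPU.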
From: Jeffrey Walton on 30 May 2007 14:37

On May 29, 11:52 pm, "Alexander Grigoriev" <a...(a)earthlink.net> wrote:
> If a single byte is corrupted, it looks like a memory fault. The OP needs to
> run memory diagnostics.
>
> I dare to suggest my test from http://home.earthlink.net/~alegr/download/memtest.htm

Hi Alexander,

You are probably correct. I'll add this to my war chest.

Jeff

> "Ivan Brugiolo [MSFT]" <ivanb...(a)online.microsoft.com> wrote in message
> news:uNuwwNhoHHA.1240(a)TK2MSFTNGP04.phx.gbl...
> > The instruction being executed `18 a0 ff 15 b0 6b` looks suspicious.
> > Can you compare the code stream with a known good binary?
> > I'd suspect some form of code corruption, one example of which
> > (that I debugged recently from a crashdump) is reported below.
> > The code you are crashing at is known to not take external input,
> > and to have been reasonably stable,
> >
> > SNIP [Dump Analysis]
> >
> > "Dirk Zabel" <dza...(a)community.nospam> wrote in message
> > news:BCBED70E-F1C4-455F-BBE9-F742A77AEC9A(a)microsoft.com...
> >> Hi,
> >> on a Windows 2000 machine I had cooperating programs running (one is
> >> communicating with an external device via the RS-232 port and exchanges
> >> data with the other using UDP/IP). The computer was running for some
> >> weeks without problems, but now a blue screen occurred. I had configured
> >> it to generate a full memory dump in such a case. When analyzing the
> >> dump using windbg, I see this:
> >> kd> !analyze -v
> >> SNIP
> >> As far as I know, my programs cannot be responsible for any blue screen,
> >> as they are running in ring 3 (user mode). So I expected some driver to
> >> be listed in the dump. But
From: Dirk Zabel on 31 May 2007 02:39

Ivan Brugiolo [MSFT] wrote:
>> Now I don't see how any corruption of the user-mode program or
>> corruption of the data which user-mode program feeds to a system call
>> can result in an incorrect jump INSIDE some instruction of a
>> kernel-mode procedure -- this looks more like a hardware quirk sending
>> the cpu to LeaveCriticalSection+4 instead of LeaveCriticalSection to
>> me. Or is this impression too far-fetched?
>
> the byte-code for `call win32k!LeaveCrit (a0002667)` is `e8 6a fc ff ff`.
> It's encoded with a @eip relative offset. I'd bet that flipping one bit
> at a time in the offset you can easily get the `+4` displacement.
> This looks like code-page single-bit corruption.
> Unless you have ECC memory with MCA events,
> it's hard to make further progress

OK, I think this is exactly what happened. The !analyze -v output said:

LAST_CONTROL_TRANSFER: from a00029fd to a000266b

Disassembly around the calling instruction:

kd> u a00029e8
win32k!TimersProc+0x11e:
a00029e8 6a00 push 0
a00029ea 52 push edx
a00029eb 50 push eax
a00029ec ff35042518a0 push dword ptr [win32k!gptmrMaster (a0182504)]
a00029f2 ff15ec6d17a0 call dword ptr [win32k!_imp__KeSetTimer (a0176dec)]
a00029f8 e86afcffff call win32k!LeaveCrit (a0002667)
a00029fd 5f pop edi
a00029fe 5e pop esi

i.e. at a00029f8 there is a call to a0002667, but the next instruction executed was at a000266b. This happens if the displacement byte "6a" is incorrectly read as "6e" (the target is the return address a00029fd plus the signed displacement, so fffffc6e yields a000266b), which means a flip of bit 2.

I will try the memory test Alexander recommended and take a look at the BIOS settings (memory timing).

Thank you again to all who helped. I think I learned something about interpreting windbg output.

Yours
- Dirk