From: Jeffrey Merkey on 7 Jun 2010 20:00

---------- Forwarded message ----------
From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
Date: Mon, Jun 7, 2010 at 5:54 PM
Subject: Re: EXT3 File System Corruption 2.6.34
To: Eric Sandeen <sandeen(a)sandeen.net>

REPLY TO ALL

CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set

Whether set this way or not, should not see corruption. I am seeing
data corruption including the following:

/boot/grub/grub.conf getting filled with binary chars
/root/.viminfo filled with strange text chars (not binary)
.o files filled with the same garbage.

Looks like EXT3 metadata -- maybe some blocks getting transposed somewhere?

I will recreate the data patterns I see during corruption and post
here. They are consistent with some sort of fill pattern -- at least
what I see in viminfo is. In the case of corrupted .o files, the
endian headers are missing and trashed in the OBJ section headers --
chances are the same kind of garbage.

Jeff

>> Still seeing file system corruption after journal recovery in EXT3.
>> It's easy to reproduce, though the symptoms vary. One way is to
>> rebuild a program and while the program is being compiled just shut
>> off power to the system by pulling the plug. I am seeing the
>> /root/.viminfo file trashed after recovery if Vim was active during
>> poweroff. I am also seeing object modules getting built which the LD
>> linker claims are "invalid" following a recovery event. I suspect a
>> bug in the buffer cache since deleting the file still causes the old
>> data to be returned from buffer cache even when the sectors are
>> overwritten, but both are interrelated. Seems in some way related to
>> EXT3 recovery which results in the buffer cache returning old sectors
>> and junk.
>>
>> Not hard to reproduce, but the symptoms are always a little different
>> but the /root/.viminfo file getting nuked seems a common effect of
>> this bug.
>
> "file system corruption" usually means corrupted metadata, but I guess
> here you mean file corruption, i.e. corrupted data.
>
> If you have buffered data in the cache, it will be lost when you pull
> the plug. If your userspace doesn't sync it, this is expected. But it's
> not clear to me what you're seeing.
>
> I'm also not clear on what you mean about deleting the file and having old
> data returned. Maybe a little cut and paste from the screen would help
> explain what you see.
>
> I'd also check CONFIG_EXT3_DEFAULTS_TO_ORDERED and be sure you're
> using data=ordered mode by default.
>
> -Eric
>
>> Jeff
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
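Eric's point above about userspace syncing its data can be illustrated with a minimal sketch. It assumes a Linux/POSIX system; the helper name `durable_write` and the file name `image.gif` are hypothetical and not taken from the thread. It shows one common way an application asks the kernel to flush a freshly written file, and its directory entry, before treating the save as complete. It does not claim to describe what any application in this thread actually did.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to flush it to stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and metadata out of the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the containing directory too, so the new entry itself is synced.
    # Assumption: a POSIX system where a directory can be opened read-only.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "image.gif")  # hypothetical file name
        durable_write(target, b"GIF89a")
        print(os.path.getsize(target))
```

Without the `os.fsync` calls the data may still be sitting in the page cache when power is lost, which is the buffered-data loss described above.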
From: Eric Sandeen on 7 Jun 2010 22:10

Jeffrey Merkey wrote:
> OK. I will set this up. You may want to make this option the default
> in the build scripts. here is a corrupted file.

It was default, but Linus changed it a while back.

> This was a .gif
> image file I saved THEN AFTER SAVING THE FILE I pulled the power to
> the machine and during recovery the file was FUCKED.

I assume your application did not sync the data, and buffered data loss
is expected on a power loss.

> At any rate,
> this does not happen with 2.6.28.

that I can't explain for sure.... different timing perhaps.

> I dumped the file with xdump a util I use internally for my own use so
> you could see the file contents as text and I could post it here.
> This was an image file but look what ended up in it -- directory
> blocks and such. Take a look:

As I said, stale blocks exposed due to data=writeback. Known behavior,
unfortunately the default for ext3.

If you find similar problems when mounted data=ordered, it's a more
interesting report.

-Eric

>           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
> 00000000 6C 73 0A 63 64 20 2E 2E 0A 63 6C 73 0A 6C 73 0A ls.cd ...cls.ls.
> 00000010 63 64 20 6C 69 6E 75 78 2D 32 2E 36 2E 33 34 2D cd linux-2.6.34-
> 00000020 6D 64 62 2F 0A 63 6C 73 0A 6C 73 0A 63 64 20 2E mdb/.cls.ls.cd .
> 00000030 2E 0A 63 6C 73 0A 6C 73 0A 63 64 20 6C 69 6E 75 ..cls.ls.cd linu

<giant snip>
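For readers who want to produce a dump in the same layout as the one quoted above, here is a minimal sketch. It is not Jeff's internal xdump utility, whose source is not shown in the thread; the function name `hexdump` and the exact column spacing are assumptions. Each line carries an 8-digit hex offset, 16 bytes in uppercase hex, and an ASCII column with non-printable bytes shown as dots.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a hex + ASCII dump of data, `width` bytes per line.

    Illustrative only; it imitates the layout quoted in the thread.
    """
    pad = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X} {hex_part:<{pad}} {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The first 16 bytes of the dump quoted above, decoded from its hex column.
    print(hexdump(b"ls\ncd ..\ncls\nls\n"))
```

Decoded this way, the quoted bytes read as lines of shell commands (`ls`, `cd ..`, `cls`, `cd linux-2.6.34-mdb/`) rather than GIF data, which fits Eric's description of stale blocks being exposed.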
From: Bill Davidsen on 10 Jun 2010 17:10

Jeffrey Merkey wrote:
> ---------- Forwarded message ----------
> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
> Date: Mon, Jun 7, 2010 at 7:55 PM
> Subject: Re: EXT3 File System Corruption 2.6.34
> To: Eric Sandeen <sandeen(a)sandeen.net>
>
>
>> On Jun 7, 2010, at 6:55 PM, Jeffrey Merkey <jeffmerkey(a)gmail.com> wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
>>> Date: Mon, Jun 7, 2010 at 5:54 PM
>>> Subject: Re: EXT3 File System Corruption 2.6.34
>>> To: Eric Sandeen <sandeen(a)sandeen.net>
>>>
>>>
>>> REPLY TO ALL
>>>
>>> CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
>>>
>>> Whether set this way or not, should not see corruption.
>> Here you are mistaken. Mount with data=ordered and see. Writeback can
>> expose stale data.
>>
>> -Eric
>>
>
> OK. I will set this up. You may want to make this option the default
> in the build scripts. here is a corrupted file. This was a .gif
> image file I saved THEN AFTER SAVING THE FILE I pulled the power to
> the machine and during recovery the file was FUCKED. At any rate,
> this does not happen with 2.6.28.
>

Having bad things happen when power is removed is not much of a surprise,
and various options can fix that at the cost of speed. The fact that this
didn't happen with 2.6.28 is bothersome.

I actually take some care to avoid testing behavior in this area, not my
normal intended mode of operation.

--
Bill Davidsen <davidsen(a)tmr.com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
From: Jeffrey Merkey on 11 Jun 2010 12:30

Well, I set the system to the default ordered mode and the problem
went away. EXT3 recovers nicely now. I run across this all the time
since I develop high speed kernel stuff and have a lot of cases where
a bug crashes the system. This time it showed up while developing the
MDB debugger with the hw_breakpoint interface which caused the system
to crash until I figured out this newer interface had hooked the
notify_die handlers and was trapping breakpoints which caused a lot of
hangs until I fixed it, so it is something I ran across coincidentally.

The default ordered mode makes ext3 robust again.

Jeff

On Thu, Jun 10, 2010 at 3:04 PM, Bill Davidsen <davidsen(a)tmr.com> wrote:
> Jeffrey Merkey wrote:
>>
>> ---------- Forwarded message ----------
>> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
>> Date: Mon, Jun 7, 2010 at 7:55 PM
>> Subject: Re: EXT3 File System Corruption 2.6.34
>> To: Eric Sandeen <sandeen(a)sandeen.net>
>>
>>
>>> On Jun 7, 2010, at 6:55 PM, Jeffrey Merkey <jeffmerkey(a)gmail.com> wrote:
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
>>>> Date: Mon, Jun 7, 2010 at 5:54 PM
>>>> Subject: Re: EXT3 File System Corruption 2.6.34
>>>> To: Eric Sandeen <sandeen(a)sandeen.net>
>>>>
>>>>
>>>> REPLY TO ALL
>>>>
>>>> CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
>>>>
>>>> Whether set this way or not, should not see corruption.
>>>
>>> Here you are mistaken. Mount with data=ordered and see. Writeback can
>>> expose stale data.
>>>
>>> -Eric
>>>
>>
>> OK. I will set this up. You may want to make this option the default
>> in the build scripts. here is a corrupted file. This was a .gif
>> image file I saved THEN AFTER SAVING THE FILE I pulled the power to
>> the machine and during recovery the file was FUCKED. At any rate,
>> this does not happen with 2.6.28.
>>
> Having bad things happen when power is removed is not much of a surprise,
> and various options can fix that at the cost of speed. The fact that this
> didn't happen with 2.6.28 is bothersome.
>
> I actually take some care to avoid testing behavior in this area, not my
> normal intended mode of operation.
>
> --
> Bill Davidsen <davidsen(a)tmr.com>
>  "We have more to fear from the bungling of the incompetent than from
> the machinations of the wicked." - from Slashdot
>
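One way to confirm which journaling mode a mounted ext3 filesystem is using is to look for a `data=` option in its mount options. The sketch below assumes the options appear in the usual `/proc/mounts` line format (device, mount point, type, options) and that the kernel lists `data=` there. The function name `ext3_data_mode` and the sample line are hypothetical, not output from any machine in this thread.

```python
from typing import Optional


def ext3_data_mode(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the data= mode listed for an ext3 mount, or None if not listed.

    Hypothetical helper; expects text in /proc/mounts format.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, where, fstype, options = fields[:4]
        if where != mount_point or fstype != "ext3":
            continue
        for opt in options.split(","):
            if opt.startswith("data="):
                return opt[len("data="):]
        return None  # matching mount found, but no data= option shown
    return None


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "/dev/sda1 / ext3 rw,errors=continue,data=ordered 0 0\n"
    print(ext3_data_mode(sample, "/"))
    # On a live Linux system the real table could be read instead:
    #   with open("/proc/mounts") as f:
    #       print(ext3_data_mode(f.read(), "/"))
```

A result of `None` only means that no `data=` option was listed for that mount point; it says nothing about which mode is in effect.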