From: Theodore Tso on 28 Sep 2009 16:30

On Mon, Sep 28, 2009 at 12:16:44PM -0700, Andy Isaacson wrote:
> After a hard lockup and reboot, my test box (running recent Linus git
> 851b147) came up with:
>
> [    5.016854] EXT4-fs (sda1): mounted filesystem with ordered data mode
> [    8.809125] EXT4-fs (sda1): internal journal on sda1:8
> [   10.165239] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 524788
> [   10.165286] Aborting journal on device sda1:8.
> [   10.168111] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
> [   10.168169] EXT4-fs (sda1): Remounting filesystem read-only
> [   10.171614] EXT4-fs (sda1): Remounting filesystem read-only

It would be useful to see what pathname is associated with inode 524788.
You can use debugfs to find this out. For example, to find a pathname
which points to inode 14666, you can do this:

# debugfs /dev/sda1
debugfs 1.41.9 (22-Aug-2009)
debugfs: ncheck 14666
Inode	Pathname
14666	/grub/menu.lst

Also try using the debugfs stat command and send me the output, please:

debugfs: stat <14666>

> 2. after a lockup the journal recovery should not fail.

I'm not sure it was a matter of the journal recovery failing. All we
know for certain is that the filesystem was corrupted after the lockup
and the remounting of the filesystem. What caused the file system
corruption is open to question at the moment; it could have been caused
by the lockup, or it could have been a file that was deleted right
around the time of the lockup, or it could have been some completely
random filesystem corruption. It would be useful to know whether the
inode in question was supposed to have been deleted. If it was, it
would be useful to know whether the dtime reported by debugfs's stat
was around the time of the original lockup.
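When several suspect inodes need checking, the ncheck/stat pair above can be batched into one debugfs session via its -f (command file) option. The helper name below is hypothetical, not from the thread; it is a minimal sketch assuming the /dev/sda1 device from the report:

```shell
# Hypothetical helper (not from the thread): emit a debugfs command script
# that runs "ncheck" and "stat" for every inode number given, so all the
# suspect inodes can be inspected in one read-only debugfs session.
gen_debugfs_cmds() {
    for ino in "$@"; do
        printf 'ncheck %s\nstat <%s>\n' "$ino" "$ino"
    done
}

# Print the commands; they can be fed to debugfs with its -f option, e.g.:
#   gen_debugfs_cmds 524788 919422 > cmds && debugfs -f cmds /dev/sda1
gen_debugfs_cmds 524788 919422
```

Note that debugfs opens the filesystem read-only unless -w is given, so this is safe to run on the damaged disk.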
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Theodore Tso on 28 Sep 2009 23:20
On Mon, Sep 28, 2009 at 02:28:38PM -0700, Andy Isaacson wrote:
> I've attached the complete output from "fsck -n /dev/sda1" and "stat
> <%d>" on each inode reported to be deleted.

So the large number of multiply-claimed-blocks messages is definitely a
clue:

> Multiply-claimed block(s) in inode 919422: 3704637
> Multiply-claimed block(s) in inode 928410: 3704637
> Multiply-claimed block(s) in inode 928622: 3703283
> Multiply-claimed block(s) in inode 943927: 3703283
> Multiply-claimed block(s) in inode 933307: 3702930
> Multiply-claimed block(s) in inode 943902: 3702930

What this indicates to me is that an inode table block was written to
the wrong location on disk. In fact, given the large number of inode
numbers involved, it looks like many inode table blocks were written to
the wrong location on disk.

So what happened with the file "/etc/rcS.d/S90mountdebugfs" is probably
_not_ that it was deleted on September 22nd, but rather that sometime
recently the inode table block containing inode #524788 was overwritten
by another inode table block, one containing a deleted inode at that
relative position in the inode table block. This must have happened
since the last successful boot, since with /etc/rcS.d/S90mountdebugfs
pointing at a deleted inode, any attempt to boot the system after the
corruption had taken place would have resulted in catastrophe.

I'm surprised by how many inode table blocks apparently got
misdirected. Almost certainly some kind of hardware failure must have
triggered this. I'm not sure what caused it, but it does seem like
your filesystem has been toasted fairly badly. At this point my advice
to you would be to try to recover as much data from the disk as you
can, and to *not* run fsck or mount the filesystem read/write until you
are confident you have recovered all of the critical files you care
about, or have made an image copy of the disk using dd to a backup hard
drive first.
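The pattern Ted is reading out of the fsck output (pairs of inodes claiming the same block) can be extracted mechanically. The helper below is a sketch, not part of the thread; it assumes the "fsck -n" output has been saved to a file and that the message lines have the shape shown above:

```shell
# Sketch (assumes saved "fsck -n" output on stdin): group each
# multiply-claimed block number with the inodes that share it. A pair of
# inodes claiming the same data block is consistent with an inode table
# block having been written over another one, duplicating the block maps.
find_shared_blocks() {
    awk '/Multiply-claimed block\(s\) in inode/ {
             inode = $5; sub(/:$/, "", inode)       # strip trailing colon
             for (i = 6; i <= NF; i++)              # remaining fields are blocks
                 owners[$i] = owners[$i] " " inode
         }
         END {
             for (blk in owners)
                 printf "block %s claimed by:%s\n", blk, owners[blk]
         }'
}

# Usage: find_shared_blocks < fsck-output.log
```

Each output line names one duplicated block and every inode claiming it, which makes the pairing in the quoted fsck output easy to see at a glance.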
If you're really curious, we could look at the dumpe2fs output and see
whether there is a pattern to what might have caused so many
misdirected writes, but there's no guarantee we would find the
definitive root cause, and from a recovery perspective it's probably
faster and less risky to reinstall your system disk from scratch.

Good luck, and I'm sorry your file system got so badly disrupted.

- Ted