From: John Stumbles on 4 Feb 2010 07:32

I have a server box with a raid setup using mdadm. It currently only has
one drive (that's another story ...) but was working OK as a degraded
array. However since rebooting the system last night it's no longer
coming up.

On bootup after the PC's normal F1-for-bios message (where I can go into
BIOS setup) I get a Tab-for-raid-setup, and if I do that I can see that
the SATA card's VIA BIOS is seeing the drive.

Normal Linux bootup (this is Debian stable) halts as fsck fails because
it can't find /dev/md0.

dmesg seems to be seeing the drive and reports
  sd 3:0:0:0 [sdb] Attached SCSI disk

Normally mdadm should Just Work and create /dev/md0 from the physical
drive. Any suggestions what's going wrong, or what to look for next?

--
John Stumbles
If we'd known how much fun grandchildren are we'd have had them first
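The symptom in this post (the kernel reports sdb, yet /dev/md0 cannot be found) comes down to whether the device special files exist. What follows is a minimal sketch for checking that from a rescue shell that has Python available. The function name describe_node is made up for illustration, and the two paths are simply the ones named in the post.

```python
import os
import stat


def describe_node(path):
    """Classify path as 'missing', 'block device' or 'not a block device'."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Covers a nonexistent path as well as permission problems.
        return "missing"
    return "block device" if stat.S_ISBLK(mode) else "not a block device"


# The two nodes mentioned in the post; adjust for the machine at hand.
for node in ("/dev/sdb", "/dev/md0"):
    print(node, "->", describe_node(node))
```

If /dev/sdb shows up as a block device and /dev/md0 is missing, the disk node is there and only the array has not been assembled, which narrows the search to mdadm and its configuration rather than the disk itself.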
From: Aragorn on 4 Feb 2010 10:18

On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
identifying as John Stumbles wrote...

> I have a server box with a raid setup using mdadm. It currently only
> has one drive (that's another story ...) [...

But one that may be relevant. :-) If the other hard disk(s) have
failed, then there is a chance that your now last remaining disk is
starting to fail as well.

> ...] but was working OK as a degraded array. However since rebooting
> the system last night it's no longer coming up.

Hmm... You don't keep this server running 24/7?

> On bootup after the PC's normal F1-for-bios message (where I can go
> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can
> see that the SATA card's VIA BIOS is seeing the drive.

This is irrelevant, though. Motherboard RAID implementations are
typically FakeRAID solutions. The RAID functionality of those chipsets
is limited to real mode only so that an operating system can boot from
such a RAID array and then implement its software RAID solution on it
once the drivers have been loaded.

In other words, activating the RAID in the BIOS would normally only work
until Linux is taking over, which in this case, it appears to be failing
to do.

> Normal Linux bootup (this is Debian stable) halts as fsck fails
> because it can't find /dev/md0.

Did you by any chance install, remove or update any software on this
system recently without the system having been shut down or rebooted
since? If so, it might be a problem with /udev/ - which could also be
related to /sysfs/ of course. Modern GNU/Linux systems usually use
/udev/ to create the device special files on demand.

> dmesg seems to be seeing the drive and reports
> sd 3:0:0:0 [sdb] Attached SCSI disk

So the kernel sees the drive, but somehow the device special file is not
created. Again, with the knowledge I have at the moment, things seem to
point at /udev/ and/or /sysfs/ - is it mounted from within the initrd?

> Normally mdadm should Just Work and create /dev/md0 from the physical
> drive. Any suggestions what's going wrong, or what to look for next?

Have you tried booting up with a rescue CD or a live CD to see whether
the drive is recognized and/or whether the device special files for it
are created?

--
*Aragorn*
(registered GNU/Linux user #223157)
From: Nigel Wade on 4 Feb 2010 11:09

On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:

> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
> identifying as John Stumbles wrote...
>
>> Normally mdadm should Just Work and create /dev/md0 from the physical
>> drive. Any suggestions what's going wrong, or what to look for next?
>
> Have you tried booting up with a rescue CD or a live CD to see whether
> the drive is recognized and/or whether the device special files for it
> are created?

Whilst booted to a rescue environment, verify from the partition table
that the relevant partitions have type 'fd' (Linux raid autodetect).
Also check that the magic numbers match those in /etc/mdadm.conf. The md
system only automatically starts raids which are defined in
/etc/mdadm.conf (at least it does on RedHat and OpenSUSE).

--
Nigel Wade
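The mdadm.conf check suggested above can be sketched in a few lines of Python. This is an illustration under assumptions, not a replacement for reading the file: the function name array_uuids and the sample contents are invented, the sketch takes the "magic numbers" to be the array UUIDs, and it ignores continuation lines. On Debian the file usually lives at /etc/mdadm/mdadm.conf rather than /etc/mdadm.conf.

```python
def array_uuids(conf_text):
    """Map each ARRAY device named in mdadm.conf text to its UUID (or None)."""
    arrays = {}
    for line in conf_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        words = line.split("#", 1)[0].split()
        if len(words) < 2 or words[0].upper() != "ARRAY":
            continue
        uuid = None
        for word in words[2:]:
            key, _, value = word.partition("=")
            if key.lower() == "uuid":
                uuid = value.lower()
        arrays[words[1]] = uuid
    return arrays


# Hypothetical file contents, NOT taken from the poster's system.
SAMPLE = """\
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=01234567:89abcdef:01234567:89ABCDEF
"""

print(array_uuids(SAMPLE))
```

The UUID printed for /dev/md0 can then be compared by eye with the one that `mdadm --examine` reports for the member partition (for example /dev/sdb1, which is an assumption about how this disk is partitioned). An empty result would mean the file defines no arrays at all.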
From: John Stumbles on 4 Feb 2010 18:03

On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:

> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
> identifying as John Stumbles wrote...
>
>> I have a server box with a raid setup using mdadm. It currently only
>> has one drive (that's another story ...) [...
>
> But one that may be relevant. :-) If the other hard disk(s) have
> failed, then there is a chance that your now last remaining disk is
> starting to fail as well.

True, or the SATA card playing up. I've had 2 similar cards (eBay
cheapies) go bad so I'm not too confident of this one. The SATA card
BIOS sees the drive, Linux also sees it, the drive itself is OK in
another machine, and an identical (make/model/formatting) drive which
works in another machine similarly isn't recognised in this one.

>> ...] but was working OK as a degraded array. However since rebooting
>> the system last night it's no longer coming up.
>
> Hmm... You don't keep this server running 24/7?

Normally I do. The machine also has an external USB drive attached which
is used for backups and that had stopped working. I had physically moved
the drive (gently, trying to keep the connections intact) so assumed I
must have inadvertently interrupted the connection to that as it became
unmounted so rebooted to try to sort it out. Which is when the fun
started :-(

>> On bootup after the PC's normal F1-for-bios message (where I can go
>> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can see
>> that the SATA card's VIA BIOS is seeing the drive.
>
> This is irrelevant, though. Motherboard RAID implementations are
> typically FakeRAID solutions.

Yes, I'm using mdadm, not the SATA BIOS RAID: I just referred to the
latter to show that the drive seemed to be recognised by the system.

>> Normal Linux bootup (this is Debian stable) halts as fsck fails because
>> it can't find /dev/md0.
>
> Did you by any chance install, remove or update any software on this
> system recently without the system having been shut down or rebooted
> since?

No.

>> dmesg seems to be seeing the drive and reports sd 3:0:0:0 [sdb]
>> Attached SCSI disk
>
> So the kernel sees the drive, but somehow the device special file is not
> created. Again, with the knowledge I have at the moment, things seem to
> point at /udev/ and/or /sysfs/ - is it mounted from within the initrd?

You've lost me there. There isn't a /udev or a /sysfs (or were you
italicising those names?). And is what mounted from within initrd, and
how would I know?

>> Normally mdadm should Just Work and create /dev/md0 from the physical
>> drive. Any suggestions what's going wrong, or what to look for next?
>
> Have you tried booting up with a rescue CD or a live CD to see whether
> the drive is recognized and/or whether the device special files for it
> are created?

knoppix sees the drive (as /dev/sdb - /dev/sda is a small drive the
system boots off) but debian doesn't (any more).

Hmmm......

--
John Stumbles
The rain, it rains upon the Just, and on the Unjust fella
But more upon the Just because the Unjust's got the Just's umbrella
From: Aragorn on 4 Feb 2010 19:10

On Friday 05 February 2010 00:03 in comp.os.linux.misc, somebody
identifying as John Stumbles wrote...

> On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:
>
>> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
>> identifying as John Stumbles wrote...
>>
>>> I have a server box with a raid setup using mdadm. It currently only
>>> has one drive (that's another story ...) [...
>>
>> But one that may be relevant. :-) If the other hard disk(s) have
>> failed, then there is a chance that your now last remaining disk is
>> starting to fail as well.
>
> True, or the SATA card playing up. I've had 2 similar cards (eBay
> cheapies) go bad so I'm not too confident of this one. The SATA
> card BIOS sees the drive, Linux also sees it, the drive itself is OK
> in another machine, and an identical (make/model/formatting) drive
> which works in another machine similarly isn't recognised in this one.

Well, it is possible for two specimens of the same RAID controller to
differently format the disks so that they are only usable on the
controller they were formatted on. On the other hand, if you've had
trouble with this kind of controller before...

>>> ...] but was working OK as a degraded array. However since rebooting
>>> the system last night it's no longer coming up.
>>
>> Hmm... You don't keep this server running 24/7?
>
> Normally I do. The machine also has an external USB drive attached
> which is used for backups and that had stopped working.

This is another thing to investigate. Could be related...

> I had physically moved the drive (gently, trying to keep the
> connections intact) so assumed I must have inadvertently interrupted
> the connection to that as it became unmounted so rebooted to try to
> sort it out. Which is when the fun started :-(

Normally, the reconnection and subsequent reboot ought to guarantee
normal operation again.
You might be running into a forced filesystem check upon boot due to the
unclean shutdown, but all should normally work as before again. Can you,
by means of your Knoppix CD, peruse the "/var/log/messages" on the hard
disk for any possible error messages?

>>> On bootup after the PC's normal F1-for-bios message (where I can go
>>> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can
>>> see that the SATA card's VIA BIOS is seeing the drive.
>>
>> This is irrelevant, though. Motherboard RAID implementations are
>> typically FakeRAID solutions.
>
> Yes, I'm using mdadm, not the SATA BIOS RAID: I just referred to the
> latter to show that the drive seemed to be recognised by the system.

Okay, so the SATA card sees the disk and the kernel also sees it. My
guess at this stage would be filesystem damage which may have erased
your "/etc/mdadm.conf" or damaged your boot-up scripts.

>>> dmesg seems to be seeing the drive and reports sd 3:0:0:0 [sdb]
>>> Attached SCSI disk
>>
>> So the kernel sees the drive, but somehow the device special file is
>> not created. Again, with the knowledge I have at the moment, things
>> seem to point at /udev/ and/or /sysfs/ - is it mounted from within
>> the initrd?
>
> You've lost me there. There isn't a /udev or a /sysfs (or were you
> italicising those names?).

They were italicized, yes. udev mounts a dynamic, tmpfs-based device
filesystem on "/dev", and sysfs is an in-kernel pseudofilesystem, which -
as a spinoff from procfs - is mounted on "/sys". The udev system uses
the information in "/sys" to create or delete device special files in
"/dev" as required. (Or at least, it should. It's not quite perfect
yet.)

> And is what mounted from within initrd, and how would I know?
Well, considering the modular nature of most stock binary distribution
kernels, udev is usually already activated from there - at least, RedHat
and derivatives used to do it like that for a while; don't know whether
they still are - so as to have a "/dev" population already available
before the actual on-disk root filesystem is mounted.

Other approaches - e.g. in Gentoo - are to launch udev from the init
scripts and have it already mount the dynamic "/dev" population from
there, along with sysfs on "/sys" and devpts on "/dev/pts". This as
opposed to having "/sys", "/dev" and "/dev/pts" mounted at a later stage
(at mount time of the additional local filesystems) by the "mount -a"
command and the information in "/etc/fstab".

If your init scripts are still intact, you should be able to ascertain
in the boot scripts at which stage udev is started and "/dev/" and
"/sys" are loaded.

>>> Normally mdadm should Just Work and create /dev/md0 from the
>>> physical drive. Any suggestions what's going wrong, or what to look
>>> for next?
>>
>> Have you tried booting up with a rescue CD or a live CD to see
>> whether the drive is recognized and/or whether the device special
>> files for it are created?
>
> knoppix sees the drive (as /dev/sdb - /dev/sda is a small drive the
> system boots off) but debian doesn't (any more).
>
> Hmmm......

Anything else you were able to ascertain while perusing the on-disk root
filesystem from the Knoppix environment?

--
*Aragorn*
(registered GNU/Linux user #223157)
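For the "how would I know?" part, one rough check is to look at what is mounted on "/sys", "/dev" and "/dev/pts" once a shell is available. The sketch below parses text in the /proc/mounts format. The function name fs_types and the sample lines are invented for illustration and are not output from the poster's machine. It shows only what is mounted at that moment, not whether the initrd or the init scripts did the mounting.

```python
def fs_types(mounts_text, points=("/sys", "/dev", "/dev/pts")):
    """Return {mount point: filesystem type or None} for the given points."""
    found = {point: None for point in points}
    for line in mounts_text.splitlines():
        # Fields in /proc/mounts: device, mount point, fs type, options, ...
        fields = line.split()
        if len(fields) >= 3 and fields[1] in found:
            found[fields[1]] = fields[2]  # a later mount overrides an earlier one
    return found


# Hypothetical sample in /proc/mounts format.
SAMPLE = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,mode=0755 0 0
/dev/sda1 / ext3 rw 0 0
"""

print(fs_types(SAMPLE))
```

On a live system the same function can be fed the real table with `fs_types(open("/proc/mounts").read())`. A value of None for "/dev" or "/sys" would suggest that udev or sysfs never got mounted at that point in the boot.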