From: Matthew Glubb on 17 Jul 2010 04:20

Hi All,

I have a problem replacing a failed disk in a RAID1 array that carries an LVM volume. Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method. However, I recently upgraded from Etch to Lenny. This week, I had a degraded array warning; a disk is failing.

So I duly repeated the steps to replace the disk, but on booting with the new, unformatted disk I get the following error:

"Alert! /dev/mapper/vg00-lv01 does not exist ...
 ... Dropping to shell"

At the moment, I have had to reinstall the old, failing disk in order to be able to boot and run the system. Has anyone had this problem before? Does anyone know of any solution to it?

I've included the relevant disk/RAID configuration at the end of this email. The device /dev/sdb is the one that is failing.

Thanks very much,

Matt

--

# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system>          <mount point>   <type>  <options>                   <dump>  <pass>
proc                     /proc           proc    defaults                    0       0
/dev/mapper/vg00-lv01    /               ext3    defaults,errors=remount-ro  0       1
/dev/md0                 /boot           ext3    defaults                    0       2
/dev/mapper/vg00-lv00    none            swap    sw                          0       0
/dev/fd0                 /media/floppy0  auto    rw,user,noauto              0       0

# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lv01  442G   80G  340G  20% /
tmpfs                  1.7G     0  1.7G   0% /lib/init/rw
udev                    10M  720K  9.3M   8% /dev
tmpfs                  1.7G     0  1.7G   0% /dev/shm
/dev/md0               942M   46M  849M   6% /boot

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
      478512000 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      979840 blocks [2/1] [U_]

# lvdisplay /dev/mapper/vg00-lv01
  --- Logical volume ---
  LV Name                /dev/vg00/lv01
  VG Name                vg00
  LV UUID                tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                448.34 GB
  Current LE             114776
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

The next one is swap space:

# lvdisplay /dev/mapper/vg00-lv00
  --- Logical volume ---
  LV Name                /dev/vg00/lv00
  VG Name                vg00
  LV UUID                aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8.00 GB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

# fdisk -l

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sda2             123       59694   478512090   fd  Linux raid autodetect

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sdb2             123       59694   478512090   fd  Linux raid autodetect
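For reference, the removal part of the procedure Matthew describes would look roughly like this with mdadm, assuming the failing disk is /dev/sdb and the arrays are md0 and md1 as shown above. This is a sketch, not taken from the original mail; his /proc/mdstat output suggests the failing halves may already have dropped out of the arrays, in which case the --fail/--remove steps are redundant.

# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1   # drop the failing disk's /boot mirror half
# mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2   # drop its LVM physical-volume mirror half
# shutdown -h now                                      # power off and physically swap the drive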
From: Alan Chandler on 17 Jul 2010 15:20

On 17/07/10 09:11, Matthew Glubb wrote:
> Hi All,
>
> I have a problem replacing a failed disk in a RAID1 array that carries an LVM volume. Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method. However, I recently upgraded from Etch to Lenny. This week, I had a degraded array warning; a disk is failing.
>
> So I duly repeated the steps to replace the disk, but on booting with the new, unformatted disk I get the following error:
>
> "Alert! /dev/mapper/vg00-lv01 does not exist ...
>  ... Dropping to shell"
>
> At the moment, I have had to reinstall the old, failing disk in order to be able to boot and run the system. Has anyone had this problem before? Does anyone know of any solution to it?
>
> I've included the relevant disk/RAID configuration at the end of this email. The device /dev/sdb is the one that is failing.

You don't include the one piece of information that shows whether the volume group sits on the RAID device.

Can you run either pvdisplay or vgdisplay -v?

Does it show the volume group sitting on the RAID device?

--
Alan Chandler
http://www.chandlerfamily.org.uk
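For anyone following along, a quick way to see which physical device backs the volume group, using standard LVM2 tools. A sketch only; none of this output appears in the original thread.

# pvdisplay                          # lists each physical volume and the volume group it belongs to
# pvs -o pv_name,vg_name,pv_size     # compact view: PV device, its VG, and its size
# vgdisplay -v vg00                  # verbose report on vg00, including its logical and physical volumes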
From: Stan Hoeppner on 17 Jul 2010 16:20

Matthew Glubb put forth on 7/17/2010 3:11 AM:

> Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method.

Once you fix your immediate problem you really need to address the larger issue, which is: why are you suffering so many disk failures, apparently on a single host?

The probability of one OP/host suffering 4 disk failures, even over a long period such as 10 years, is astronomically low. If you manage a server farm of a few dozen or more hosts and had one disk failure on each of four of them, the odds are a bit higher. However, in your case we're not talking about a farm situation, are we?

Are these disks really failing, or are you seeing the software RAID driver flag disks that aren't really going bad? What make/model disk drives are these that are apparently failing? Do you have sufficient airflow in the case to cool the drives? Is the host in an environment with a constant ambient temperature over 80 degrees Fahrenheit?

--
Stan
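One way to answer Stan's question about whether the drives are genuinely failing, rather than merely being kicked out of the array, is to look at their SMART data. A minimal sketch, assuming smartmontools is installed and the suspect drive is /dev/sdb (both assumptions, not from the thread):

# apt-get install smartmontools    # if not already present
# smartctl -H /dev/sdb             # the drive's overall health self-assessment
# smartctl -a /dev/sdb             # full attribute dump: check Reallocated_Sector_Ct,
                                   # Current_Pending_Sector and the drive temperature
# smartctl -t short /dev/sdb       # queue a short self-test; a few minutes later, read
# smartctl -l selftest /dev/sdb    # the self-test log for the result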
From: Gabor Heja on 17 Jul 2010 19:50

Hello,

The four failures seem really high to me too. This might be a silly question, but: have you checked/replaced the controller and cables yet?

I had a machine with four disks, and one of them, seemingly picked at random, was reported as bad every few weeks (all of them connected to the motherboard). I ruled out the cables and HDDs, so I decided to put a PCI controller card in the machine, and since then I have had no errors. (Of course, my best choice would have been to replace the motherboard, but that was not an option at the time.)

Are you sure your disks are bad? Have you run badblocks on them ("badblocks -vws" for read-WRITE mode; check the man page before running it)?

Regards,
Gabor

On Sat, 17 Jul 2010 15:19:30 -0500, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:
> Matthew Glubb put forth on 7/17/2010 3:11 AM:
>
>> Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method.
>
> Once you fix your immediate problem you really need to address the larger issue, which is: why are you suffering so many disk failures, apparently on a single host?
>
> The probability of one OP/host suffering 4 disk failures, even over a long period such as 10 years, is astronomically low. If you manage a server farm of a few dozen or more hosts and had one disk failure on each of four of them, the odds are a bit higher. However, in your case we're not talking about a farm situation, are we?
>
> Are these disks really failing, or are you seeing the software RAID driver flag disks that aren't really going bad? What make/model disk drives are these that are apparently failing? Do you have sufficient airflow in the case to cool the drives? Is the host in an environment with a constant ambient temperature over 80 degrees Fahrenheit?
>
> --
> Stan
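To expand on Gabor's suggestion: the -w flag makes badblocks overwrite every block, so it is only safe on a disk that carries no data you need, such as one already failed out of the array. A read-only pass is a gentler first check. A sketch, assuming the suspect disk is /dev/sdb (an assumption; substitute the real device):

# badblocks -vs /dev/sdb     # read-only scan: verbose, shows progress, touches no data
# badblocks -nvs /dev/sdb    # non-destructive read-write test (much slower, preserves data)
# badblocks -wvs /dev/sdb    # DESTRUCTIVE write-mode test: wipes the whole disk;
                             # only run it on a drive you intend to rebuild anyway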
From: Alan Chandler on 20 Jul 2010 08:00

On 18/07/10 18:30, Matthew Glubb wrote:
> Hi Alan,
>
> Thanks very much for your reply.

Let's take this back to the list rather than keep it between us - and I am subscribed to the list, so there is no need to copy me.

> On 17 Jul 2010, at 20:08, Alan Chandler wrote:
>>
>> You don't include the one piece of information that shows whether the volume group sits on the RAID device.
>>
>> Can you run either pvdisplay or vgdisplay -v?
>>
>> Does it show the volume group sitting on the RAID device?
>
> It appears to me to be showing the volume group sitting on the RAID device. Any ideas what the problem might be?

I don't know. When I have had a problem before, I have just repartitioned the old/new device and added those partitions using mdadm. It then syncs up (albeit over several hours). I don't format it or put filesystems on it - which I think your original mail mentioned.

> # vgdisplay -v
>     Finding all volume groups
>     Finding volume group "vg00"
>   Fixing up missing size (456.34 GB) for PV /dev/md1
>   --- Volume group ---
>   VG Name               vg00
>   System ID
>   Format                lvm2
>   Metadata Areas        1
>   Metadata Sequence No  3
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                2
>   Open LV               2
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               456.34 GB
>   PE Size               4.00 MB
>   Total PE              116824
>   Alloc PE / Size       116824 / 456.34 GB
>   Free  PE / Size       0 / 0
>   VG UUID               Urdpix-a5Ik-U1fq-Tw7T-umoT-paaR-e0s0Oz
>
>   --- Logical volume ---
>   LV Name                /dev/vg00/lv00
>   VG Name                vg00
>   LV UUID                aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
>   LV Write Access        read/write
>   LV Status              available
>   # open                 2
>   LV Size                8.00 GB
>   Current LE             2048
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
>
>   --- Logical volume ---
>   LV Name                /dev/vg00/lv01
>   VG Name                vg00
>   LV UUID                tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                448.34 GB
>   Current LE             114776
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:1
>
>   --- Physical volumes ---
>   PV Name               /dev/md1
>   PV UUID               GSKYlk-d8z0-mGXj-kQny-gSMy-aOMR-zq1dXT
>   PV Status             allocatable
>   Total PE / Free PE    116824 / 0

--
Alan Chandler
http://www.chandlerfamily.org.uk
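Applied to the layout shown earlier in the thread, the procedure Alan describes would look roughly like this. It is a sketch, assuming the replacement disk appears as /dev/sdb and /dev/sda is the healthy member: clone sda's partition table, re-add both partitions to their arrays, and let md resync the mirrors, with no mkfs or LVM commands needed.

# sfdisk -d /dev/sda | sfdisk /dev/sdb    # copy the partition table from the good disk
# mdadm /dev/md0 --add /dev/sdb1          # re-add the /boot mirror half
# mdadm /dev/md1 --add /dev/sdb2          # re-add the LVM physical-volume mirror half
# watch cat /proc/mdstat                  # the resync can take several hours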