From: Matthew Glubb on 17 Jul 2010 04:20

Hi All,

I have a problem replacing a failed disk in a RAID1 array that carries an LVM volume. Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method. However, I recently upgraded from Etch to Lenny. This week, I had a degraded array warning; a disk is failing.

So I duly repeated the steps to replace the disk, but on booting with the new, unformatted disk I get the following error:

"Alert! /dev/mapper/vg00-lv01 does not exist ...
 ... Dropping to shell"

At the moment, I have had to reinstall the old, failing disk in order to be able to boot and run the system. Has anyone had this problem before? Does anyone know of any solution to it?

I've included the relevant disk/RAID configuration at the end of this email. The device /dev/sdb is the one that is failing.

Thanks very much,

Matt

--

# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system>          <mount point>   <type>  <options>                   <dump>  <pass>
proc                     /proc           proc    defaults                    0       0
/dev/mapper/vg00-lv01    /               ext3    defaults,errors=remount-ro  0       1
/dev/md0                 /boot           ext3    defaults                    0       2
/dev/mapper/vg00-lv00    none            swap    sw                          0       0
/dev/fd0                 /media/floppy0  auto    rw,user,noauto              0       0

# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lv01  442G   80G  340G  20% /
tmpfs                  1.7G     0  1.7G   0% /lib/init/rw
udev                    10M  720K  9.3M   8% /dev
tmpfs                  1.7G     0  1.7G   0% /dev/shm
/dev/md0               942M   46M  849M   6% /boot

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
      478512000 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      979840 blocks [2/1] [U_]

# lvdisplay /dev/mapper/vg00-lv01
  --- Logical volume ---
  LV Name                /dev/vg00/lv01
  VG Name                vg00
  LV UUID                tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                448.34 GB
  Current LE             114776
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

The next one is swap space:

# lvdisplay /dev/mapper/vg00-lv00
  --- Logical volume ---
  LV Name                /dev/vg00/lv00
  VG Name                vg00
  LV UUID                aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8.00 GB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

# fdisk -l

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sda2             123       59694   478512090   fd  Linux raid autodetect

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sdb2             123       59694   478512090   fd  Linux raid autodetect
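For reference, the removal part of the procedure Matthew describes would look roughly like this with mdadm, assuming the failing disk is /dev/sdb and the arrays are md0 and md1 as shown above. This is a sketch, not taken from the original mail; his /proc/mdstat output suggests the failing halves may already have dropped out of the arrays, in which case the --fail/--remove steps are redundant.

# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1   # drop the failing disk's /boot mirror half
# mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2   # drop its LVM physical-volume mirror half
# shutdown -h now                                      # power off and physically swap the drive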
From: Alan Chandler on 17 Jul 2010 15:20

On 17/07/10 09:11, Matthew Glubb wrote:
> Hi All,
>
> I have a problem replacing a failed disk in a RAID1 array that carries an LVM volume. Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method. However, I recently upgraded from Etch to Lenny. This week, I had a degraded array warning; a disk is failing.
>
> So I duly repeated the steps to replace the disk, but on booting with the new, unformatted disk I get the following error:
>
> "Alert! /dev/mapper/vg00-lv01 does not exist ...
>  ... Dropping to shell"
>
> At the moment, I have had to reinstall the old, failing disk in order to be able to boot and run the system. Has anyone had this problem before? Does anyone know of any solution to it?
>
> I've included the relevant disk/RAID configuration at the end of this email. The device /dev/sdb is the one that is failing.

You don't include the one piece of information that shows whether the volume group sits on the RAID device.

Can you run either pvdisplay or vgdisplay -v?

Does it show the volume group sitting on the RAID device?

--
Alan Chandler
http://www.chandlerfamily.org.uk
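For anyone following along, a quick way to see which physical device backs the volume group, using standard LVM2 tools. A sketch only; none of this output appears in the original thread.

# pvdisplay                          # lists each physical volume and the volume group it belongs to
# pvs -o pv_name,vg_name,pv_size     # compact view: PV device, its VG, and its size
# vgdisplay -v vg00                  # verbose report on vg00, including its logical and physical volumes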
From: Stan Hoeppner on 17 Jul 2010 16:20

Matthew Glubb put forth on 7/17/2010 3:11 AM:

> Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method.

Once you fix your immediate problem you really need to address the larger issue, which is: why are you suffering so many disk failures, apparently on a single host?

The probability of one OP/host suffering 4 disk failures, even over a long period such as 10 years, is astronomically low. If you manage a server farm of a few dozen or more hosts and had one disk failure on each of four of them, the odds are a bit higher. However, in your case we're not talking about a farm situation, are we?

Are these disks really failing, or are you seeing the software RAID driver flag disks that aren't really going bad? What make/model disk drives are these that are apparently failing? Do you have sufficient airflow in the case to cool the drives? Is the host in an environment with a constant ambient temperature over 80 degrees Fahrenheit?

--
Stan
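One way to answer Stan's question about whether the drives are genuinely failing, rather than merely being kicked out of the array, is to look at their SMART data. A minimal sketch, assuming smartmontools is installed and the suspect drive is /dev/sdb (both assumptions, not from the thread):

# apt-get install smartmontools    # if not already present
# smartctl -H /dev/sdb             # the drive's overall health self-assessment
# smartctl -a /dev/sdb             # full attribute dump: check Reallocated_Sector_Ct,
                                   # Current_Pending_Sector and the drive temperature
# smartctl -t short /dev/sdb       # queue a short self-test; a few minutes later, read
# smartctl -l selftest /dev/sdb    # the self-test log for the result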
From: Gabor Heja on 17 Jul 2010 19:50

Hello,

The four failures seem really high to me too. This might be a silly question, but: have you checked/replaced the controller and cables yet?

I had a machine with four disks, and one of them, seemingly picked at random, was reported as bad every few weeks (all of them connected to the motherboard). I ruled out the cables and HDDs, so I decided to put a PCI controller card in the machine, and since then I have had no errors. (Of course, my best choice would have been to replace the motherboard, but that was not an option at the time.)

Are you sure your disks are bad? Have you run badblocks on them ("badblocks -vws" for read-WRITE mode; check the man page before running it)?

Regards,
Gabor

On Sat, 17 Jul 2010 15:19:30 -0500, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:
> Matthew Glubb put forth on 7/17/2010 3:11 AM:
>
>> Normally in the past when a disk has failed, I have dropped the offending disk from the array, replaced the disk, booted, rebuilt the filesystem on the new disk and re-synced the array. I've done this about four times with this method.
>
> Once you fix your immediate problem you really need to address the larger issue, which is: why are you suffering so many disk failures, apparently on a single host?
>
> The probability of one OP/host suffering 4 disk failures, even over a long period such as 10 years, is astronomically low. If you manage a server farm of a few dozen or more hosts and had one disk failure on each of four of them, the odds are a bit higher. However, in your case we're not talking about a farm situation, are we?
>
> Are these disks really failing, or are you seeing the software RAID driver flag disks that aren't really going bad? What make/model disk drives are these that are apparently failing? Do you have sufficient airflow in the case to cool the drives? Is the host in an environment with a constant ambient temperature over 80 degrees Fahrenheit?
>
> --
> Stan
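To expand on Gabor's suggestion: the -w flag makes badblocks overwrite every block, so it is only safe on a disk that carries no data you need, such as one already failed out of the array. A read-only pass is a gentler first check. A sketch, assuming the suspect disk is /dev/sdb (an assumption; substitute the real device):

# badblocks -vs /dev/sdb     # read-only scan: verbose, shows progress, touches no data
# badblocks -nvs /dev/sdb    # non-destructive read-write test (much slower, preserves data)
# badblocks -wvs /dev/sdb    # DESTRUCTIVE write-mode test: wipes the whole disk;
                             # only run it on a drive you intend to rebuild anyway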
From: Alan Chandler on 20 Jul 2010 08:00

On 18/07/10 18:30, Matthew Glubb wrote:
> Hi Alan,
>
> Thanks very much for your reply.

Let's take this back to the list rather than keep it between us - and I am subscribed to the list, so there is no need to copy me.

> On 17 Jul 2010, at 20:08, Alan Chandler wrote:
>>
>> You don't include the one piece of information that shows whether the volume group sits on the RAID device.
>>
>> Can you run either pvdisplay or vgdisplay -v?
>>
>> Does it show the volume group sitting on the RAID device?
>
> It appears to me to be showing the volume group sitting on the RAID device. Any ideas what the problem might be?

I don't know. When I have had a problem before, I have just repartitioned the old/new device and added those partitions using mdadm. It then syncs up (albeit over several hours). I don't format it or put filesystems on it - which I think your original mail mentioned.

> # vgdisplay -v
>     Finding all volume groups
>     Finding volume group "vg00"
>   Fixing up missing size (456.34 GB) for PV /dev/md1
>   --- Volume group ---
>   VG Name               vg00
>   System ID
>   Format                lvm2
>   Metadata Areas        1
>   Metadata Sequence No  3
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                2
>   Open LV               2
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               456.34 GB
>   PE Size               4.00 MB
>   Total PE              116824
>   Alloc PE / Size       116824 / 456.34 GB
>   Free  PE / Size       0 / 0
>   VG UUID               Urdpix-a5Ik-U1fq-Tw7T-umoT-paaR-e0s0Oz
>
>   --- Logical volume ---
>   LV Name                /dev/vg00/lv00
>   VG Name                vg00
>   LV UUID                aosfiq-oBUr-70Xn-Y5OJ-lsSV-i59V-nTXJG6
>   LV Write Access        read/write
>   LV Status              available
>   # open                 2
>   LV Size                8.00 GB
>   Current LE             2048
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
>
>   --- Logical volume ---
>   LV Name                /dev/vg00/lv01
>   VG Name                vg00
>   LV UUID                tvzjKH-hSpH-sDYk-YlWY-osUY-VxrA-ka2UCW
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                448.34 GB
>   Current LE             114776
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:1
>
>   --- Physical volumes ---
>   PV Name               /dev/md1
>   PV UUID               GSKYlk-d8z0-mGXj-kQny-gSMy-aOMR-zq1dXT
>   PV Status             allocatable
>   Total PE / Free PE    116824 / 0

--
Alan Chandler
http://www.chandlerfamily.org.uk
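Applied to the layout shown earlier in the thread, the procedure Alan describes would look roughly like this. It is a sketch, assuming the replacement disk appears as /dev/sdb and /dev/sda is the healthy member: clone sda's partition table, re-add both partitions to their arrays, and let md resync the mirrors, with no mkfs or LVM commands needed.

# sfdisk -d /dev/sda | sfdisk /dev/sdb    # copy the partition table from the good disk
# mdadm /dev/md0 --add /dev/sdb1          # re-add the /boot mirror half
# mdadm /dev/md1 --add /dev/sdb2          # re-add the LVM physical-volume mirror half
# watch cat /proc/mdstat                  # the resync can take several hours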