From: Snyder on 30 Jan 2010 17:18

Let a RAID-1 consist of two devices, set for autodetect (type 'fd').
Assume that they get out of sync, for instance when the system is
booted with only one of the devices connected, and the remaining
device is then written to. Now the system is booted with both devices
connected again. Then a degraded array is assembled at boot time.
This much I found out by experiment.

The question remains: _which_ of the two devices is chosen for the
degraded array? I observed varying behavior, which seems partly
systematic and partly random. Maybe someone can explain what the
general principle is.
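A quick way to see which member actually ended up in the assembled
array is to query md directly. A minimal sketch, assuming the array
is /dev/md0 (the device name is only for illustration):

    # Kernel view of all md arrays, their state and active members
    cat /proc/mdstat

    # Per-array detail: shows whether the array is clean or degraded
    # and lists which component devices are currently active
    mdadm --detail /dev/md0

This only reports the outcome of the assembly; it does not by itself
explain why one member was preferred over the other.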
From: Nico Kadel-Garcia on 31 Jan 2010 09:45

On Jan 30, 5:18 pm, "Snyder" <inva...(a)invalid.invalid> wrote:
> Let a RAID-1 consist of two devices, set for autodetect (type 'fd').
> Assume that they get out of sync, for instance when the system is
> booted with only one of the devices connected, and the remaining
> device is then written to. Now the system is booted with both
> devices connected again. Then a degraded array is assembled at boot
> time. This much I found out by experiment.
>
> The question remains: _which_ of the two devices is chosen for the
> degraded array? I observed varying behavior, which seems partly
> systematic and partly random. Maybe someone can explain what the
> general principle is.

If the RAID1 is configured correctly, it should never write to the
"degraded" part of the array. This is one of the tricky parts of
software RAID: it still allows direct access to that part of the
array through the normal operating system tools. If you corrupt it
behind the back of software RAID, re-assembling it is going to be a
problem.

Normally that "disconnected" drive would be marked as out of sync at
boot time, and restoring the array would cause the active disk to be
mirrored to the second disk. That's why restoring the array takes so
long: it has to read all of one disk, and verify and potentially
write to all of the second one. But that kind of problem is
inevitable if you have removable drives in a RAID1, such as USB
drives.

Why did the drive go offline, and when? Are you using software or
hardware RAID? How did the second drive get written to?
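The resync described above is normally triggered by putting the stale
member back into the array. A minimal sketch, assuming /dev/md0 with
the returning partition /dev/sdb1 (hypothetical names):

    # Put the returning partition back; md treats it as a spare and
    # rebuilds it from the surviving member
    mdadm /dev/md0 --add /dev/sdb1

    # Watch the rebuild progress until the array is clean again
    cat /proc/mdstat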
From: Snyder on 31 Jan 2010 10:29

Nico Kadel-Garcia <nkadel(a)gmail.com> writes:
> If the RAID1 is configured correctly, it should never write to the
> "degraded" part of the array. This is one of the tricky parts of
> software RAID: it still allows direct access to that part of the
> array through the normal operating system tools. If you corrupt it
> behind the back of software RAID, re-assembling it is going to be a
> problem.
>
> Normally that "disconnected" drive would be marked as out of sync at
> boot time, and restoring the array would cause the active disk to be
> mirrored to the second disk. That's why restoring the array takes so
> long: it has to read all of one disk, and verify and potentially
> write to all of the second one. But that kind of problem is
> inevitable if you have removable drives in a RAID1, such as USB
> drives.
>
> Why did the drive go offline, and when? Are you using software or
> hardware RAID? How did the second drive get written to?

I am using software RAID on two USB drives. I know that re-syncing
can take ages, but I am prepared for that. What I must prevent is
that one drive gets written to and then the other one also gets
written to, so that divergent versions emerge and neither drive is
the "old" one that can safely be overwritten with a mirror of the
"current" one. In other words: at every point in time, both drives
must have the same content, or one of them must contain only obsolete
content.

Let's call the drives A and B. Assume that I remove drive B by
pulling the USB plug. Then I do "touch current-drive" to mark the
remaining drive. Then I shut down the system, re-connect drive B and
boot again. In all my experiments, this led to a degraded array being
assembled from the partitions on drive A. So far this is what I need.
I can then re-add the partitions from drive B with something like
"mdadm /dev/mdX -a /dev/sdXX".

However, I also did the following experiment: after pulling the plug
on B, writing the file "current-drive" to A and finally shutting
down, I booted with only B connected. The system came up and ran its
fsck (as expected, since the filesystems on B had not been cleanly
unmounted). I then shut the system down, re-connected drive A and
booted again. In some cases drive A was used to build the degraded
array, and in some cases drive B was used. I did not detect a pattern
here. This is not very reassuring. Keep in mind that this series of
events may also occur unprovoked: just think of an unreliable USB
hub.

You wrote that the "disconnected" drive would be marked as out of
sync at boot time. I presume this looks like:

    md: kicking non-fresh sda1 from array!

But by what criteria is a drive categorized as "non-fresh"?
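On the "non-fresh" question: the md superblock on each member records
an event counter and an update timestamp, and a member whose counter
lags behind the freshest one is what gets reported as non-fresh and
kicked. These fields can be inspected directly; a sketch, again with
hypothetical device names:

    # Print each member's md superblock and compare the "Events" and
    # "Update Time" fields; the member with the lower/older values is
    # the one md will consider stale
    mdadm --examine /dev/sda1 | grep -E 'Events|Update Time'
    mdadm --examine /dev/sdb1 | grep -E 'Events|Update Time'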