From: Snyder on 30 Jan 2010 17:18

Let a RAID-1 consist of two devices, set for autodetect (type 'fd').
Assume that they get out of sync, for instance when the system is
booted with only one of the devices connected, and the remaining
device is then written to. Now the system is booted with both devices
connected again. Then a degraded array is assembled at boot time.
This much I found out by experiment.

The question remains: _which_ of the two devices is chosen for the
degraded array? I observed varying behavior, which seems partly
systematic and partly random. Maybe someone can explain what the
general principle is.
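A quick way to see which member actually ended up in the assembled
array is to query md directly. A minimal sketch, assuming the array
is /dev/md0 (the device name is only for illustration):

    # Kernel view of all md arrays, their state and active members
    cat /proc/mdstat

    # Per-array detail: shows whether the array is clean or degraded
    # and lists which component devices are currently active
    mdadm --detail /dev/md0

This only reports the outcome of the assembly; it does not by itself
explain why one member was preferred over the other.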
From: Nico Kadel-Garcia on 31 Jan 2010 09:45

On Jan 30, 5:18 pm, "Snyder" <inva...(a)invalid.invalid> wrote:
> Let a RAID-1 consist of two devices, set for autodetect (type 'fd').
> Assume that they get out of sync, for instance when the system is
> booted with only one of the devices connected, and the remaining
> device is then written to. Now the system is booted with both
> devices connected again. Then a degraded array is assembled at boot
> time. This much I found out by experiment.
>
> The question remains: _which_ of the two devices is chosen for the
> degraded array? I observed varying behavior, which seems partly
> systematic and partly random. Maybe someone can explain what the
> general principle is.

If the RAID1 is configured correctly, it should never write to the
"degraded" part of the array. This is one of the tricky parts of
software RAID: it still allows direct access to that part of the
array through the normal operating system tools. If you corrupt it
behind the back of software RAID, re-assembling it is going to be a
problem.

Normally that "disconnected" drive would be marked as out of sync at
boot time, and restoring the array would cause the active disk to be
mirrored to the second disk. That's why restoring the array takes so
long: it has to read all of one disk, and verify and potentially
write to all of the second one. But that kind of problem is
inevitable if you have removable drives in a RAID1, such as USB
drives.

Why did the drive go offline, and when? Are you using software or
hardware RAID? How did the second drive get written to?
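The resync described above is normally triggered by putting the stale
member back into the array. A minimal sketch, assuming /dev/md0 with
the returning partition /dev/sdb1 (hypothetical names):

    # Put the returning partition back; md treats it as a spare and
    # rebuilds it from the surviving member
    mdadm /dev/md0 --add /dev/sdb1

    # Watch the rebuild progress until the array is clean again
    cat /proc/mdstat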
From: Snyder on 31 Jan 2010 10:29

Nico Kadel-Garcia <nkadel(a)gmail.com> writes:
> If the RAID1 is configured correctly, it should never write to the
> "degraded" part of the array. This is one of the tricky parts of
> software RAID: it still allows direct access to that part of the
> array through the normal operating system tools. If you corrupt it
> behind the back of software RAID, re-assembling it is going to be a
> problem.
>
> Normally that "disconnected" drive would be marked as out of sync at
> boot time, and restoring the array would cause the active disk to be
> mirrored to the second disk. That's why restoring the array takes so
> long: it has to read all of one disk, and verify and potentially
> write to all of the second one. But that kind of problem is
> inevitable if you have removable drives in a RAID1, such as USB
> drives.
>
> Why did the drive go offline, and when? Are you using software or
> hardware RAID? How did the second drive get written to?

I am using software RAID on two USB drives. I know that re-syncing
can take ages, but I am prepared for that. What I must prevent is
that one drive gets written to and then the other one also gets
written to, so that divergent versions emerge and neither drive is
the "old" one that can safely be overwritten with a mirror of the
"current" one. In other words: at every point in time, both drives
must have the same content, or one of them must contain only obsolete
content.

Let's call the drives A and B. Assume that I remove drive B by
pulling the USB plug. Then I do "touch current-drive" to mark the
remaining drive. Then I shut down the system, re-connect drive B and
boot again. In all my experiments, this led to a degraded array being
assembled from the partitions on drive A. So far this is what I need.
I can then re-add the partitions from drive B with something like
"mdadm /dev/mdX -a /dev/sdXX".

However, I also did the following experiment: after pulling the plug
on B, writing the file "current-drive" to A and finally shutting
down, I booted with only B connected. The system came up and ran its
fsck (as expected, since the filesystems on B had not been cleanly
unmounted). I then shut the system down, re-connected drive A and
booted again. In some cases drive A was used to build the degraded
array, and in some cases drive B was used. I did not detect a pattern
here. This is not very reassuring. Keep in mind that this series of
events may also occur unprovoked: just think of an unreliable USB
hub.

You wrote that the "disconnected" drive would be marked as out of
sync at boot time. I presume this looks like:

    md: kicking non-fresh sda1 from array!

But by what criteria is a drive categorized as "non-fresh"?
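On the "non-fresh" question: the md superblock on each member records
an event counter and an update timestamp, and a member whose counter
lags behind the freshest one is what gets reported as non-fresh and
kicked. These fields can be inspected directly; a sketch, again with
hypothetical device names:

    # Print each member's md superblock and compare the "Events" and
    # "Update Time" fields; the member with the lower/older values is
    # the one md will consider stale
    mdadm --examine /dev/sda1 | grep -E 'Events|Update Time'
    mdadm --examine /dev/sdb1 | grep -E 'Events|Update Time'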