From: Tim Clewlow on
Hi there,

I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
but the intention is to add more drives as storage requirements
increase.

My research/googling suggests ext3 supports 16TB volumes if the
block size is 4096 bytes, but some sites suggest the 32-bit arch
restricts it to 4TB no matter what block size I use. So, does ext3
(and the relevant utilities, particularly resize2fs and e2fsck) on
the 32-bit i386 arch support 16TB volumes?
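
In case it helps, the way I plan to test this empirically before
committing real disks is a sparse file, since mkfs or the kernel
should refuse outright if they cannot handle the size. A minimal
sketch, assuming a host filesystem that allows files this large;
the paths are only examples:

  # Create a sparse file just under 16TB, then try to format it
  # with a 4096-byte block size and mount it.
  dd if=/dev/zero of=/srv/test.img bs=1M count=1 seek=16777214
  mkfs.ext3 -F -b 4096 /srv/test.img
  mount -o loop /srv/test.img /mnt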

I intend to use mdadm to build/run the array. If an unrecoverable
read error (a bad block that the on-disk circuitry can't resolve)
is discovered on a disk, how does mdadm handle it? The
possibilities appear to be:
1) the disk gets marked as failed in the array - ext3 does not get
notified of a bad block
2) mdadm uses free space to construct a new stripe (from the
remaining RAID data) to replace the bad one - ext3 does not get
notified of a bad block
3) mdadm passes the requested data (again reconstructed from the
remaining good blocks) up to ext3 and then tells ext3 that all
those blocks (from the single stripe) are now bad, leaving ext3 to
deal with it (ext3 can mark and reallocate storage locations if it
is told of bad blocks).

I would really like to hear it is either 2 or 3, as I would prefer
not to have an entire disk immediately marked bad due to one
unrecoverable read error - I would prefer to be notified instead,
so RAID 6 can keep protecting "most" of the data until the disk
gets replaced.
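
In case it helps anyone answer: I gather md has a sysfs interface
for scrubbing an array deliberately, which should surface
unrecoverable read errors before normal reads hit them. A minimal
sketch; the array name /dev/md0 is only an example:

  # Start a background consistency check of the whole array; md
  # reads every stripe and rewrites any sector it cannot read
  # using data reconstructed from the remaining devices.
  echo check > /sys/block/md0/md/sync_action

  # Follow progress and see how many mismatches were found.
  cat /proc/mdstat
  cat /sys/block/md0/md/mismatch_cnt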

Regards, Tim.

From: Tim Clewlow on

OK, I found the answer to my second question - mdadm fails the
entire disk. So the first question remains.

Does ext3 (and the relevant utilities, particularly resize2fs and
e2fsck) on the 32-bit i386 arch support 16TB volumes?

Regards, Tim.

From: Mark Allums on
On 4/26/2010 9:29 AM, Tim Clewlow wrote:
> Hi there,
>
> I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
> but the intention is to add more drives as storage requirements
> increase.
>
> My research/googling suggests ext3 supports 16TB volumes if the
> block size is 4096 bytes, but some sites suggest the 32-bit arch
> restricts it to 4TB no matter what block size I use. So, does ext3
> (and the relevant utilities, particularly resize2fs and e2fsck) on
> the 32-bit i386 arch support 16TB volumes?
>
> I intend to use mdadm to build/run the array. If an unrecoverable
> read error (a bad block that the on-disk circuitry can't resolve)
> is discovered on a disk, how does mdadm handle it? The
> possibilities appear to be:
> 1) the disk gets marked as failed in the array - ext3 does not get
> notified of a bad block
> 2) mdadm uses free space to construct a new stripe (from the
> remaining RAID data) to replace the bad one - ext3 does not get
> notified of a bad block
> 3) mdadm passes the requested data (again reconstructed from the
> remaining good blocks) up to ext3 and then tells ext3 that all
> those blocks (from the single stripe) are now bad, leaving ext3 to
> deal with it (ext3 can mark and reallocate storage locations if it
> is told of bad blocks).
>
> I would really like to hear it is either 2 or 3, as I would prefer
> not to have an entire disk immediately marked bad due to one
> unrecoverable read error - I would prefer to be notified instead,
> so RAID 6 can keep protecting "most" of the data until the disk
> gets replaced.
>
> Regards, Tim.
>
>

I'm afraid opinions on RAID vary widely on this list (no surprise),
but you may be interested to note that there is a consensus here
that software RAID 6 is an unfortunate choice.

I believe the answer to your question is none of the above. The
closest is (2). As I'm sure you know, RAID 6 uses block-level
striping. So what happens is a matter of policy, but I believe
data that appears lost is recovered from parity and rewritten to
the array. [0] The error is logged, and the status of the drive is
changed. If the drive doesn't fail outright then, depending on
policy [1], it may be re-verified or dropped out. Either way,
mdadm handles the error, because it is a failure at a lower level
than ext3.
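
If being notified is the main concern, mdadm's monitor mode can
mail you when a drive fails or an array degrades. A minimal
sketch; the mail address is only a placeholder:

  # Watch every array listed in /proc/mdstat in the background and
  # send mail on events such as Fail and DegradedArray.
  mdadm --monitor --scan --daemonise --mail=root@localhost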

The problem is when the drive is completely, 100% in use (no
spare capacity). In that case no new stripe is created, because
there is no room to put one; the data is moved to an unused
area [1] and the status of the drive is changed (your scenario 1).
ext3 is still unaware.

The file system is a logical layer on top of RAID, and will only become
aware of changes to the disk structure when it is unavoidable. RAID
guarantees a certain capacity. If you create a volume with 1 TB
capacity, the volume will always have that capacity.

If you set this up, be sure to combine it with LVM2; that gives
you much greater flexibility in what to do when recovering from
failures.
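
A minimal sketch of the layering, assuming the array is /dev/md0;
the volume group and logical volume names are only examples:

  # Put LVM on top of the md array, then carve out a volume that
  # can be grown later as more disks are added.
  pvcreate /dev/md0
  vgcreate vg_raid /dev/md0
  lvcreate -L 1T -n lv_data vg_raid
  mkfs.ext3 -b 4096 /dev/vg_raid/lv_data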


[0] This depends on the implementation, and I don't know what mdadm
does. Some implementations might do this automatically, but I think
most would require a rebuild.

[1] Again, I forget what mdadm does in this case. Anybody?



I'm sorry, I seem to have avoided answering a crucial part of your
question. I think that the md device documentation is what you want.


MAA

From: Mark Allums on
On 4/26/2010 10:28 AM, Tim Clewlow wrote:
>
> OK, I found the answer to my second question - mdadm fails the
> entire disk. So the first question remains.


I just figured that out, and I see you have too.

The difference between what we would like it to do and what it
actually does can be frustrating sometimes. I think you can tell
mdadm to (re-)verify the disk if you believe it is okay and just
has one bad block; a sketch follows. But I never trust failing
hard disks. It's a losing game.
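
A minimal sketch of that, assuming the array is /dev/md0 and the
suspect disk is /dev/sdb1 (both names are only examples):

  # Fail the suspect disk out of the array and remove it.
  mdadm /dev/md0 --fail /dev/sdb1
  mdadm /dev/md0 --remove /dev/sdb1

  # Re-add it; md rebuilds the disk, rewriting any sector that was
  # unreadable. Without a write-intent bitmap --re-add may be
  # refused, in which case --add triggers a full rebuild instead.
  mdadm /dev/md0 --re-add /dev/sdb1

  # Follow the resync.
  cat /proc/mdstat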

MAA

From: Tim Clewlow on

> I'm afraid opinions on RAID vary widely on this list (no
> surprise), but you may be interested to note that there is a
> consensus here that software RAID 6 is an unfortunate choice.
>
[...]
Is this for performance reasons or potential data loss? I can
live with slow writes, and reads should not be much affected, but
data loss is something I'd really like to avoid.

Regards, Tim.