From: Tim Clewlow on 26 Apr 2010 10:40

Hi there,

I'm getting ready to build a RAID 6 with 4 x 2TB drives to start, but the intention is to add more drives as storage requirements increase.

My research/googling suggests ext3 supports 16TB volumes if the block size is 4096 bytes, but some sites suggest the 32-bit arch restricts it to 4TB no matter what block size I use. So, does ext3 (and the relevant utilities, particularly resize2fs and e2fsck) on 32-bit i386 support 16TB volumes?

I intend to use mdadm to build and run the array. If an unrecoverable read error (a bad block that the on-disk circuitry can't resolve) is discovered on a disk, how does mdadm handle it? The possibilities appear to be:

1) The disk gets marked as failed in the array; ext3 does not get notified of a bad block.
2) mdadm uses free space to construct a new stripe (from the remaining RAID data) to replace the bad one; ext3 does not get notified of a bad block.
3) mdadm passes the requested data (again reconstructed from the remaining good blocks) up to ext3 and then tells ext3 that all the blocks in that stripe are now bad, leaving ext3 to deal with it (ext3 can mark bad blocks and reallocate storage if it is told about them).

I would really like to hear it is either 2 or 3, as I would prefer not to have an entire disk immediately marked bad due to one unrecoverable read error. I would rather be notified, so that RAID 6 still protects "most" of the data until the disk gets replaced.

Regards, Tim.
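For reference, the arithmetic behind those figures can be sketched as follows. This is illustrative only; which ceiling actually binds depends on the kernel and e2fsprogs versions in use:

    # Size ceilings relevant to ext3 on 32-bit Linux (illustrative only).
    TiB = 1024 ** 4

    block_size = 4096                  # ext3 block size in bytes
    ext3_limit = 2**32 * block_size    # ext3 block numbers are 32-bit
    pagecache_limit = 2**32 * 4096     # 32-bit page index, 4 KiB pages
    old_sector_limit = 2**32 * 512     # 32-bit counts of 512-byte sectors
                                       # (older kernels without large
                                       # block device support)

    print("ext3 on-disk format:", ext3_limit // TiB, "TiB")      # 16 TiB
    print("32-bit page cache:  ", pagecache_limit // TiB, "TiB")   # 16 TiB
    print("old block layer:    ", old_sector_limit // TiB, "TiB")  # 2 TiB

If both 16 TiB figures hold on the kernel in question, the 4TB claims are probably leftovers from older limits; either way, the safe course is to test resize2fs and e2fsck against a large scratch volume before trusting them with real data.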
From: Tim Clewlow on 26 Apr 2010 11:30

Ok, I found the answer to my second question: it fails the entire disk. So the first question remains. Does ext3 (and the relevant utilities, particularly resize2fs and e2fsck) on 32-bit i386 support 16TB volumes?

Regards, Tim.
From: Mark Allums on 26 Apr 2010 11:40

On 4/26/2010 9:29 AM, Tim Clewlow wrote:
> [...]
> I intend to use mdadm to build and run the array. If an unrecoverable
> read error (a bad block that the on-disk circuitry can't resolve) is
> discovered on a disk, how does mdadm handle it? The possibilities
> appear to be:
> 1) The disk gets marked as failed in the array; ext3 does not get
> notified of a bad block.
> 2) mdadm uses free space to construct a new stripe (from the
> remaining RAID data) to replace the bad one; ext3 does not get
> notified of a bad block.
> 3) mdadm passes the requested data (again reconstructed from the
> remaining good blocks) up to ext3 and then tells ext3 that all the
> blocks in that stripe are now bad, leaving ext3 to deal with it.
> [...]

I'm afraid opinions of RAID vary widely on this list (no surprise), but you may be interested to note that there is a consensus here that software RAID 6 is an unfortunate choice.

I believe the answer to your question is none of the above; the closest is (2). As I'm sure you know, RAID 6 uses block-level striping. What happens next is a matter of policy, but I believe data that appears lost is recovered from parity and rewritten to the array.[0] The error is logged and the status of the drive is changed. If the drive doesn't fail outright then, depending on policy[1], it may be re-verified or dropped from the array. Either way, mdadm handles the error, because it is a lower-level failure than ext3 can see.

The problem arises when the drive is completely, 100% in use (no spare capacity). In that case no new stripe is created, because there is no room for one. The data is moved to an unused area[1] and the status of the drive is changed (your scenario 1). ext3 is still unaware. The file system is a logical layer on top of RAID and only becomes aware of changes to the disk structure when that is unavoidable. RAID guarantees a certain capacity: if you create a volume with 1 TB capacity, the volume will always have that capacity.

If you set this up, be sure to combine it with LVM2. That gives you much greater flexibility in recovering from failures.

[0] This depends on the implementation, and I don't know what mdadm does. Some implementations might do this automatically, but I think most would require a rebuild.
[1] Again, I forget what mdadm does in this case. Anybody?

I'm sorry, I seem to have avoided answering a crucial part of your question. I think the md device documentation is what you want.

MAA
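To picture the reconstruction described above: in a toy model, a block lost to a read error can be rebuilt by XOR-ing the surviving data blocks of the stripe with the P parity. This is only a sketch of the idea, not mdadm's code; real RAID 6 also maintains a second syndrome, Q, computed over GF(2^8), which is what allows it to survive two concurrent failures:

    # Toy parity reconstruction: P parity only (the XOR half of RAID 6).
    from functools import reduce

    def xor_blocks(blocks):
        # XOR equal-sized byte blocks together.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    # One stripe spread over three data disks plus P.
    data = [b"AAAA", b"BBBB", b"CCCC"]
    p_parity = xor_blocks(data)

    # Disk 1 returns an unrecoverable read error; rebuild its block
    # from the surviving data blocks and P.
    rebuilt = xor_blocks([data[0], data[2], p_parity])

    assert rebuilt == data[1]
    print("rebuilt block:", rebuilt)   # b'BBBB'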
From: Mark Allums on 26 Apr 2010 11:50

On 4/26/2010 10:28 AM, Tim Clewlow wrote:
> Ok, I found the answer to my second question: it fails the entire
> disk. So the first question remains.

I just figured that out, and I see you have too. The difference between what we would like it to do and what it actually does can be frustrating sometimes.

I think you can tell mdadm to (re-)verify the disk if you think it is okay and just has one bad block. But I never trust failing hard disks. It's a losing game.

MAA
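On the (re-)verify point: md exposes a scrub interface through sysfs, so a full read-check of an array (which also exercises md's rewrite-from-parity path on read errors) can be requested without failing the disk. A minimal sketch, assuming the array is md0 and the process runs as root:

    # Kick off an md consistency check via sysfs. "check" scrubs the
    # array; "repair" additionally rewrites inconsistent stripes.
    md = "/sys/block/md0/md"

    with open(md + "/sync_action", "w") as f:
        f.write("check\n")

    # The scrub runs asynchronously; once it finishes, mismatch_cnt
    # reports how many sectors disagreed.
    with open(md + "/mismatch_cnt") as f:
        print("mismatches:", f.read().strip())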
From: Tim Clewlow on 26 Apr 2010 13:00
> I'm afraid opinions of RAID vary widely on this list (no surprise),
> but you may be interested to note that there is a consensus here
> that software RAID 6 is an unfortunate choice.

Is this for performance reasons, or potential data loss? I can live with slow writes, and reads should not be much affected, but data loss is something I'd really like to avoid.

Regards, Tim.
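On the performance side, the usual complaint about RAID 6 is the small-write penalty: a random write smaller than a stripe takes the read-modify-write path and costs several disk operations per logical write. A first-order count, not a benchmark:

    # Rough I/O count for a sub-stripe random write on RAID 6
    # via read-modify-write; a first-order model only.
    reads = 3    # old data block, old P, old Q
    writes = 3   # new data block, new P, new Q
    print(reads + writes, "disk I/Os per logical write")   # 6

Full-stripe sequential writes largely avoid this, since P and Q can then be computed directly from the new data.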