From: Tim Clewlow on 1 May 2010 03:50

> On 4/30/2010 6:39 PM, Ron Johnson wrote:
>> On 04/26/2010 09:29 AM, Tim Clewlow wrote:
>>> Hi there,
>>>
>>> I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
>>
>> Since two of the drives (yes, I know the parity is striped across all
>> the drives, but "two drives" is still the effect) are used by
>> striping, RAID 6 with 4 drives doesn't seem rational.
>
> We've taken OP to task already for this, but I guess it bears
> repeating.
>
> Use multiple HW controllers, and at least 7-8 drives, I believe was
> the consensus, given that SW RAID 6 is a performance loser and losing
> a controller during a rebuild is a real ruin-your-week kind of moment.
>
> But while some of us were skeptical about just how bad the performance
> of RAID 5 or 6 really is and wanted citation of references, more of us
> just questioned the perceived frugality. With four drives, wouldn't a
> RAID 10 be a better use of resources, since you can migrate to bigger
> setups later? And there we were content to let it lie, until...
>
>>> but the intention is to add more drives as storage requirements
>>> increase.
>>>
>>> My research/googling suggests ext3 supports 16TB volumes if block
>>
>> Why ext3? My kids would graduate college before the fsck completed.
>>
>> ext4 or xfs are the way to go.
>
> I have ceased to have an opinion on this, having been taken to task,
> myself, about it. I believe the discussion degenerated into nit-picky
> banter over the general suitability of XFS, but I may be wrong about
> this...
>
> _____
>
> Seriously, ext4 is not suitable if you anticipate possible boot
> problems, unless you are experienced at these things. The same is true
> of XFS. If you *are* experienced, then more power to you. Although, I
> would have assumed a very experienced person would have no need to ask
> the question.
>
> Someone pointed out what I have come to regard as the best solution,
> and that is to make /boot and / (root) and the usual suspects ext3 for
> safety, and use ext4 or XFS or even btrfs for the data directories.
>
> (Unless OP were talking strictly about the data drives to begin with,
> a possibility I admit I may have overlooked.)
>
> Have I summarized adequately?
>
> MAA

First off, thank you all for the valuable, experience-laden
information.

For clarity, the setup has always been intended to be: one
system/application drive, and one array made of separate drives; the
array protects data, nothing else. The idea is for them to be two
clearly distinct entities with very different levels of protection,
because the system and apps can be recreated quite quickly if lost;
the data cannot.

To be more specific, the data is currently touching 4TB and is expected
to exceed that very soon, so I'll be using at least 5 drives, probably
6, in the near future. Yes, I know RAID 6 on 4 drives is not frugal;
I'm just planning ahead.

My reticence to use ext4 / xfs has been due to long cache-before-write
times being claimed as dangerous in the event of kernel lockup / power
outage. There are also reports (albeit perhaps somewhat dated) that
ext4/xfs still have a few small but important bugs to be ironed out -
I'd be very happy to hear if people have experience demonstrating this
is no longer true. My preference would be ext4 instead of xfs, as I
believe (just my opinion) this is most likely to become the successor
to ext3 in the future. I have been wanting to know if ext3 can handle a
>16TB fs.
I now know that delayed allocation / delayed writes can be turned off
in ext4 (among other tuning options I'm looking at), and with ext4, fs
sizes are no longer a question. So I'm really hoping that ext4 is the
way I can go.

I'm also hoping that a CPU/motherboard with suitable grunt and FSB
bandwidth could reduce the performance problems with software RAID 6.
If I'm seriously mistaken then I'd love to know beforehand. My
reticence to use HW RAID is that it seems like adding one more point of
possible failure, but I could easily be paranoid in dismissing it for
that reason.

Regards, Tim.
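(For reference, a software RAID 6 of the sort being discussed is
typically created, and later grown, with mdadm roughly as sketched
below. The device names and /dev/md0 are placeholders, and growing onto
extra disks assumes a kernel and mdadm recent enough to reshape RAID 6,
so treat this as an illustration rather than a recipe.)

  # create a 4-drive RAID 6 (usable capacity of 2 drives, dual parity)
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[bcde]

  # later, add a fifth drive and reshape the array onto it
  mdadm --add /dev/md0 /dev/sdf
  mdadm --grow /dev/md0 --raid-devices=5

  # watch the reshape/resync progress
  cat /proc/mdstat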
From: Stan Hoeppner on 2 May 2010 07:10

Disclaimer: I'm partial to XFS

Tim Clewlow put forth on 5/1/2010 2:44 AM:

> My reticence to use ext4 / xfs has been due to long cache-before-write
> times being claimed as dangerous in the event of kernel lockup / power
> outage.

This is a problem with the Linux buffer cache implementation, not any
one filesystem. The problem isn't the code itself, but the fact that it
is a trade-off between performance and data integrity. No journaling
filesystem will prevent the loss of data sitting in the Linux buffer
cache when the machine crashes. What they will do is zero out or delete
any files that were not fully written before the crash, in order to
keep the FS in a consistent state. You will always lose data that's in
flight, but your FS won't get corrupted, thanks to the journal replay
after reboot.

If you are seriously concerned about losing write data that is in the
buffer cache when the system crashes, you should mount your filesystems
with "-o sync" in the fstab options so all writes get flushed to disk
without being queued in the buffer cache (a sample fstab entry follows
below).

> There are also reports (albeit perhaps somewhat dated) that ext4/xfs
> still have a few small but important bugs to be ironed out - I'd be
> very happy to hear if people have experience demonstrating this is no
> longer true. My preference would be ext4 instead of xfs, as I believe
> (just my opinion) this is most likely to become the successor to ext3
> in the future.

I can't speak well to EXT4, but XFS has been fully production quality
for many years: since 1993 on Irix, where it was introduced, and since
~2001 on Linux. There was a bug identified that resulted in fs
inconsistency after a crash, which was fixed in 2007. All bug-fix work
since has dealt with minor issues unrelated to data integrity. Most of
the code work for quite some time now has been cleanup, optimization,
and better documentation. Reading the posts to the XFS mailing list is
very informative as to the quality and performance of the code. XFS has
some really sharp devs. Most are current or former SGI engineers.

> I have been wanting to know if ext3 can handle a >16TB fs. I now know
> that delayed allocation / delayed writes can be turned off in ext4
> (among other tuning options I'm looking at), and with ext4, fs sizes
> are no longer a question. So I'm really hoping that ext4 is the way I
> can go.

XFS has even more tuning options than EXT4--or pretty much any other
FS, for that matter. With XFS on a 32-bit kernel the max FS and file
size is 16TB. On a 64-bit kernel it is 9 exabytes each. XFS is a better
solution than EXT4 at this point. Ted Ts'o admitted last week that one
function call in EXT4 is in terrible shape and will take a lot of work
to fix:

"On my todo list is to fix ext4 to not call write_cache_pages() at all.
We are seriously abusing that function ATM, since we're not actually
writing the pages when we call write_cache_pages(). I won't go into
what we're doing, because it's too embarassing, but suffice it to say
that we end up calling pagevec_lookup() or pagevec_lookup_tag() *four*,
count them *four* times while trying to do writeback. I have a simple
patch that gives ext4 our own copy of write_cache_pages(), and then
simplifies it a lot, and fixes a bunch of problems, but then I discarded
it in favor of fundamentally redoing how we do writeback at all, but
it's going to take a while to get things completely right. But I am
working to try to fix this."
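To make the "-o sync" suggestion above concrete, an fstab entry for a
data array might look like the sketch below. The device, mount point,
and filesystem type are placeholders, and fully synchronous writes
carry a large throughput penalty, so this only illustrates the option:

  # /etc/fstab - example only; "sync" forces write-through to disk
  /dev/md0   /srv/data   xfs    defaults,sync        0  0

  # ext4 alternative: keep asynchronous writes but disable delayed
  # allocation, so blocks are allocated at write time, more like ext3
  /dev/md0   /srv/data   ext4   defaults,nodelalloc  0  0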
> I'm also hoping that a CPU/motherboard with suitable grunt and FSB
> bandwidth could reduce the performance problems with software RAID 6.
> If I'm seriously mistaken then I'd love to know beforehand. My
> reticence to use HW RAID is that it seems like adding one more point
> of possible failure, but I could easily be paranoid in dismissing it
> for that reason.

Good hardware RAID cards are really nice and give you some features you
can't really get with md raid, such as true "just yank the drive tray
out" hot-swap capability. I've not tried it, but I've read that md raid
doesn't like it when you just yank an active drive. Drive-fault LEDs
and audible warnings are also nice with HW RAID solutions. The other
main advantage is performance. Decent HW RAID is almost always faster
than md raid, sometimes by a factor of 5 or more depending on the disk
count and RAID level. Typically, good HW RAID really trounces md raid
performance at levels such as 5, 6, 50, and 60 - basically anything
requiring parity calculations.

Sounds like you're more of a casual user who needs lots of protected
disk space but not necessarily absolute blazing speed. Linux RAID
should be fine.

Take a closer look at XFS before making your decision on a FS for this
array. It's got a whole lot to like, and it has options to tune XFS
exactly to your mdadm RAID setup. In fact, this is usually done for you
automatically, as mkfs.xfs queries the block device driver for stride
and width info, then matches it. (~$ man 8 mkfs.xfs)

http://oss.sgi.com/projects/xfs/
http://www.xfs.org/index.php/XFS_FAQ
http://www.debian-administration.org/articles/388
http://www.jejik.com/articles/2008/04/benchmarking_linux_filesystems_on_software_raid_1/
http://www.osnews.com/story/69
(note the date, and note the praise Hans Reiser lavishes upon XFS)
http://everything2.com/index.pl?node_id=1479435
http://erikugel.wordpress.com/2010/04/11/setting-up-linux-with-raid-faster-slackware-with-mdadm-and-xfs/
http://btrfs.boxacle.net/repository/raid/2010-04-14_2004/2.6.34-rc3/2.6.34-rc3.html
(2.6.34-rc3 benchmarks, all filesystems in tree)

XFS Users: The Linux Kernel Archives

"A bit more than a year ago (as of October 2008) kernel.org, in an ever
increasing need to squeeze more performance out of it's machines, made
the leap of migrating the primary mirror machines (mirrors.kernel.org)
to XFS. We site a number of reasons including fscking 5.5T of disk is
long and painful, we were hitting various cache issues, and we were
seeking better performance out of our file system."

"After initial tests looked positive we made the jump, and have been
quite happy with the results. With an instant increase in performance
and throughput, as well as the worst xfs_check we've ever seen taking
10 minutes, we were quite happy. Subsequently we've moved all primary
mirroring file-systems to XFS, including www.kernel.org, and
mirrors.kernel.org. With an average constant movement of about 400mbps
around the world, and with peaks into the 3.1gbps range serving
thousands of users simultaneously it's been a file system that has
taken the brunt we can throw at it and held up spectacularly."

--
Stan
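To illustrate the stripe-alignment point above: mkfs.xfs normally picks
up the md geometry on its own, but the same thing can be given by hand.
The figures below are purely an example for a hypothetical 6-drive
RAID 6 (4 data disks) built with a 512 KiB chunk; substitute the actual
chunk size and data-disk count, and the mount point is a placeholder:

  # su = md chunk size, sw = number of data (non-parity) disks
  mkfs.xfs -d su=512k,sw=4 /dev/md0

  # confirm the stripe unit/width the filesystem actually got
  xfs_info /srv/data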
From: Boyd Stephen Smith Jr. on 2 May 2010 15:50

On Friday 30 April 2010 19:10:52 Mark Allums wrote:
> or even btrfs for the data directories.

While I am beginning to experiment with btrfs, I wouldn't yet use it
for data you care about.

/boot, not until/if grub2 gets support for it. Even then, /boot is
generally small and not often used, so you'll probably not notice any
performance change from ext2 to a more modern file system, but you
might appreciate the fact that ext2 is mature and well understood.

/, maybe -- if mkinitramfs figures out how to mount it properly.

/usr, /opt, /var/cache and /var/tmp, maybe. They are easily discarded
or restored.

/home, /var, not yet. Data stored there might be irreplaceable. I'd
wait until the developers say it is stable, at least!

--
Boyd Stephen Smith Jr.
bss(a)iguanasuicide.net
ICQ: 514984  YM/AIM: DaTwinkDaddy
http://iguanasuicide.net/
From: Boyd Stephen Smith Jr. on 2 May 2010 16:00

On Sunday 02 May 2010 06:00:38 Stan Hoeppner wrote:
> Good hardware RAID cards are really nice and give you some features
> you can't really get with md raid, such as true "just yank the drive
> tray out" hot-swap capability. I've not tried it, but I've read that
> md raid doesn't like it when you just yank an active drive.
> Drive-fault LEDs and audible warnings are also nice with HW RAID
> solutions. The other main advantage is performance. Decent HW RAID is
> almost always faster than md raid, sometimes by a factor of 5 or more
> depending on the disk count and RAID level. Typically, good HW RAID
> really trounces md raid performance at levels such as 5, 6, 50, and
> 60 - basically anything requiring parity calculations.

Speeds on my md-RAID devices were comparable to speeds with my Areca HW
RAID controller (16-port, PCI-X/SATA, battery-backed 128MB cache). The
number of drives varied from 5 to 10. RAID levels 5 and 6 were both
tested.

Read throughput for both was the expected (# drives - # parity drives)
* single-drive throughput. Write throughput was less than expected in
both cases, but I can't recall the exact figures.

Both support "just yank the drive out" if the rest of the hardware
supports hot-plugging. Alerting about failure is probably a bit better
with a HW RAID controller, since it comes with visual and audible
alarms.

It might be different when the system is under load, since md-RAID
depends on the host CPU and HW RAID does not. However, adding an
additional generic CPU (to reduce load) is both more useful and often
less expensive than buying a HW RAID controller that is only used for
RAID operations.

> Sounds like you're more of a casual user who needs lots of protected
> disk space but not necessarily absolute blazing speed. Linux RAID
> should be fine.

I know I am.

--
Boyd Stephen Smith Jr.
bss(a)iguanasuicide.net
ICQ: 514984  YM/AIM: DaTwinkDaddy
http://iguanasuicide.net/
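Putting rough, purely illustrative numbers on that read-throughput rule
of thumb (assuming around 100 MB/s of sequential throughput per drive):

  6-drive RAID 6:  (6 - 2)  * 100 MB/s ~ 400 MB/s expected sequential read
  10-drive RAID 6: (10 - 2) * 100 MB/s ~ 800 MB/s expected sequential read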
From: Alexander Samad on 2 May 2010 16:30
On Mon, May 3, 2010 at 6:02 AM, Boyd Stephen Smith Jr.
<bss(a)iguanasuicide.net> wrote:
> On Sunday 02 May 2010 06:00:38 Stan Hoeppner wrote:
[snip]
>
> Speeds on my md-RAID devices were comparable to speeds with my Areca
> HW RAID controller (16-port, PCI-X/SATA, battery-backed 128MB cache).
> The number of drives varied from 5 to 10. RAID levels 5 and 6 were
> both tested.
>
> Read throughput for both was the expected (# drives - # parity drives)
> * single-drive throughput. Write throughput was less than expected in
> both cases, but I can't recall the exact figures.
>
[snip]
>
> It might be different when the system is under load, since md-RAID
> depends on the host CPU and HW RAID does not. However, adding an
> additional generic CPU (to reduce load) is both more useful and often
> less expensive than buying a HW RAID controller that is only used for
> RAID operations.

My system used to become close to unusable on the first Sunday of the
month when mdadm did its resync. I had to write my own script so it did
not check multiple arrays at the same time, turn off the hung-process
timer, and set cpufreq to performance. Now, with hardware RAID, I don't
notice it.

Alex
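For anyone hitting the same monthly-check slowdown with md: the
resync/check rate can be throttled system-wide, and on Debian the
monthly run is just a cron job calling the checkarray script, so a
custom script is not strictly required. The value below is only an
example; pick a cap that suits the hardware:

  # cap background resync/check bandwidth, in KB/s per device
  sysctl -w dev.raid.speed_limit_max=10000

  # equivalently, via procfs
  echo 10000 > /proc/sys/dev/raid/speed_limit_max

  # Debian schedules the monthly check in /etc/cron.d/mdadm, which runs
  # /usr/share/mdadm/checkarray; it can be rescheduled or disabled there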