From: Tim Clewlow on 1 May 2010 03:50

> On 4/30/2010 6:39 PM, Ron Johnson wrote:
>> On 04/26/2010 09:29 AM, Tim Clewlow wrote:
>>> Hi there,
>>>
>>> I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
>>
>> Since two of the drives (yes, I know the parity is striped across all
>> the drives, but "two drives" is still the effect) are used by
>> striping, RAID 6 with 4 drives doesn't seem rational.
>
> We've taken OP to task already for this, but I guess it bears
> repeating.
>
> Use multiple HW controllers, and at least 7-8 drives, I believe was
> the consensus, given that SW RAID 6 is a performance loser and losing
> a controller during a rebuild is a real ruin-your-week kind of moment.
>
> But while some of us were skeptical about just how bad the performance
> of RAID 5 or 6 really is and wanted citation of references, more of us
> just questioned the perceived frugality. With four drives, wouldn't a
> RAID 10 be a better use of resources, since you can migrate to bigger
> setups later? And there we were content to let it lie, until...
>
>>> but the intention is to add more drives as storage requirements
>>> increase.
>>>
>>> My research/googling suggests ext3 supports 16TB volumes if block
>>
>> Why ext3? My kids would graduate college before the fsck completed.
>>
>> ext4 or xfs are the way to go.
>
> I have ceased to have an opinion on this, having been taken to task,
> myself, about it. I believe the discussion degenerated into nit-picky
> banter over the general suitability of XFS, but I may be wrong about
> this...
>
> _____
>
> Seriously, ext4 is not suitable if you anticipate possible boot
> problems, unless you are experienced at these things. The same is true
> of XFS. If you *are* experienced, then more power to you. Although, I
> would have assumed a very experienced person would have no need to ask
> the question.
>
> Someone pointed out what I have come to regard as the best solution,
> and that is to make /boot and / (root) and the usual suspects ext3 for
> safety, and use ext4 or XFS or even btrfs for the data directories.
>
> (Unless OP were talking strictly about the data drives to begin with,
> a possibility I admit I may have overlooked.)
>
> Have I summarized adequately?
>
> MAA

First off, thank you all for the valuable, experience-laden
information.

For clarity, the setup has always been intended to be: one
system/application drive, and one array made of separate drives; the
array protects data, nothing else. The idea is for them to be two
clearly distinct entities with very different levels of protection,
because the system and apps can be recreated quite quickly if lost;
the data cannot.

To be more specific, the data is currently touching 4TB and is expected
to exceed that very soon, so I'll be using at least 5 drives, probably
6, in the near future. Yes, I know RAID 6 on 4 drives is not frugal;
I'm just planning ahead.

My reticence to use ext4 / xfs has been due to long cache-before-write
times being claimed as dangerous in the event of kernel lockup / power
outage. There are also reports (albeit perhaps somewhat dated) that
ext4/xfs still have a few small but important bugs to be ironed out -
I'd be very happy to hear if people have experience demonstrating this
is no longer true. My preference would be ext4 instead of xfs, as I
believe (just my opinion) this is most likely to become the successor
to ext3 in the future. I have been wanting to know if ext3 can handle a
>16TB fs.
I now know that delayed allocation / delayed writes can be turned off
in ext4 (among other tuning options I'm looking at), and with ext4, fs
sizes are no longer a question. So I'm really hoping that ext4 is the
way I can go.

I'm also hoping that a CPU/motherboard with suitable grunt and FSB
bandwidth could reduce the performance problems with software RAID 6.
If I'm seriously mistaken then I'd love to know beforehand. My
reticence to use HW RAID is that it seems like adding one more point of
possible failure, but I could easily be paranoid in dismissing it for
that reason.

Regards, Tim.
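(For reference, a software RAID 6 of the sort being discussed is
typically created, and later grown, with mdadm roughly as sketched
below. The device names and /dev/md0 are placeholders, and growing onto
extra disks assumes a kernel and mdadm recent enough to reshape RAID 6,
so treat this as an illustration rather than a recipe.)

  # create a 4-drive RAID 6 (usable capacity of 2 drives, dual parity)
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[bcde]

  # later, add a fifth drive and reshape the array onto it
  mdadm --add /dev/md0 /dev/sdf
  mdadm --grow /dev/md0 --raid-devices=5

  # watch the reshape/resync progress
  cat /proc/mdstat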
From: Stan Hoeppner on 2 May 2010 07:10

Disclaimer: I'm partial to XFS

Tim Clewlow put forth on 5/1/2010 2:44 AM:

> My reticence to use ext4 / xfs has been due to long cache-before-write
> times being claimed as dangerous in the event of kernel lockup / power
> outage.

This is a problem with the Linux buffer cache implementation, not any
one filesystem. The problem isn't the code itself, but the fact that it
is a trade-off between performance and data integrity. No journaling
filesystem will prevent the loss of data sitting in the Linux buffer
cache when the machine crashes. What they will do is zero out or delete
any files that were not fully written before the crash, in order to
keep the FS in a consistent state. You will always lose data that's in
flight, but your FS won't get corrupted, thanks to the journal replay
after reboot.

If you are seriously concerned about losing write data that is in the
buffer cache when the system crashes, you should mount your filesystems
with "-o sync" in the fstab options so all writes get flushed to disk
without being queued in the buffer cache (a sample fstab entry follows
below).

> There are also reports (albeit perhaps somewhat dated) that ext4/xfs
> still have a few small but important bugs to be ironed out - I'd be
> very happy to hear if people have experience demonstrating this is no
> longer true. My preference would be ext4 instead of xfs, as I believe
> (just my opinion) this is most likely to become the successor to ext3
> in the future.

I can't speak well to EXT4, but XFS has been fully production quality
for many years: since 1993 on Irix, where it was introduced, and since
~2001 on Linux. There was a bug identified that resulted in fs
inconsistency after a crash, which was fixed in 2007. All bug-fix work
since has dealt with minor issues unrelated to data integrity. Most of
the code work for quite some time now has been cleanup, optimization,
and better documentation. Reading the posts to the XFS mailing list is
very informative as to the quality and performance of the code. XFS has
some really sharp devs. Most are current or former SGI engineers.

> I have been wanting to know if ext3 can handle a >16TB fs. I now know
> that delayed allocation / delayed writes can be turned off in ext4
> (among other tuning options I'm looking at), and with ext4, fs sizes
> are no longer a question. So I'm really hoping that ext4 is the way I
> can go.

XFS has even more tuning options than EXT4--or pretty much any other
FS, for that matter. With XFS on a 32-bit kernel the max FS and file
size is 16TB. On a 64-bit kernel it is 9 exabytes each. XFS is a better
solution than EXT4 at this point. Ted Ts'o admitted last week that one
function call in EXT4 is in terrible shape and will take a lot of work
to fix:

"On my todo list is to fix ext4 to not call write_cache_pages() at all.
We are seriously abusing that function ATM, since we're not actually
writing the pages when we call write_cache_pages(). I won't go into
what we're doing, because it's too embarassing, but suffice it to say
that we end up calling pagevec_lookup() or pagevec_lookup_tag() *four*,
count them *four* times while trying to do writeback. I have a simple
patch that gives ext4 our own copy of write_cache_pages(), and then
simplifies it a lot, and fixes a bunch of problems, but then I discarded
it in favor of fundamentally redoing how we do writeback at all, but
it's going to take a while to get things completely right. But I am
working to try to fix this."
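To make the "-o sync" suggestion above concrete, an fstab entry for a
data array might look like the sketch below. The device, mount point,
and filesystem type are placeholders, and fully synchronous writes
carry a large throughput penalty, so this only illustrates the option:

  # /etc/fstab - example only; "sync" forces write-through to disk
  /dev/md0   /srv/data   xfs    defaults,sync        0  0

  # ext4 alternative: keep asynchronous writes but disable delayed
  # allocation, so blocks are allocated at write time, more like ext3
  /dev/md0   /srv/data   ext4   defaults,nodelalloc  0  0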
> I'm also hoping that a CPU/motherboard with suitable grunt and FSB
> bandwidth could reduce the performance problems with software RAID 6.
> If I'm seriously mistaken then I'd love to know beforehand. My
> reticence to use HW RAID is that it seems like adding one more point
> of possible failure, but I could easily be paranoid in dismissing it
> for that reason.

Good hardware RAID cards are really nice and give you some features you
can't really get with md raid, such as true "just yank the drive tray
out" hot-swap capability. I've not tried it, but I've read that md raid
doesn't like it when you just yank an active drive. Drive-fault LEDs
and audible warnings are also nice with HW RAID solutions. The other
main advantage is performance. Decent HW RAID is almost always faster
than md raid, sometimes by a factor of 5 or more depending on the disk
count and RAID level. Typically, good HW RAID really trounces md raid
performance at levels such as 5, 6, 50, and 60 - basically anything
requiring parity calculations.

Sounds like you're more of a casual user who needs lots of protected
disk space but not necessarily absolute blazing speed. Linux RAID
should be fine.

Take a closer look at XFS before making your decision on a FS for this
array. It's got a whole lot to like, and it has options to tune XFS
exactly to your mdadm RAID setup. In fact, this is usually done for you
automatically, as mkfs.xfs queries the block device driver for stride
and width info, then matches it. (~$ man 8 mkfs.xfs)

http://oss.sgi.com/projects/xfs/
http://www.xfs.org/index.php/XFS_FAQ
http://www.debian-administration.org/articles/388
http://www.jejik.com/articles/2008/04/benchmarking_linux_filesystems_on_software_raid_1/
http://www.osnews.com/story/69
(note the date, and note the praise Hans Reiser lavishes upon XFS)
http://everything2.com/index.pl?node_id=1479435
http://erikugel.wordpress.com/2010/04/11/setting-up-linux-with-raid-faster-slackware-with-mdadm-and-xfs/
http://btrfs.boxacle.net/repository/raid/2010-04-14_2004/2.6.34-rc3/2.6.34-rc3.html
(2.6.34-rc3 benchmarks, all filesystems in tree)

XFS Users: The Linux Kernel Archives

"A bit more than a year ago (as of October 2008) kernel.org, in an ever
increasing need to squeeze more performance out of it's machines, made
the leap of migrating the primary mirror machines (mirrors.kernel.org)
to XFS. We site a number of reasons including fscking 5.5T of disk is
long and painful, we were hitting various cache issues, and we were
seeking better performance out of our file system."

"After initial tests looked positive we made the jump, and have been
quite happy with the results. With an instant increase in performance
and throughput, as well as the worst xfs_check we've ever seen taking
10 minutes, we were quite happy. Subsequently we've moved all primary
mirroring file-systems to XFS, including www.kernel.org, and
mirrors.kernel.org. With an average constant movement of about 400mbps
around the world, and with peaks into the 3.1gbps range serving
thousands of users simultaneously it's been a file system that has
taken the brunt we can throw at it and held up spectacularly."

--
Stan
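To illustrate the stripe-alignment point above: mkfs.xfs normally picks
up the md geometry on its own, but the same thing can be given by hand.
The figures below are purely an example for a hypothetical 6-drive
RAID 6 (4 data disks) built with a 512 KiB chunk; substitute the actual
chunk size and data-disk count, and the mount point is a placeholder:

  # su = md chunk size, sw = number of data (non-parity) disks
  mkfs.xfs -d su=512k,sw=4 /dev/md0

  # confirm the stripe unit/width the filesystem actually got
  xfs_info /srv/data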
From: Boyd Stephen Smith Jr. on 2 May 2010 15:50

On Friday 30 April 2010 19:10:52 Mark Allums wrote:
> or even btrfs for the data directories.

While I am beginning to experiment with btrfs, I wouldn't yet use it
for data you care about.

/boot, not until/if grub2 gets support for it. Even then, /boot is
generally small and not often used, so you'll probably not notice any
performance change from ext2 to a more modern file system, but you
might appreciate the fact that ext2 is mature and well understood.

/, maybe -- if mkinitramfs figures out how to mount it properly.

/usr, /opt, /var/cache and /var/tmp, maybe. They are easily discarded
or restored.

/home, /var, not yet. Data stored there might be irreplaceable. I'd
wait until the developers say it is stable, at least!

--
Boyd Stephen Smith Jr.
bss(a)iguanasuicide.net
ICQ: 514984  YM/AIM: DaTwinkDaddy
http://iguanasuicide.net/
From: Boyd Stephen Smith Jr. on 2 May 2010 16:00

On Sunday 02 May 2010 06:00:38 Stan Hoeppner wrote:
> Good hardware RAID cards are really nice and give you some features
> you can't really get with md raid, such as true "just yank the drive
> tray out" hot-swap capability. I've not tried it, but I've read that
> md raid doesn't like it when you just yank an active drive.
> Drive-fault LEDs and audible warnings are also nice with HW RAID
> solutions. The other main advantage is performance. Decent HW RAID is
> almost always faster than md raid, sometimes by a factor of 5 or more
> depending on the disk count and RAID level. Typically, good HW RAID
> really trounces md raid performance at levels such as 5, 6, 50, and
> 60 - basically anything requiring parity calculations.

Speeds on my md-RAID devices were comparable to speeds with my Areca HW
RAID controller (16-port, PCI-X/SATA, battery-backed 128MB cache). The
number of drives varied from 5 to 10. RAID levels 5 and 6 were both
tested.

Read throughput for both was the expected (# drives - # parity drives)
* single-drive throughput. Write throughput was less than expected in
both cases, but I can't recall the exact figures.

Both support "just yank the drive out" if the rest of the hardware
supports hot-plugging. Alerting about failure is probably a bit better
with a HW RAID controller, since it comes with visual and audible
alarms.

It might be different when the system is under load, since md-RAID
depends on the host CPU and HW RAID does not. However, adding an
additional generic CPU (to reduce load) is both more useful and often
less expensive than buying a HW RAID controller that is only used for
RAID operations.

> Sounds like you're more of a casual user who needs lots of protected
> disk space but not necessarily absolute blazing speed. Linux RAID
> should be fine.

I know I am.

--
Boyd Stephen Smith Jr.
bss(a)iguanasuicide.net
ICQ: 514984  YM/AIM: DaTwinkDaddy
http://iguanasuicide.net/
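Putting rough, purely illustrative numbers on that read-throughput rule
of thumb (assuming around 100 MB/s of sequential throughput per drive):

  6-drive RAID 6:  (6 - 2)  * 100 MB/s ~ 400 MB/s expected sequential read
  10-drive RAID 6: (10 - 2) * 100 MB/s ~ 800 MB/s expected sequential read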
From: Alexander Samad on 2 May 2010 16:30
On Mon, May 3, 2010 at 6:02 AM, Boyd Stephen Smith Jr.
<bss(a)iguanasuicide.net> wrote:
> On Sunday 02 May 2010 06:00:38 Stan Hoeppner wrote:
[snip]
>
> Speeds on my md-RAID devices were comparable to speeds with my Areca
> HW RAID controller (16-port, PCI-X/SATA, battery-backed 128MB cache).
> The number of drives varied from 5 to 10. RAID levels 5 and 6 were
> both tested.
>
> Read throughput for both was the expected (# drives - # parity drives)
> * single-drive throughput. Write throughput was less than expected in
> both cases, but I can't recall the exact figures.
>
[snip]
>
> It might be different when the system is under load, since md-RAID
> depends on the host CPU and HW RAID does not. However, adding an
> additional generic CPU (to reduce load) is both more useful and often
> less expensive than buying a HW RAID controller that is only used for
> RAID operations.

My system used to become close to unusable on the first Sunday of the
month when mdadm did its resync. I had to write my own script so it did
not check multiple arrays at the same time, turn off the hung-process
timer, and set cpufreq to performance. Now, with hardware RAID, I don't
notice it.

Alex
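For anyone hitting the same monthly-check slowdown with md: the
resync/check rate can be throttled system-wide, and on Debian the
monthly run is just a cron job calling the checkarray script, so a
custom script is not strictly required. The value below is only an
example; pick a cap that suits the hardware:

  # cap background resync/check bandwidth, in KB/s per device
  sysctl -w dev.raid.speed_limit_max=10000

  # equivalently, via procfs
  echo 10000 > /proc/sys/dev/raid/speed_limit_max

  # Debian schedules the monthly check in /etc/cron.d/mdadm, which runs
  # /usr/share/mdadm/checkarray; it can be rescheduled or disabled there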