From: Aragorn on
On Tuesday 19 January 2010 08:37 in comp.os.linux.misc, somebody
identifying as Rahul wrote...

> Aragorn <aragorn(a)chatfactory.invalid> wrote in news:hj1gta$2hp$5
> @news.eternal-september.org:
>
> Thanks for the great explanation!

Glad you appreciated it. ;-)

>> Writing to a RAID 5 is slower than writing to a single disk because
>> with each write, the parity block must be updated, which means
>> calculation of the parity data and writing that parity data to the
>> pertaining disk.
>
> This is where I get confused. Is writing to a RAID5 slower than a
> single disk irrespective of how many disks I throw at the RAID5?

Normally, yes, although it won't be *much* slower.  There is some
overhead in the calculation of the parity, and for a small write the old
data block and the old parity block must first be read back before the
new parity can be computed - the so-called read-modify-write penalty.
This is why RAID 6 is even slower during writes: it stores *two* parity
blocks per data segment (and as such, it requires a minimum of 4 disks).
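
To make that concrete, here is a rough Python sketch - purely
illustrative, this is not how the md driver or a RAID adapter actually
implements it - of how RAID 5 parity is computed and why updating a
single data block costs extra reads and writes:

from functools import reduce

def parity(blocks):
    # XOR all data blocks together, byte by byte, to get the parity block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# A toy four-disk RAID 5 stripe: three data blocks plus one parity block.
data = [bytes([i] * 8) for i in (1, 2, 3)]
p = parity(data)

# Small-write penalty: updating one data block means recomputing parity as
#   new_parity = old_parity XOR old_data XOR new_data
# which costs two extra reads and one extra write compared to a plain disk.
old, new = data[0], bytes([9] * 8)
new_p = bytes(a ^ b ^ c for a, b, c in zip(p, old, new))

assert new_p == parity([new, data[1], data[2]])
print("updated parity block:", new_p.hex())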

> I currently have a 7-disk RAID5. Will writing to this be slower than a
> single disk?

A little, yes. But reading from it will be significantly faster.

> Isn't the parity calculation a fairly fast process especially if one
> has a hardware based card?

Ah, but with a hardware-based RAID things are different. The actual
writing process will still be somewhat slower than writing to a single
disk, but considering that everything is taken care of by the hardware
and that such adapters have a very large cache - often backed by a
battery - this will not really have a noticeable performance impact.

With hardware RAID, the kernel treats the entire array as a single disk
and will simply write to the array.  As far as the operating system is
concerned, that's where it ends; the array takes care of everything
else from there, in a delayed fashion.  This is not something you
notice, as your CPU(s) are freed up again as soon as the data has been
transferred to the memory of the RAID adapter.

If you have a hardware RAID adapter, it is however advisable to disable
write barriers.  Write barriers are where the kernel forces the disk
drives to flush their caches.  Since a hardware RAID adapter must be in
total control of the disk drives and has cache memory of its own, the
operating system should never force the disk drives to flush their
cache.
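
The exact mount option depends on the filesystem and kernel version
(e.g. "barrier=0" on ext3/ext4 or "nobarrier" on XFS of that era -
check your filesystem's documentation).  Here is a rough Python sketch
that merely lists what each mounted filesystem currently reports in
/proc/mounts, so you can see whether barriers are still in play:

# Rough check: print the mount options each real filesystem reports, so you
# can see whether barriers are still enabled.  The option names vary per
# filesystem and kernel version ("barrier=0"/"barrier=1", "nobarrier", ...).
with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options = line.split()[:4]
        if fstype in ("ext3", "ext4", "xfs", "reiserfs"):
            print(f"{mountpoint:20} {fstype:8} {options}")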

> And then if the write gets split into 6 parts shouldn't that speed up
> the process since each disk is writing only 1/6th of the chunk?

Yes, but the data has to be split up first - which is of course much
faster with hardware RAID, since it is then done by a dedicated
processor on the adapter itself - and the parity has to be calculated.
This is overhead which you do not have with a single disk.

>> In this case, you don't have any redundancy. Writing to the
>> stripeset is faster than writing to a single disk, and the same
>> applies for reading. It's not a 2:1 performance boost due to the
>> overhead for splitting the data for writes and re-assembling it upon
>> reads, but there is a significant performance improvement, and
>> especially so if you use more than two disks.
>
> Why doesn't a similar boost come out of a RAID5 with a large number of
> disks? Merely because of the parity calculation overhead?

Yes, that is the main difference. Like I said, RAID 6 is even slower
during writes (and has equal performance during reads).

>> There are however a few considerations you should take into account
>> with both of these approaches, i.e. that you should not put the
>> filesystem which holds the kernels and /initrd/ - and preferably not
>> the root filesystem either[1] - on a stripe, because the bootloader
>> recognizes
>
> Luckily that is not needed. I have a separate drive to boot from. The
> RAID is intended only for user /home dirs.

Ah but wait a minute. As I understand it, you have a hardware RAID
adapter card. In that case - assuming that it is a real hardware RAID
adapter and not one of those on-board fake-RAID things - it doesn't
matter, because to the operating system (and even to the BIOS), the
entire array will be seen as a single disk. So then it is perfectly
possible to have your bootloader, your "/boot" and your "/" living on
the RAID array. (I am doing that myself on one of my machines, which
has two RAID 5 arrays of four disks each.)

And in this case - i.e. if you have a hardware RAID array - then your
original question regarding software RAID 0 versus striping via LVM is
also answered, because hardware RAID will always be a bit faster than
software RAID or striped LVM. Additionally, since you mention seven
disks, you could even opt for RAID 10 or 51 and still have a "hot spare"
or "standby spare". (Or you could use the extra disk as an individual,
standalone disk.)

RAID 10 is where you have a mirror (i.e. RAID 1) which is striped to
another mirror - you could instead also use RAID 01, which is a stripe
which is mirrored on another stripe. RAID 10 is better than RAID 01
though - there's a good article on Wikipedia about it.  RAID 10 and 01
require a minimum of four disks.  Performance is very good for both reading
and writing *and* you have redundancy.

Similarly, RAID 51 is where you have a RAID 5 which is mirrored onto
another RAID 5. Or you could use RAID 15, which is a RAID 5 comprised
of mirrors. RAID 51 and 15 require a minimum of six disks.
(Similarly, there is RAID 61 and 16, which require a minimum of eight
disks.)

There is of course a trade-off.  Except for RAID 0, which isn't really
RAID because it has no redundancy, all RAID solutions are expensive in
disk space, and how expensive exactly depends on the chosen RAID type.
In a RAID 1, RAID 10 or RAID 01 set-up, you lose 50% of your storage
capacity.

With RAID 5, your storage capacity is reduced by the capacity of one
disk in the array, and with RAID 6 by the capacity of two disks in the
array. So, with a single RAID 5 array comprised of seven disks without
a standby or hot spare, your total storage capacity is that of six
disks.
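
Just to make that arithmetic concrete, here is a toy Python calculator,
assuming - purely for illustration - seven equally sized 300 GB disks
and ignoring filesystem overhead:

def usable_capacity(level, disks, disk_size, spares=0):
    # Toy usable-capacity calculator; assumes all disks are the same size
    # and that spare disks contribute nothing to capacity.
    n = disks - spares
    if level == "0":
        return n * disk_size
    if level in ("1", "10", "01"):
        return (n // 2) * disk_size       # mirroring halves the capacity
    if level == "5":
        return (n - 1) * disk_size        # one disk's worth of parity
    if level == "6":
        return (n - 2) * disk_size        # two disks' worth of parity
    raise ValueError("unknown RAID level: " + level)

# Seven equally sized 300 GB disks, no spare:
for level in ("0", "5", "6", "10"):
    print("RAID", level, "->", usable_capacity(level, 7, 300), "GB usable")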

And then there's the lost capacity of the hot spare or standby spare - a
hot spare is spinning but otherwise unused until one of the other disks
starts to fail, while a standby spare is spun down until one of the
other disks fails. Upon such failure, the array will be automatically
rebuilt using the parity blocks to write the missing data to the spare
disk.

The bottom line...: A seven-disk RAID 0 would be faster than a RAID 5
during writes, but not really significantly faster during reads, and
you would have the full storage capacity of all disks in the array, but
there would be no redundancy at all. So, considering that you have
seven disks, I think you really should consider building in redundancy.
After all, with RAID 0, if a single disk in the array fails, then
you'll have lost all of your data.  A RAID 5 would run slower after the
failure of a single disk, but at least you'd still have access to your
data.

--
*Aragorn*
(registered GNU/Linux user #223157)
From: Rahul on
Aragorn <aragorn(a)chatfactory.invalid> wrote in news:hj3u07$etq$4
@news.eternal-september.org:


>> Thanks for the great explanation!
>
> Glad you appreciated it. ;-)

Of course I did! Your comments have been infinitely more helpful than the
hours I spent with my vendor's stupid helpdesk! :)

>
> Ah but wait a minute. As I understand it, you have a hardware RAID
> adapter card. In that case - assuming that it is a real hardware RAID
> adapter and not one of those on-board fake-RAID things - it doesn't

Ah yes. Fake-RAID. I have been trying to figure out if mine is real or fake.
I have a Dell PERC-e card and I hope it is "real". Is there a way to tell
"fake RAID" apart?

>
> And in this case - i.e. if you have a hardware RAID array - then your
> original question regarding software RAID 0 versus striping via LVM is
> also answered, because hardware RAID will always be a bit faster than
> software RAID or striped LVM.

Ok, that's good to know. The reason I ask is that I wasn't sure whether, in
spite of having a hardware card, I ought to export individual drives and then
use mdadm to manage them.

With my 45 drives total there are a lot of options, and it's hard to calculate
which is the best one....

--
Rahul
From: Aragorn on
On Tuesday 19 January 2010 19:43 in comp.os.linux.misc, somebody
identifying as Rahul wrote...

> Aragorn <aragorn(a)chatfactory.invalid> wrote in news:hj3u07$etq$4
> @news.eternal-september.org:
>
>>> Thanks for the great explanation!
>>
>> Glad you appreciated it. ;-)
>
> Of course I did! Your comments have been infinitely more helpful than
> the hours I spent with my vendor's stupid helpdesk! :)

Oh, I have quite recently come to experience that the people who
populate a helpdesk are nothing but clerks with a FAQ in front of their
nose.  Ask them anything that's not listed in that FAQ and they're
clueless. ;-)

>> Ah but wait a minute. As I understand it, you have a hardware RAID
>> adapter card. In that case - assuming that it is a real hardware
>> RAID adapter and not one of those on-board fake-RAID things - it
>> doesn't
>
> Ah yes. Fake-RAID. I have been trying to figure out if mine is real or
> fake. I have a Dell PERC-e card and I hope it is "real". Is there a
> way to tell "fake RAID" apart?

Well, since you are speaking of a separate plug-in card and since you've
mentioned elsewhere that you're using SAS drives, I would be inclined
to think that it is a genuine hardware RAID adapter.  I do seem to
remember that the Dell PERC cards are based upon an LSI, QLogic or
Adaptec adapter.

If each of the arrays of disks is seen by the kernel as a single disk,
then it's hardware RAID. Fake-RAID is generally reserved for IDE and
SATA disks, but I haven't encountered it for SAS or SCSI yet.
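
If you want to check from a running system, here is a rough sketch -
assuming the usual sysfs layout - that lists every whole block device
the kernel sees, together with its reported size and model string.  A
true hardware RAID adapter shows up as a few big logical drives rather
than as dozens of individual SAS disks:

import os

# List every whole block device the kernel sees, with its reported size and
# model string.  A genuine hardware RAID adapter presents each array as one
# big logical drive, not as dozens of individual SAS disks.
for dev in sorted(os.listdir("/sys/block")):
    try:
        with open(f"/sys/block/{dev}/device/model") as f:
            model = f.read().strip()
        with open(f"/sys/block/{dev}/size") as f:
            sectors = int(f.read())
    except OSError:
        continue  # ram disks, loop devices etc. have no model file
    print(f"{dev:8} {sectors * 512 / 1e9:8.1f} GB  {model}")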

>> And in this case - i.e. if you have a hardware RAID array - then your
>> original question regarding software RAID 0 versus striping via LVM
>> is also answered, because hardware RAID will always be a bit faster
>> than software RAID or striped LVM.
>
> Ok, that's good to know. The reason I ask is that I wasn't sure whether,
> in spite of having a hardware card, I ought to export individual drives
> and then use mdadm to manage them.

No no, that's totally unnecessary.  With true hardware RAID, the
operating system will see an entire RAID array as being a single disk
and will treat it accordingly.  Everything else is done by the RAID
adapter itself.  Just make sure - as I mentioned earlier - that you
disable write barriers; this is done at mount time, via a mount option.

> With my 45 drives total there are a lot of options, and it's hard to
> calculate which is the best one....

I would personally not use all of them for "/home". You mention three
arrays, so I would suggest the following...:

° First array:
- /boot
- /
- /usr
- /usr/local
- /opt
- an optional rescue/emergency root filesystem

° Second array:
- /var
- /tmp (Note: you can also make this a /tmpfs/ instead.)
- /srv (Note: use at your own discretion.)

° Third array:
- /home

--
*Aragorn*
(registered GNU/Linux user #223157)
From: Rahul on
Aragorn <aragorn(a)chatfactory.invalid> wrote in
news:hj52h6$lr7$2(a)news.eternal-september.org:

>
> I would personally not use all of them for "/home". You mention three
> arrays, so I would suggest the following...:
>
> ° First array:
> - /boot
> - /
> - /usr
> - /usr/local
> - /opt
> - an optional rescue/emergency root filesystem
>
> ° Second array:
> - /var
> - /tmp (Note: you can also make this a /tmpfs/ instead.)
> - /srv (Note: use at your own discretion.)
>
> ° Third array:
> - /home
>

Sorry, I should have clarified. For /boot, /usr, etc. I have a
separate mirrored SAS drive. So those are taken care of. Besides, 15x
300 GB would be too much storage for any of those trees.

I bought all 45 drives just to provide a high-performance /home.
The question is how best to configure them:

1. What RAID pattern?
2. Do I add LVM on top? This is cleaner than arbitrarily mounting /home1,
/home2, etc., but the overhead of LVM worries me.
3. Do I use LVM striping or not? Etc.


--
Rahul
From: Aragorn on
On Tuesday 19 January 2010 20:57 in comp.os.linux.misc, somebody
identifying as Rahul wrote...

> Aragorn <aragorn(a)chatfactory.invalid> wrote in
> news:hj52h6$lr7$2(a)news.eternal-september.org:
>
>> I would personally not use all of them for "/home". You mention
>> three arrays, so I would suggest the following...:
>>
>> ° First array:
>> - /boot
>> - /
>> - /usr
>> - /usr/local
>> - /opt
>> - an optional rescue/emergency root filesystem
>>
>> ° Second array:
>> - /var
>> - /tmp (Note: you can also make this a /tmpfs/ instead.)
>> - /srv (Note: use at your own discretion.)
>>
>> ° Third array:
>> - /home
>>
>
> Sorry, I should have clarified. For /boot, /usr, etc. I have a
> separate mirrored SAS drive. So those are taken care of. Besides, 15x
> 300 GB would be too much storage for any of those trees.

Oh, okay then.

> I bought all 45 drives just to provide a high-performance /home.
> The question is how best to configure them:
>
> 1. What RAID pattern?

Since it is hardware RAID and you have a whole farm of disks, I'd set
them up as RAID 10 or 01 - i.e. a striped mirror or a mirrored stripe -
with a couple of standby spares.  Saves a bit of electricity, too. ;-)

> 2. Do I add LVM on top? This is cleaner than arbitrarily mounting
> /home1, /home2, etc., but the overhead of LVM worries me.
> 3. Do I use LVM striping or not? Etc.

Well, you can go several ways.  Once everything is set up hardware-wise,
you can do any of the following:

° Use LVM with striping across the three arrays.

° Use LVM and combine the three partitions - one on each array -
into a single linear volume. The volume will then fill up
one array at a time.

° Use /mdadm/ and create a stripe with that, similar to LVM
striping but using regular partitions. You won't need LVM
anymore then.

° Use /mdadm/ and create a JBOD (alias "linear array"). This
is similar to the approach where you fill up one array at a
time, but now you're not using LVM.

The above range of suggestions is only about usability, mind you.  Since
it is hardware RAID, you will already be getting maximum performance
without any striping implemented via /mdadm/ or LVM. ;-)
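
To make the difference between the striped and the linear (JBOD-style)
approaches concrete, here is a purely illustrative Python sketch of how
a logical block would map onto the three underlying arrays in each case
(this is not mdadm's or LVM's actual allocation code):

def striped(logical_block, arrays, chunk=64):
    # Round-robin the chunks across all arrays (what RAID 0 / LVM striping do).
    chunk_no, offset_in_chunk = divmod(logical_block, chunk)
    array = chunk_no % arrays
    offset = (chunk_no // arrays) * chunk + offset_in_chunk
    return array, offset

def linear(logical_block, blocks_per_array):
    # Fill up one array at a time (mdadm linear / a linear LVM volume).
    return divmod(logical_block, blocks_per_array)

# Three arrays of 1000 blocks each, 64-block chunks: striping spreads
# consecutive chunks over all three arrays, while linear concatenation
# fills array 0 completely before touching array 1.
for lb in (0, 64, 128, 192, 2500):
    print(lb, "striped ->", striped(lb, 3), "linear ->", linear(lb, 1000))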

--
*Aragorn*
(registered GNU/Linux user #223157)