From: Rahul on
I find that I could in theory get a performance boost either by using a
RAID5 via mdadm or by striping via LVM. Let's assume redundancy is not a
concern, merely performance boosting.

What's the difference in these two approaches and is one better than the
other?

--
Rahul
From: Aragorn on
On Sunday 17 January 2010 21:46 in comp.os.linux.misc, somebody
identifying as Rahul wrote...

> I find that I could in theory get a performance boost either by using
> a RAID5 via mdadm or by striping via LVM. Let's assume redundancy is
> not a concern merely performance boosting.
>
> What's the difference in these two approaches and is one better than
> the other?

A true RAID 5 means that you need at least three disks, in which case
the data will, per data segment, be striped over two disks, and the
third disk will hold a parity block. Distribution of the parity blocks
is staircased, meaning that the parity block will be put on a different
disk in the array per data segment, like so...

Data segment    Disk 1       Disk 2       Disk 3

A               A-1          A-2          A-parity
B               B-1          B-parity     B-2
C               C-parity     C-1          C-2
D               D-1          D-2          D-parity
E               E-1          E-parity     E-2
F               F-parity     F-1          F-2
...             ...          ...          ...
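
With /mdadm/, a three-disk RAID 5 of that shape would be created
along these lines (the device names here are only an example):

  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1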

Writing to a RAID 5 is generally slower than writing to a single disk
because the parity must be kept up to date: for a typical write, the
old data and parity blocks are read back, the new parity is
calculated, and both the data and the parity block are written out to
the pertaining disks.

Reading from a (non-degraded) RAID 5 however is fast and comparable to
RAID 0, also known as "striping", because the parity blocks need not
be read. Only when the array is running in degraded mode - i.e. with
one of the disks failed - is the missing data recalculated from the
parity blocks.

A plain stripeset on the other hand only requires two disks, and simply
does what the above does, but without parity blocks. So you'd have a
setup like this...

Data segment    Disk 1       Disk 2

A               A-1          A-2
B               B-1          B-2
C               C-1          C-2
...             ...          ...
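
The /mdadm/ equivalent here would be a two-disk RAID 0, created
roughly like this (again with example device names):

  mdadm --create /dev/md0 --level=0 --raid-devices=2 \
        /dev/sdb1 /dev/sdc1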

In this case, you don't have any redundancy. Writing to the stripeset
is faster than writing to a single disk, and the same applies for
reading. It's not a 2:1 performance boost due to the overhead for
splitting the data for writes and re-assembling it upon reads, but
there is a significant performance improvement, and especially so if
you use more than two disks.

Now, you can use virtually any kind of software RAID set-up
with /mdadm/ - including RAID 0 - and LVM can offer you a similar
set-up. For the swap space you don't even need either of them,
because striping can be achieved by simply giving two swap partitions
on separate disks an equal priority in "/etc/fstab".

If striping without redundancy is what you want, then you can go either
way, i.e. RAID 0 via /mdadm/ (or via /dmraid/), or striping implemented
at the logical volume management level. The only real difference is the
kernel layer in which it is handled. Whether you set it up
via /mdadm/ - or even via /dmraid/ - or via the logical volume manager,
it is still software RAID, and I don't think there would be any
significant - i.e. humanly noticeable - difference in performance.
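
On the LVM side, the equivalent of a two-disk stripe is a striped
logical volume across two physical volumes, set up roughly like this
(device names, sizes and volume names are only an example):

  pvcreate /dev/sdb1 /dev/sdc1
  vgcreate vg0 /dev/sdb1 /dev/sdc1
  lvcreate -i 2 -I 64 -L 100G -n striped_lv vg0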

There are however a few considerations you should take into account with
both of these approaches, namely that you should not put the filesystem
which holds the kernels and /initrd/ - and preferably not the root
filesystem either[1] - on a stripe, because the bootloader recognizes
neither software RAID nor logical volume management. It's a
chicken-and-egg thing: the drivers for LVM and software RAID are in
the Linux kernel, so you have to be able to load the Linux kernel first
before you can make use of those drivers. GRUB does not have any
drivers for that, and the way LILO works, it would also not be able to
load a kernel off of a striped filesystem.


[1] Having the root filesystem on a software RAID stripeset will work
only if you have an initrd which contains *all* the required driver
modules, since there is no control over the order of the automatic
module loading by the kernel itself. It loads the modules according
to the hardware it finds, and if it needs a module off of the root
filesystem before the RAID or LVM modules have been loaded, then
you're foobarred.

--
*Aragorn*
(registered GNU/Linux user #223157)
From: David Brown on
Rahul wrote:
> I find that I could in theory get a performance boost either by using a
> RAID5 via mdadm or by striping via LVM. Let's assume redundancy is not a
> concern merely performance boosting.
>
> What's the difference in these two approaches and is one better than the
> other?
>

LVM is for logical volume management, mdadm is for administering
multiple disk setups (i.e., software raid). LVM /can/ do basic
striping, in that if you have two physical volumes allocated to the same
volume group, then a logical volume can be striped across the two
physical volumes. As another poster has said, you won't notice a
performance difference between striping via LVM or mdadm. But you
/will/ notice a difference in the administration and commands used - it
is more convenient to use mdadm for raid than LVM.

My recommendation is that you use mdadm to create a raid from the raw
drives or partitions on the drives, and if you want the volume
management features of LVM (I find it very useful), put LVM on top of
mdadm raid.
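
As a rough sketch, assuming an existing mdadm array /dev/md0 (the
volume and filesystem names are only placeholders), that layering
looks like:

  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 200G -n home vg0
  mkfs.ext3 /dev/vg0/home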

As for the type of raid to use, that depends on the number of disks you
have and the redundancy you want. raid5 is well-known to be slower for
writing, especially for smaller writes, and it can be risky for large
disks in critical applications (since rebuilding takes so long, and
wears the other disks). Mirroring is safer, and mdadm can happily do
a raid10 (roughly a stripe of mirrors) on any number of disks for high
speed and mirrored redundancy.

Booting from raids is complicated, but not as difficult as suggested by
another poster. Modern grub can handle a /boot partition on a raid1 or
raid0 mdadm setup, although it's a little inconvenient to install -
you typically have to manually run grub to install the first stage
bootloader on each disk's boot sector individually.
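
With GRUB legacy, for instance, putting the first stage on the second
disk of a RAID1 /boot is usually done from the grub shell, along
these lines (assuming /boot is the first partition on each disk):

  grub> device (hd0) /dev/sdb
  grub> root (hd0,0)
  grub> setup (hd0)

The "device" line temporarily maps the second disk as (hd0), so the
boot sector written by "setup" points at that disk's own copy
of /boot.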

The last server I configured had three disks. I partitioned each into a
small partition (1G) and a big partition (the rest of the disk). The
small partitions I joined in an mdadm raid1 (mirror), and use for /boot.
The big partitions are in a raid10 mdadm array, used as an LVM
physical volume, with logical volumes for various parts of the system
and virtual machines. It will happily run and boot with any one of the
drives removed.
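
In mdadm terms that kind of layout looks roughly like this (device
names here are only an example):

  mdadm --create /dev/md0 --level=1 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --create /dev/md1 --level=10 --raid-devices=3 \
        /dev/sda2 /dev/sdb2 /dev/sdc2

with /dev/md0 holding /boot and /dev/md1 then used as the LVM
physical volume.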
From: Rahul on
Aragorn <aragorn(a)chatfactory.invalid> wrote in news:hj1gta$2hp$5
@news.eternal-september.org:

Thanks for the great explanation!

> Writing to a RAID 5 is slower than writing to a single disk because with
> each write, the parity block must be updated, which means calculation
> of the parity data and writing that parity data to the pertaining disk.

This is where I get confused. Is writing to a RAID5 slower than a single
disk irrespective of how many disks I throw at the RAID5? I currently have
a 7-disk RAID5. Will writing to this be slower than a single disk? Isn't
the parity calculation a fairly fast process, especially if one has a
hardware-based card? And then if the write gets split into 6 parts,
shouldn't that speed up the process, since each disk is writing only
1/6th of the chunk?

>
> In this case, you don't have any redundancy. Writing to the stripeset
> is faster than writing to a single disk, and the same applies for
> reading. It's not a 2:1 performance boost due to the overhead for
> splitting the data for writes and re-assembling it upon reads, but
> there is a significant performance improvement, and especially so if
> you use more than two disks.

Why doesn't a similar boost come out of a RAID5 with a large number of
disks? Merely because of the parity calculation overhead?


>
> There are however a few considerations you should take into account with
> both of these approaches, i.e. that you should not put the filesystem
> which holds the kernels and /initrd/ - and preferably not the root
> filesystem either[1] - on a stripe, because the bootloader recognizes

Luckily that is not needed. I have a separate drive to boot from. The RAID
is intended only for user /home dirs.

--
Rahul
From: Rahul on
David Brown <david.brown(a)hesbynett.removethisbit.no> wrote in
news:BtOdnakm8taiUsnWnZ2dnUVZ8qOdnZ2d(a)lyse.net:

Thanks David!

> Rahul wrote:
>
> LVM is for logical volume management, mdadm is for administering
> multiple disk setups (i.e., software raid). LVM /can/ do basic
> striping, in that if you have two physical volumes allocated to the
> same volume group, then a logical volume can be striped across the two
> physical volumes. As another poster has said, you won't notice a
> performance difference between striping via LVM or mdadm. But you

Will putting LVM on top of mdadm slow things down? Or does LVM not have a
significant performance penalty?
>
> My recommendation is that you use mdadm to create a raid from the raw
> drives or partitions on the drives, and if you want the volume
> management features of LVM (I find it very useful), put LVM on top of
> mdadm raid.

This is exactly what I was trying to do. But LVM asks "stripe" or "no
stripe". That I wasn't sure about.


> As for the type of raid to use, that depends on the number of disks
> you have and the redundancy you want. raid5 is well-known to be
> slower for writing, especially for smaller writes, and it can be risky
> for large disks in critical applications

Maybe if I explain my situation you can have some more comments.

I have 3 physical "storage boxes" (MD-1000's from Dell). Each takes 15
SAS 15k drives of 300 GB each, i.e. I have a total of 45 drives of 300 GB
each. Redundancy is important but not critical. Performance is more
important.

My original plan was to split each box into two RAID5 arrays of 7 disks
each and leave 1 as a hot spare. Thus I get 6 RAID5 arrays in all. They
are visible as /dev/sdb, /dev/sdc, etc., but I want to mount a single
/home across them. That's where I introduced LVM. But then LVM again
introduces a striping option. Should I be striping or not?
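
(In concrete terms, I think the choice boils down to something like

  lvcreate -l 100%FREE -n home home_vg

versus the striped variant

  lvcreate -i 6 -I 64 -l 100%FREE -n home home_vg

with the six arrays as the physical volumes of home_vg - the names
here are just examples.)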

That's where I am confused about what my best option is. It's hard to
balance redundancy, performance and disk capacity.


Any other creative options that come to mind?



>(since rebuilding takes so
> long, and wears the other disks). Mirroring is safer, and mdadmin can
> happily do a raid10 (roughly a stripe of mirrors) on any number of
> disks for high speed and mirrored redundancy.
>
> Booting from raids is complicated, but not as difficult as suggested

Luckily I don't have to go down that path; I have a separate drive to
boot from.

--
Rahul