From: Arcady Genkin
I'm seeing a 10-fold performance hit when using an LVM2 logical volume
that sits on top of a RAID0 stripe. Using dd to read directly from
the stripe (i.e. a large sequential read), I get speeds over 600 MB/s.
Reading from the logical volume with the same method only gives
around 57 MB/s. I am new to LVM, and I need it for its snapshot
feature. Could anyone suggest where to start looking for the problem?

The server runs the amd64 version of Lenny. Most packages (including
lvm2) are stock from Lenny, but we had to upgrade the kernel to the
one from lenny-backports (2.6.32).

There are ten RAID1 triplets, md0 through md9 (that's 30 physical
disks arranged into ten 3-way mirrors), connected over iSCSI from six
targets. The ten triplets are then striped together into a RAID0
stripe, /dev/md10. I don't think we have any issues with the MD
layers, because each of them seems to perform fairly well; it's only
when we add LVM to the mix that the speeds drop.
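
(A quick loop along these lines can be used to sanity-check each
triplet individually -- just a sketch; it reads about 2 GB
sequentially from each triplet in turn and prints dd's summary line:)

for n in 0 1 2 3 4 5 6 7 8 9; do
    echo "/dev/md$n:"
    dd of=/dev/null bs=8K count=250000 if=/dev/md$n 2>&1 | tail -1
done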

test4:~# uname -a
Linux test4 2.6.32-bpo.4-amd64 #1 SMP Thu Apr 8 10:20:24 UTC 2010
x86_64 GNU/Linux

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md10
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 33.4619 s, 612 MB/s

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 354.951 s, 57.7 MB/s
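
(The page cache shouldn't matter much for a 20 GB read, but for
completeness the same comparison could be repeated with direct I/O to
take caching and read-ahead out of the picture -- a sketch only; the
numbers above are from ordinary buffered reads:)

dd of=/dev/null bs=1M count=20000 iflag=direct if=/dev/md10
dd of=/dev/null bs=1M count=20000 iflag=direct if=/dev/vg0/lvol0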

I used the following commands to create the volume group:

pvcreate /dev/md10
vgcreate vg0 /dev/md10
lvcreate -l 102389 vg0
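
(One thing worth checking here, assuming the installed lvm2 knows the
pe_start field, is where the first physical extent starts relative to
the underlying stripe's chunk size -- a diagnostic sketch:)

pvs -o pv_name,pe_start --units k /dev/md10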

Here's what LVM reports about its devices:

test4:~# pvdisplay
--- Physical volume ---
PV Name /dev/md10
VG Name vg0
PV Size 399.96 GB / not usable 4.00 MB
Allocatable yes (but full)
PE Size (KByte) 4096
Total PE 102389
Free PE 0
Allocated PE 102389
PV UUID ocIGdd-cqcy-GNQl-jxRo-FHmW-THMi-fqofbd

test4:~# vgdisplay
--- Volume group ---
VG Name vg0
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 399.96 GB
PE Size 4.00 MB
Total PE 102389
Alloc PE / Size 102389 / 399.96 GB
Free PE / Size 0 / 0
VG UUID o2TeAm-gPmZ-VvJc-OSfU-quvW-OB3a-y1pQaB

test4:~# lvdisplay
--- Logical volume ---
LV Name /dev/vg0/lvol0
VG Name vg0
LV UUID Q3nA6w-0jgw-ImWY-IYJK-kvMJ-aybW-GAdoOs
LV Write Access read/write
LV Status available
# open 0
LV Size 399.96 GB
Current LE 102389
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:0
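
(Since lvdisplay reports the LV read-ahead as 256 sectors, one
comparison worth making is against the MD device's setting; blockdev
reports the value in 512-byte sectors, and --setra allows
experimenting with it -- shown only as a sketch:)

blockdev --getra /dev/md10
blockdev --getra /dev/vg0/lvol0
# e.g. to try a larger read-ahead on the logical volume:
# blockdev --setra 4096 /dev/vg0/lvol0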

Many thanks in advance for any pointers!
--
Arcady Genkin


From: Stan Hoeppner
Arcady Genkin put forth on 7/11/2010 10:46 PM:

> lvcreate -l 102389 vg0

Should be:

lvcreate -i 10 -I [stripe_size] -l 102389 vg0

I believe you're losing 10x performance because you have a 10-"disk" mdadm
stripe but you didn't inform lvcreate of that fact. Delete the vg, then
recreate it with the above command line, specifying 64 for the stripe
size (the mdadm default). If performance is still lacking, recreate it again
with 640 for the stripe size. (I'm not exactly sure of the relationship
between mdadm chunk size and lvm stripe size -- it's either equal, or it's
mdadm stripe width * mdadm chunk size.)

If you specified a chunk size when you created the mdadm RAID 0 stripe, then
use that chunk size for the lvcreate stripe_size. Again, if performance is
still lacking, recreate with whatever chunk size you specified in mdadm and
multiply that by 10.
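
(If you don't remember what chunk size the array was created with,
mdadm will show it -- roughly:)

mdadm --detail /dev/md10 | grep -i chunk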

Hope this helps. Let us know.

--
Stan


From: Arcady Genkin
On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:

> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>
> I believe you're losing 10x performance because you have a 10 "disk" mdadm
> stripe but you didn't inform lvcreate about this fact.

Hi, Stan:

I believe that the -i and -I options are for having *LVM* itself do
the striping; am I wrong? In our case (where LVM sits on top of a
single RAID0 MD stripe) the -i option does not seem to make sense:

test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
Number of stripes (10) must not exceed number of physical volumes (1)

My understanding is that LVM should be agnostic about what underlies
it as physical storage, so it should treat the MD stripe as one large
disk and let the MD device handle the load balancing (which it seems
to be doing fine).
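
(The mapping LVM sets up for this volume should indeed be a single
linear target sitting directly on md10, which can be checked with
dmsetup -- a sketch:)

dmsetup table vg0-lvol0
# expect a single line of the form:
#   0 <length in sectors> linear <major:minor of md10> <data offset>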

Besides, the speed we are getting from the LVM volume is less than
half of what an individual component of the RAID0 stripe can do. Even
if we assume that LVM somehow manages to distribute its data so that
it always hits only one physical disk (one triplet in our case), there
would still be the question of why it is *that* slow: 57 MB/s versus
the 134 MB/s that an individual triplet can do:

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 153.084 s, 134 MB/s

> If you specified a chunk size when you created the mdadm RAID 0 stripe, then
> use that chunk size for the lvcreate stripe_size.  Again, if performance is
> still lacking, recreate with whatever chunk size you specified in mdadm and
> multiply that by 10.

We are using a chunk size of 1024 KB (i.e. 1 MB) with the MD devices.
For the record, we used the following commands to create the md devices:

For N in 0 through 9:
mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
--layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
--chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:
mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
--metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}
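
(The resulting geometry, including md10's chunk size and member
order, can be double-checked in /proc/mdstat, e.g.:)

grep -A 1 '^md10' /proc/mdstat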

Thanks,
--
Arcady Genkin


From: Arcady Genkin
I just tried using LVM to stripe the RAID1 triplets together
(instead of MD). Using the following three commands to create the
logical volume, I get 550 MB/s sequential read speed, which is much
faster than before, but still about 10% slower than what a plain MD
RAID0 stripe can do with the same disks (612 MB/s).

pvcreate /dev/md{0,5,1,6,2,7,3,8,4,9}
vgcreate vg0 /dev/md{0,5,1,6,2,7,3,8,4,9}
lvcreate -i 10 -I 1024 -l 102390 vg0

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 37.2381 s, 550 MB/s

I would still like to know why LVM on top of RAID0 performs so poorly
in our case.
--
Arcady Genkin


From: Aaron Toponce
On 7/12/2010 11:45 AM, Arcady Genkin wrote:
> I would still like to know why LVM on top of RAID0 performs so poorly
> in our case.

Can you provide the commands, from start to finish, that you used to build the volume?

fdisk ...
mdadm ...
pvcreate ...
vgcreate ...
lvcreate ...

etc.

My experience has been that LVM introduces about a 1-2% performance
hit compared to not using it, in many different situations, whether on
top of software or hardware RAID or on plain disks/partitions. So I'm
curious what command-line options you're passing to each of your
commands, how you partitioned/built your disks, and so forth. It might
help troubleshoot why you're seeing such a hit.

On a side note, I've never seen any reason to increase or decrease the
chunk size with software RAID. However, you may want to match your chunk
size with '-c' for 'lvcreate'.

--
. O . O . O . . O O . . . O .
. . O . O O O . O . O O . . O
O O O . O . . O O O O . O O O