From: Arcady Genkin on 12 Jul 2010 15:50

On Mon, Jul 12, 2010 at 14:54, Aaron Toponce <aaron.toponce(a)gmail.com> wrote:
> Can you provide the commands from start to finish when building the volume?
>
> fdisk ...
> mdadm ...
> pvcreate ...
> vgcreate ...
> lvcreate ...

Hi, Aaron,

I already provided all of the above commands in earlier messages
(except for fdisk, since we are giving the entire disks to MD, not
partitions). I'll repeat them here for your convenience.

Creating the ten 3-way RAID1 triplets - for N in 0 through 9:

  mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
    --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
    --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:

  mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
    --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

Then the LVM business:

  pvcreate /dev/md10
  vgcreate vg0 /dev/md10
  lvcreate -l 102389 vg0

Note that the file system is not being created on top of LVM at this
point; I ran the test by simply dd-ing from /dev/vg0/lvol0.

> My experience has been that LVM will introduce about a 1-2% performance
> hit compared to not using it

This is what we were expecting; it's encouraging.

> On a side note, I've never seen any reason to increase or decrease the
> chunk size with software RAID. However, you may want to match your chunk
> size with '-c' for 'lvcreate'.

We have tested a variety of chunk sizes (from 64K to 4MB) with bonnie++
and found that 1MB chunks worked best for our usage, which is a
general-purpose NFS server, so the load is mainly small random reads.
In this scenario it's best to tune the chunk size to increase the
probability that a small read from the stripe results in only one read
from a disk. If the chunk size is too small, a 1KB read has a fairly
high chance of being split between two chunks and thus requiring two
I/Os to service instead of one (and, most likely, two drive head seeks
instead of one). Modern commodity drives can do only about 100-120
seeks per second. But this is a side note to your side note. :))

From the man page for 'lvcreate' it seems that the -c option sets the
chunk size for something snapshot-related, so it should have no bearing
on our performance testing, which involved no snapshots. Am I
misreading the man page?

Thanks!
--
Arcady Genkin
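For reference, a sequential-read test of the kind described above
("dd-ing /dev/vg0/lvol0") might look like the sketch below; the block
size, count, and iflag=direct are assumptions, since the exact dd
invocation is not given in the thread:

  # Sequential read from the logical volume, bypassing the page cache
  # so the numbers reflect the storage stack rather than RAM.
  dd if=/dev/vg0/lvol0 of=/dev/null bs=1M count=10240 iflag=direct

  # The same read against the MD stripe and a single triplet, for the
  # comparison made later in the thread.
  dd if=/dev/md10 of=/dev/null bs=1M count=10240 iflag=direct
  dd if=/dev/md0  of=/dev/null bs=1M count=10240 iflag=direct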
From: Aaron Toponce on 12 Jul 2010 16:50

On 7/12/2010 1:45 PM, Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>     --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>     --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ
>
> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>     --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

I must admit that I haven't seen a software RAID implementation where
you create multiple devices from the same set of disks, then stripe
across those devices. As such, when using LVM, I'm not exactly sure how
the kernel will handle that - mostly whether it will see the
appropriate amount of disk, and which physical extents it will use to
place the data. So for me, this is uncharted territory.

Your commands look sound, though. I might suggest changing the default
PE size from 4MB to 1MB; that might help, and it's worth testing
anyway. The PE size can be changed with 'vgcreate -s 1M'.

However, do you really want --bitmap with your mdadm command? I
understand the benefits, but using 'internal' does come with a
performance hit.

> From the man page for 'lvcreate' it seems that the -c option sets the
> chunk size for something snapshot-related, so it should have no
> bearing on our performance testing, which involved no snapshots. Am I
> misreading the man page?

Ah yes, you are correct. I should probably pull up the man page before
replying. :)
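A sketch of the two experiments Aaron suggests, using the device and VG
names from the thread; note that recreating the volume group with a
different extent size requires tearing down the existing VG first, so
the vgcreate line is illustrative only:

  # 1) Smaller physical extents: rebuild the VG with 1MB PEs instead of
  #    the 4MB default (after lvremove/vgremove of the existing vg0).
  vgcreate -s 1M vg0 /dev/md10

  # 2) Measure the cost of the internal write-intent bitmap by removing
  #    it from one triplet, re-running the benchmark, and adding it back.
  mdadm --grow /dev/md0 --bitmap=none
  mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=2048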
From: Mike Bird on 12 Jul 2010 17:20

On Mon July 12 2010 12:45:57 Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>     --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>     --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

RAID 10 with three devices?

--Mike Bird
From: Aaron Toponce on 12 Jul 2010 18:20

On 7/12/2010 4:13 PM, Stan Hoeppner wrote:
> Is that a typo, or are you turning those 3-disk mdadm sets into RAID 10 as
> shown above, instead of the 3-way mirror sets you stated previously? RAID 10
> requires a minimum of 4 disks; you have 3. Something isn't right here...

Incorrect. The Linux RAID implementation can do level 10 across 3
disks. In fact, it can even do it across 2 disks.

http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
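To illustrate the point, here is a minimal sketch of a 3-device MD
RAID10 in the 'near' layout, the same layout the triplets in this
thread use; the device names are placeholders, and the exact --detail
output wording may vary between mdadm versions:

  # With --layout=n3 on three devices, every chunk is stored on all
  # three disks, so the array behaves like a 3-way mirror.
  mdadm --create /dev/md0 -v --level=raid10 --raid-devices=3 \
      --layout=n3 /dev/sdX /dev/sdY /dev/sdZ

  # Verify the layout; --detail should report something like
  # "Layout : near=3".
  mdadm --detail /dev/md0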
From: Stan Hoeppner on 12 Jul 2010 18:20

Arcady Genkin put forth on 7/12/2010 11:52 AM:
> On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:
>
>> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>>
>> I believe you're losing 10x performance because you have a 10 "disk" mdadm
>> stripe but you didn't inform lvcreate about this fact.
>
> Hi, Stan:
>
> I believe that the -i and -I options are for using *LVM* to do the
> striping, am I wrong?

If this were the case, lvcreate would require the set of physical or
pseudo (mdadm) device IDs to stripe across, wouldn't it? There are no
options in lvcreate to specify physical or pseudo devices; the only
input to lvcreate is a volume group ID. Therefore, lvcreate is ignorant
of the physical devices underlying it, is it not?

> In our case (when LVM sits on top of one RAID0 MD stripe) the option
> -i does not seem to make sense:
>
> test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
>   Number of stripes (10) must not exceed number of physical volumes (1)

It makes sense once you accept the fact that lvcreate is ignorant of
the underlying disk device count/configuration. Once you accept that,
you will realize the -i option is what allows one to educate lvcreate
that there are, in your case, 10 devices underlying it which one
desires to stripe data across. I believe the -i option exists merely to
educate lvcreate about the underlying device structure.

> My understanding is that LVM should be agnostic of what's underlying
> it as the physical storage, so it should treat the MD stripe as one
> large disk, and thus let the MD device handle the load balancing
> (which it seems to be doing fine).

If lvcreate is agnostic of the underlying structure, why does it have
stripe width and stripe size options at all? As a parallel example,
filesystems such as XFS are ignorant of the underlying disk structure
as well, yet mkfs.xfs has no fewer than four sub-options to optimize
its performance atop RAID stripes. One of them, sw, specifies stripe
width, which is the number of physical or logical devices in the RAID
stripe. In your case, if you use XFS, this would be "-d sw=10". These
options in lvcreate serve the same function as those in mkfs.xfs:
optimizing their performance atop a RAID stripe.

> Besides, the speed we are getting from the LVM volume is more than
> twice slower than an individual component of the RAID10 stripe. Even
> if we assume that LVM somehow manages to distribute its data so that
> it always hits only one physical disk (a disk triplet in our case),
> there would still be the question of why it is doing it *that* slow.
> It's 57 MB/s vs the 134 MB/s that an individual triplet can do:

Forget comparing performance to one of your single mdadm mirror sets.
What's key here, and why I suggested "lvcreate -i 10 ..." to begin
with, is the fact that your LVM performance is almost exactly 10 times
lower than that of the underlying mdadm device, which has exactly 10
physical stripes. Isn't that more than just a bit coincidental? The 10x
drop only occurs when talking to the LVM device. Put on your Sherlock
Holmes hat for a minute.

> We are using chunk size of 1024 (i.e. 1MB) with the MD devices. For
> the record, we used the following commands to create the md devices:
>
> For N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>     --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>     --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Is that a typo, or are you turning those 3-disk mdadm sets into RAID 10
as shown above, instead of the 3-way mirror sets you stated previously?
RAID 10 requires a minimum of 4 disks; you have 3. Something isn't
right here...

> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>     --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

And I'm pretty sure this is the stripe lvcreate needs to know about to
fix the 10x performance drop issue. Create a new LVM test volume with
the lvcreate options I've mentioned, and see how it performs against
the current 400GB test volume that's running slow.

--
Stan
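Two things that might help settle the debate above: inspecting how the
logical volume is actually laid out, and (if XFS is ever used) the
geometry options Stan mentions. The commands below are a sketch using
the names from the thread; the su value assumes the 1MB MD chunk size,
which the thread itself does not pass to mkfs.xfs:

  # Show how lvol0's extents map onto physical volumes and whether LVM
  # itself is doing any striping (segment type, stripe count).
  lvdisplay -m /dev/vg0/lvol0
  lvs --segments vg0

  # If a filesystem is created later, XFS can be told the stripe
  # geometry explicitly: su = chunk (stripe unit), sw = number of data
  # devices in the stripe.
  mkfs.xfs -d su=1024k,sw=10 /dev/vg0/lvol0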