From: Eric Sandeen on 26 Feb 2010 20:30

Eric Sandeen wrote:
> Justin Piszcz wrote:
> ...
>>> Were the filesystems created to align with raid geometry?
>>
>> Only default options were used except the mount options. If that is the
>> culprit, I have some more testing to do, thanks, will look into it.
>>
>>> mkfs.xfs has done that forever; mkfs.ext4 only will do so (automatically)
>>> with recent kernel+e2fsprogs.
>>
>> How recent?
>
> You're recent enough. :)

Oh, you need very recent util-linux-ng as well, and use libblkid from there
with:

[e2fsprogs] # ./configure --disable-libblkid

Otherwise you can just feed mkfs.ext4 stripe & stride manually.

-Eric
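Feeding mkfs.ext4 stripe & stride manually comes down to simple arithmetic
on the md geometry. A minimal sketch, assuming the 11-disk RAID-0 with
1024 KiB chunks and 4 KiB filesystem blocks discussed later in this thread
(/dev/md0 is the array from the thread):

    # stride = chunk size / fs block size; stripe-width = stride * data disks.
    # (RAID-0: every member disk carries data; RAID-5 would use disks - 1.)
    CHUNK_KB=1024    # md chunk size in KiB (assumed, per this thread)
    NDATA=11         # number of data disks (assumed, per this thread)
    BLOCK_KB=4       # ext4 block size in KiB
    STRIDE=$((CHUNK_KB / BLOCK_KB))   # 1024 / 4 = 256
    SWIDTH=$((STRIDE * NDATA))        # 256 * 11 = 2816
    mkfs.ext4 -b $((BLOCK_KB * 1024)) -E stride=$STRIDE,stripe-width=$SWIDTH /dev/md0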
From: Justin Piszcz on 27 Feb 2010 05:20

On Fri, 26 Feb 2010, Eric Sandeen wrote:

> Eric Sandeen wrote:
>
> Oh, you need very recent util-linux-ng as well, and use libblkid from there
> with:
>
> [e2fsprogs] # ./configure --disable-libblkid
>
> Otherwise you can just feed mkfs.ext4 stripe & stride manually.
>
> -Eric

Hi,

Even when set, there is still poor performance. Using the calculator at
http://busybox.net/~aldot/mkfs_stride.html with:

Raid Level: 0
Number of Physical Disks: 11
RAID chunk size (in KiB): 1024
Number of filesystem blocks (in KiB): 4

gives:

mkfs.ext4 -b 4096 -E stride=256,stripe-width=2816

p63:~# /usr/bin/time mkfs.ext4 -b 4096 -E stride=256,stripe-width=2816 /dev/md0
mke2fs 1.41.10 (10-Feb-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=256 blocks, Stripe width=2816 blocks
335765504 inodes, 1343055824 blocks
67152791 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
40987 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848, 512000000, 550731776, 644972544

Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

p63:~# mount /dev/md0 /r1 -o nobarrier,data=writeback
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 39.3674 s, 273 MB/s
p63:/r1#

Still very slow. Let's try with some optimizations:

p63:/r1# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,max_batch_time=0

Still not anywhere near the 500-600 MiB/s of XFS:

p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.4824 s, 352 MB/s
p63:/r1#

Am I doing something wrong, or is there a flag I am missing that would speed
this up? Or is this simply the expected sequential-write performance of ext4?

Justin.
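A note for anyone reproducing these numbers: a plain dd like the one above
can exit with dirty pages still in flight, so the reported rate mixes
page-cache and on-disk throughput. A sketch of the same run with the final
flush included in the timing (device, mount point, and size as in the
message above):

    # conv=fdatasync makes dd flush the file before reporting, so the
    # figure reflects on-disk throughput rather than page-cache speed.
    dd if=/dev/zero of=/r1/file bs=1M count=10240 conv=fdatasync

    # Alternatively, bypass the page cache entirely:
    dd if=/dev/zero of=/r1/file bs=1M count=10240 oflag=direct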
From: Justin Piszcz on 27 Feb 2010 06:00

On Sat, 27 Feb 2010, Justin Piszcz wrote:
>
> On Fri, 26 Feb 2010, Eric Sandeen wrote:
>
> > Eric Sandeen wrote:
> >
> > Oh, you need very recent util-linux-ng as well, and use libblkid from there
> > with:
> >
> > [e2fsprogs] # ./configure --disable-libblkid
> >
> > Otherwise you can just feed mkfs.ext4 stripe & stride manually.
> >
> > -Eric

I also tried with the default chunk size (64 KiB) in case ext4 had a problem
with chunk sizes > 64 KiB; the results were the same for ext4. I also tried
ext2 and ext3 just to see what their performance would be:

p63:~# mkfs.ext2 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 19.9434 s, 538 MB/s
p63:/r1#

p63:~# mkfs.ext3 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 31.0195 s, 346 MB/s

p63:~# mkfs.ext4 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 35.3866 s, 303 MB/s

And, for comparison, XFS:

p63:~# mkfs.xfs -f /dev/md0 > /dev/null 2>&1
p63:~# mount /dev/md0 /r1
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.1527 s, 592 MB/s
p63:/r1#
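The four runs above are easy to script for repeat testing. A sketch,
assuming the same device, mount point, and 64 KiB-chunk geometry as in this
message; note it destroys the contents of /dev/md0, and the conv=fdatasync
is an addition that differs from the plain dd used above:

    # WARNING: destroys everything on /dev/md0.
    for fs in ext2 ext3 ext4; do
        umount /r1 2>/dev/null
        mkfs.$fs -b 4096 -E stride=16,stripe-width=176 /dev/md0
        mount /dev/md0 /r1
        dd if=/dev/zero of=/r1/file bs=1M count=10240 conv=fdatasync
    done
    # XFS picks up the md geometry automatically.
    umount /r1
    mkfs.xfs -f /dev/md0 > /dev/null 2>&1
    mount /dev/md0 /r1
    dd if=/dev/zero of=/r1/file bs=1M count=10240 conv=fdatasync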
From: Justin Piszcz on 27 Feb 2010 06:10

On Sat, 27 Feb 2010, Justin Piszcz wrote:
>
> On Sat, 27 Feb 2010, Justin Piszcz wrote:
> >
> > On Fri, 26 Feb 2010, Eric Sandeen wrote:

Hi,

I have found the same results on two different systems: performance seems to
peak at ~350 MiB/s on mdadm raid, whether RAID-5 or RAID-0 (two separate
machines).

The only option I found that takes it from

10737418240 bytes (11 GB) copied, 48.7335 s, 220 MB/s

to

10737418240 bytes (11 GB) copied, 30.5425 s, 352 MB/s

is the -o nodelalloc mount option.

The question is: why does it not break the 350 MiB/s barrier?

Justin.
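Since nodelalloc is a mount-time option, the delayed-allocation effect is
easy to isolate with an A/B test on the same filesystem. A sketch, assuming
the device and mount point from this thread:

    # A: default mount, delayed allocation enabled.
    mount /dev/md0 /r1
    dd if=/dev/zero of=/r1/a bs=1M count=10240 conv=fdatasync
    umount /r1

    # B: same filesystem, delayed allocation disabled.
    mount /dev/md0 /r1 -o nodelalloc
    dd if=/dev/zero of=/r1/b bs=1M count=10240 conv=fdatasync
    umount /r1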
From: Justin Piszcz on 27 Feb 2010 06:40
On Sat, 27 Feb 2010, Justin Piszcz wrote:
>
> On Sat, 27 Feb 2010, Justin Piszcz wrote:
> >
> > On Sat, 27 Feb 2010, Justin Piszcz wrote:
> > >
> > > On Fri, 26 Feb 2010, Eric Sandeen wrote:
>
> Hi,
>
> I have found the same results on two different systems: performance seems to
> peak at ~350 MiB/s on mdadm raid, whether RAID-5 or RAID-0 (two separate
> machines).
>
> The only option I found that takes it from
>
> 10737418240 bytes (11 GB) copied, 48.7335 s, 220 MB/s
>
> to
>
> 10737418240 bytes (11 GB) copied, 30.5425 s, 352 MB/s
>
> is the -o nodelalloc mount option.
>
> The question is: why does it not break the 350 MiB/s barrier?
>
> Justin.

Besides large sequential I/O, ext4 seems to be MUCH faster than XFS when
working with many small files.

EXT4:

p63:/r1# sync; /usr/bin/time bash -c 'tar xf linux-2.6.33.tar; sync'
0.18user 2.43system 0:02.86elapsed 91%CPU (0avgtext+0avgdata 5216maxresident)k
0inputs+0outputs (0major+971minor)pagefaults 0swaps
linux-2.6.33  linux-2.6.33.tar
p63:/r1# sync; /usr/bin/time bash -c 'rm -rf linux-2.6.33; sync'
0.02user 0.98system 0:01.03elapsed 97%CPU (0avgtext+0avgdata 5216maxresident)k
0inputs+0outputs (0major+865minor)pagefaults 0swaps

XFS:

p63:/r1# sync; /usr/bin/time bash -c 'tar xf linux-2.6.33.tar; sync'
0.20user 2.62system 1:03.90elapsed 4%CPU (0avgtext+0avgdata 5200maxresident)k
0inputs+0outputs (0major+970minor)pagefaults 0swaps
p63:/r1# sync; /usr/bin/time bash -c 'rm -rf linux-2.6.33; sync'
0.03user 2.02system 0:29.04elapsed 7%CPU (0avgtext+0avgdata 5200maxresident)k
0inputs+0outputs (0major+864minor)pagefaults 0swaps

So I guess that's the tradeoff: for massive sequential I/O use XFS;
otherwise, use ext4?

I would still like to know, however, why ~350 MiB/s seems to be the maximum
performance I can get from two different md raids (which easily do 600 MiB/s
with XFS). Is this a performance issue between ext4 and md-raid? The problem
does not exist with XFS and md-raid.

Justin.
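The small-file comparison above also scripts cleanly. A sketch, assuming the
same kernel tarball and mount point; the cache drop between steps is an
addition (not part of the original runs) and needs root:

    # Time kernel-tree extract and delete; the trailing sync inside the
    # timed command ensures the on-disk cost is counted, as in the thread.
    cd /r1
    sync; echo 3 > /proc/sys/vm/drop_caches   # start cold (added step)
    /usr/bin/time bash -c 'tar xf linux-2.6.33.tar; sync'
    sync; echo 3 > /proc/sys/vm/drop_caches
    /usr/bin/time bash -c 'rm -rf linux-2.6.33; sync'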