From: Andreas Dilger on 1 Mar 2010 03:40

On 2010-02-28, at 07:55, Justin Piszcz wrote:
> === CREATE RAID-0 WITH 11 DISKS

Have you tried testing with "nice" numbers of disks in your RAID set
(e.g. 8 disks for RAID-0, 9 for RAID-5, 10 for RAID-6)?  The mballoc
code is really much better tuned for power-of-two sized allocations.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
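As a rough illustration of Andreas' point, assuming the 64 KiB chunk size used in the mdadm commands later in this thread, the full-stripe write sizes for his suggested disk counts work out as follows (this arithmetic is a sketch, not something stated in the thread):

    # Full stripe = (data disks) x (chunk size)
    # RAID-0, 11 disks: 11 x 64 KiB = 704 KiB   <- not a power of two
    # RAID-0,  8 disks:  8 x 64 KiB = 512 KiB   <- power of two
    # RAID-5,  9 disks:  8 x 64 KiB = 512 KiB   (one disk's worth of parity)
    # RAID-6, 10 disks:  8 x 64 KiB = 512 KiB   (two disks' worth of parity)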
From: Justin Piszcz on 1 Mar 2010 04:30

On Mon, 1 Mar 2010, Andreas Dilger wrote:

> On 2010-02-28, at 07:55, Justin Piszcz wrote:
>> === CREATE RAID-0 WITH 11 DISKS
>
> Have you tried testing with "nice" numbers of disks in your RAID set (e.g. 8
> disks for RAID-0, 9 for RAID-5, 10 for RAID-6)?  The mballoc code is really
> much better tuned for power-of-two sized allocations.

Hi,

Yes, the second system (RAID-5) has 8 disks and it shows the same
performance problems with ext4 and not XFS (as shown in the previous
e-mail), where XFS usually got 500-600MiB/s for writes.

http://groups.google.com/group/linux.kernel/browse_thread/thread/e7b189bcaa2c1cb4/ad6c2a54b678cf5f?show_docid=ad6c2a54b678cf5f&pli=1

For the RAID-5 (from earlier testing): <- This one has 8 disks.

-o data=writeback,nobarrier:
10737418240 bytes (11 GB) copied, 48.7335 s, 220 MB/s

-o data=writeback,nobarrier,nodelalloc:
10737418240 bytes (11 GB) copied, 30.5425 s, 352 MB/s

An increase of 132 MB/s.

Justin.
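A minimal sketch of the kind of run behind those two numbers, assuming the 8-disk RAID-5 is /dev/md1 mounted at /r5 (both names are placeholders; the thread does not give them):

    mount -o data=writeback,nobarrier /dev/md1 /r5
    dd if=/dev/zero of=/r5/bigfile bs=1M count=10240   # delayed allocation (ext4 default)
    umount /r5
    mount -o data=writeback,nobarrier,nodelalloc /dev/md1 /r5
    dd if=/dev/zero of=/r5/bigfile bs=1M count=10240   # delayed allocation disabled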
From: Michael Tokarev on 1 Mar 2010 09:50

Justin Piszcz wrote:
>
> On Mon, 1 Mar 2010, Andreas Dilger wrote:
>
>> On 2010-02-28, at 07:55, Justin Piszcz wrote:
>>> === CREATE RAID-0 WITH 11 DISKS
>>
>> Have you tried testing with "nice" numbers of disks in your RAID set
>> (e.g. 8 disks for RAID-0, 9 for RAID-5, 10 for RAID-6)?  The mballoc
>> code is really much better tuned for power-of-two sized allocations.
>
> Hi,
>
> Yes, the second system (RAID-5) has 8 disks and it shows the same
> performance problems with ext4 and not XFS (as shown from previous
> e-mail), where XFS usually got 500-600MiB/s for writes.
>
> http://groups.google.com/group/linux.kernel/browse_thread/thread/e7b189bcaa2c1cb4/ad6c2a54b678cf5f?show_docid=ad6c2a54b678cf5f&pli=1
>
> For the RAID-5 (from earlier testing): <- This one has 8 disks.

Note that for RAID-5, the "nice" number of disks is 9 as Andreas
said, not 8 as in your example.

/mjt
From: Justin Piszcz on 1 Mar 2010 10:10

On Mon, 1 Mar 2010, Michael Tokarev wrote:

> Justin Piszcz wrote:
>>
>> On Mon, 1 Mar 2010, Andreas Dilger wrote:
>>
>>> On 2010-02-28, at 07:55, Justin Piszcz wrote:
>>>> === CREATE RAID-0 WITH 11 DISKS
>>>
>>> Have you tried testing with "nice" numbers of disks in your RAID set
>>> (e.g. 8 disks for RAID-0, 9 for RAID-5, 10 for RAID-6)?  The mballoc
>>> code is really much better tuned for power-of-two sized allocations.
>>
>> Hi,
>>
>> Yes, the second system (RAID-5) has 8 disks and it shows the same
>> performance problems with ext4 and not XFS (as shown from previous
>> e-mail), where XFS usually got 500-600MiB/s for writes.
>>
>> http://groups.google.com/group/linux.kernel/browse_thread/thread/e7b189bcaa2c1cb4/ad6c2a54b678cf5f?show_docid=ad6c2a54b678cf5f&pli=1
>>
>> For the RAID-5 (from earlier testing): <- This one has 8 disks.
>
> Note that for RAID-5, the "nice" number of disks is 9 as Andreas
> said, not 8 as in your example.
>
> /mjt

Hi, thanks for this.

RAID-0 with 12 disks:

p63:~# mdadm --create -e 0.90 /dev/md0 /dev/sd[b-m]1 --level=0 -n 12 -c 64
mdadm: /dev/sdb1 appears to contain an ext2fs file system
    size=1077256000K  mtime=Sun Feb 28 08:35:47 2010
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdh1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdi1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdj1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdk1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdl1 appears to be part of a raid array:
    level=raid0 devices=11 ctime=Sun Feb 28 08:11:10 2010
mdadm: /dev/sdm1 appears to be part of a raid array:
    level=raid6 devices=11 ctime=Sat Feb 27 06:57:29 2010
Continue creating array? y
mdadm: array /dev/md0 started.
p63:~# mkfs.ext4 /dev/md0
mke2fs 1.41.10 (10-Feb-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
366288896 inodes, 1465151808 blocks
73257590 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
44713 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: 28936/44713..etc

p63:~# mount -o nobarrier /dev/md0 /r1
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 34.9723 s, 307 MB/s
p63:/r1#

Same issue for EXT4; with XFS, it gets faster:

p63:~# mkfs.xfs /dev/md0 -f
meta-data=/dev/md0               isize=256    agcount=32, agsize=45786000 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1465151808, imaxpct=5
         =                       sunit=16     swidth=192 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

p63:~# mount /dev/md0 /r1
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 17.6473 s, 608 MB/s
p63:/r1#

Justin.
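One detail visible in the two transcripts above: mkfs.ext4 reports "Stride=0 blocks, Stripe width=0 blocks", while mkfs.xfs picks up sunit=16/swidth=192 from the md device. A sketch of passing the equivalent geometry to mkfs.ext4 by hand, assuming the 12-disk, 64 KiB-chunk RAID-0 created above (stride = 64 KiB chunk / 4 KiB block = 16; stripe-width = 16 x 12 data disks = 192); the thread does not test whether this changes the dd numbers:

    mkfs.ext4 -E stride=16,stripe-width=192 /dev/md0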
From: Eric Sandeen on 1 Mar 2010 11:20
Justin Piszcz wrote:
>
>
> On Sun, 28 Feb 2010, tytso(a)mit.edu wrote:
>
>> On Sat, Feb 27, 2010 at 06:36:37AM -0500, Justin Piszcz wrote:
>>>
>>> I still would like to know however, why 350MiB/s seems to be the maximum
>>> performance I can get from two different md raids (that easily do
>>> 600MiB/s with XFS).
>
>> Can you run "filefrag -v <filename>" on the large file you created
>> using dd?  Part of the problem may be the block allocator simply not
>> being well optimized for super large writes.  To be honest, that's not
>> something we've tried (at all) to optimize, mainly because for most
>> users of ext4 they're more interested in much more reasonable sized
>> files, and we only have so many hours in a day to hack on ext4. :-)
>> XFS in contrast has in the past had plenty of paying customers
>> interested in writing really large scientific data sets, so this is
>> something Irix *has* spent time optimizing.

> Yes, this is shown at the bottom of the e-mail both with -o data=ordered
> and data=writeback.

....

> === SHOW FILEFRAG OUTPUT (NOBARRIER,ORDERED)
>
> p63:/r1# filefrag -v /r1/bigfile
> Filesystem type is: ef53
> File size of /r1/bigfile is 10737418240 (2621440 blocks, blocksize 4096)
>  ext  logical  physical  expected  length  flags
>    0        0     34816             32768
>    1    32768     67584             30720
>    2    63488    100352     98303   32768
>    3    96256    133120             30720
>    4   126976    165888    163839   32768
>    5   159744    198656             30720

....

That looks pretty good.  I think Dave's suggestion of seeing what cpu
usage looks like is a good one.

Running blktrace on xfs vs. ext4 could possibly also shed some light.

-Eric
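A rough sketch of gathering the data Eric and Dave asked for, assuming the same /dev/md0 and /r1 as in the transcripts above (the names of the captured output files are made up):

    vmstat 1 60 > vmstat_ext4.log &              # per-second CPU/IO stats for ~60s
    blktrace -d /dev/md0 -w 60 -o trace_ext4 &   # block-layer trace of the md device over the same window
    dd if=/dev/zero of=/r1/bigfile bs=1M count=10240
    wait
    blkparse -i trace_ext4 > trace_ext4.txt      # human-readable dump; repeat the whole run with XFS on /dev/md0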