From: Christoph Hellwig on 23 Jul 2010 07:20

I might sound like a broken record, but if you want to make forward
progress with this, split it into smaller series.

What would be useful, for example, would be one series each to split the
global inode_lock and dcache_lock, without introducing all the fancy new
locking primitives, per-bucket locks and lru schemes for a start.
From: Dave Chinner on 23 Jul 2010 07:20

On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> I'm pleased to announce I have a git tree up of my vfs scalability work.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
>
> Branch vfs-scale-working

I've got a couple of patches needed to build XFS - the shrinker merge
left some bad fragments - I'll post them in a minute. The rest of this
email is the longest lockdep warning I've ever seen; it occurred on boot.

Cheers,

Dave.

[    6.368707] ======================================================
[    6.369773] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[    6.370379] 2.6.35-rc5-dgc+ #58
[    6.370882] ------------------------------------------------------
[    6.371475] pmcd/2124 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
[    6.372062] (&sb->s_type->i_lock_key#6){+.+...}, at: [<ffffffff81736f8c>] socket_get_id+0x3c/0x60
[    6.372268]
[    6.372268] and this task is already holding:
[    6.372268] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff81791750>] established_get_first+0x60/0x120
[    6.372268] which would create a new lock dependency:
[    6.372268] (&(&hashinfo->ehash_locks[i])->rlock){+.-...} -> (&sb->s_type->i_lock_key#6){+.+...}
[    6.372268]
[    6.372268] but this new dependency connects a SOFTIRQ-irq-safe lock:
[    6.372268] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}
[    6.372268] ... which became SOFTIRQ-irq-safe at:
[    6.372268]   [<ffffffff810b3b26>] __lock_acquire+0x576/0x1450
[    6.372268]   [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]   [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]   [<ffffffff8177a1ba>] __inet_hash_nolisten+0xfa/0x180
[    6.372268]   [<ffffffff8179392a>] tcp_v4_syn_recv_sock+0x1aa/0x2d0
[    6.372268]   [<ffffffff81795502>] tcp_check_req+0x202/0x440
[    6.372268]   [<ffffffff817948c4>] tcp_v4_do_rcv+0x304/0x4f0
[    6.372268]   [<ffffffff81795134>] tcp_v4_rcv+0x684/0x7e0
[    6.372268]   [<ffffffff81771512>] ip_local_deliver+0xe2/0x1c0
[    6.372268]   [<ffffffff81771af7>] ip_rcv+0x397/0x760
[    6.372268]   [<ffffffff8174d067>] __netif_receive_skb+0x277/0x330
[    6.372268]   [<ffffffff8174d1f4>] process_backlog+0xd4/0x1e0
[    6.372268]   [<ffffffff8174dc38>] net_rx_action+0x188/0x2b0
[    6.372268]   [<ffffffff81084cc2>] __do_softirq+0xd2/0x260
[    6.372268]   [<ffffffff81035edc>] call_softirq+0x1c/0x50
[    6.372268]   [<ffffffff8108551b>] local_bh_enable_ip+0xeb/0xf0
[    6.372268]   [<ffffffff8182c544>] _raw_spin_unlock_bh+0x34/0x40
[    6.372268]   [<ffffffff8173c59e>] release_sock+0x14e/0x1a0
[    6.372268]   [<ffffffff817a3975>] inet_stream_connect+0x75/0x320
[    6.372268]   [<ffffffff81737917>] sys_connect+0xa7/0xc0
[    6.372268]   [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]
[    6.372268] to a SOFTIRQ-irq-unsafe lock:
[    6.372268] (&sb->s_type->i_lock_key#6){+.+...}
[    6.372268] ... which became SOFTIRQ-irq-unsafe at:
[    6.372268] ...
[    6.372268]   [<ffffffff810b3b73>] __lock_acquire+0x5c3/0x1450
[    6.372268]   [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]   [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]   [<ffffffff8116af72>] new_inode+0x52/0xd0
[    6.372268]   [<ffffffff81174a40>] get_sb_pseudo+0xb0/0x180
[    6.372268]   [<ffffffff81735a41>] sockfs_get_sb+0x21/0x30
[    6.372268]   [<ffffffff81152dba>] vfs_kern_mount+0x8a/0x1e0
[    6.372268]   [<ffffffff81152f29>] kern_mount_data+0x19/0x20
[    6.372268]   [<ffffffff81e1c075>] sock_init+0x4e/0x59
[    6.372268]   [<ffffffff810001dc>] do_one_initcall+0x3c/0x1a0
[    6.372268]   [<ffffffff81de5767>] kernel_init+0x17a/0x204
[    6.372268]   [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[    6.372268]
[    6.372268] other info that might help us debug this:
[    6.372268]
[    6.372268] 3 locks held by pmcd/2124:
[    6.372268]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff81171dae>] seq_read+0x3e/0x430
[    6.372268]  #1:  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff81791750>] established_get_first+0x60/0x120
[    6.372268]  #2:  (clock-AF_INET){++....}, at: [<ffffffff8173b6ae>] sock_i_ino+0x2e/0x70
[    6.372268]
[    6.372268] the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[    6.372268] -> (&(&hashinfo->ehash_locks[i])->rlock){+.-...} ops: 3 {
[    6.372268]    HARDIRQ-ON-W at:
[    6.372268]      [<ffffffff810b3b47>] __lock_acquire+0x597/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8177a1ba>] __inet_hash_nolisten+0xfa/0x180
[    6.372268]      [<ffffffff8177ab6a>] __inet_hash_connect+0x33a/0x3d0
[    6.372268]      [<ffffffff8177ac4f>] inet_hash_connect+0x4f/0x60
[    6.372268]      [<ffffffff81792522>] tcp_v4_connect+0x272/0x4f0
[    6.372268]      [<ffffffff817a3b8e>] inet_stream_connect+0x28e/0x320
[    6.372268]      [<ffffffff81737917>] sys_connect+0xa7/0xc0
[    6.372268]      [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]    IN-SOFTIRQ-W at:
[    6.372268]      [<ffffffff810b3b26>] __lock_acquire+0x576/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8177a1ba>] __inet_hash_nolisten+0xfa/0x180
[    6.372268]      [<ffffffff8179392a>] tcp_v4_syn_recv_sock+0x1aa/0x2d0
[    6.372268]      [<ffffffff81795502>] tcp_check_req+0x202/0x440
[    6.372268]      [<ffffffff817948c4>] tcp_v4_do_rcv+0x304/0x4f0
[    6.372268]      [<ffffffff81795134>] tcp_v4_rcv+0x684/0x7e0
[    6.372268]      [<ffffffff81771512>] ip_local_deliver+0xe2/0x1c0
[    6.372268]      [<ffffffff81771af7>] ip_rcv+0x397/0x760
[    6.372268]      [<ffffffff8174d067>] __netif_receive_skb+0x277/0x330
[    6.372268]      [<ffffffff8174d1f4>] process_backlog+0xd4/0x1e0
[    6.372268]      [<ffffffff8174dc38>] net_rx_action+0x188/0x2b0
[    6.372268]      [<ffffffff81084cc2>] __do_softirq+0xd2/0x260
[    6.372268]      [<ffffffff81035edc>] call_softirq+0x1c/0x50
[    6.372268]      [<ffffffff8108551b>] local_bh_enable_ip+0xeb/0xf0
[    6.372268]      [<ffffffff8182c544>] _raw_spin_unlock_bh+0x34/0x40
[    6.372268]      [<ffffffff8173c59e>] release_sock+0x14e/0x1a0
[    6.372268]      [<ffffffff817a3975>] inet_stream_connect+0x75/0x320
[    6.372268]      [<ffffffff81737917>] sys_connect+0xa7/0xc0
[    6.372268]      [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]    INITIAL USE at:
[    6.372268]      [<ffffffff810b37e2>] __lock_acquire+0x232/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8177a1ba>] __inet_hash_nolisten+0xfa/0x180
[    6.372268]      [<ffffffff8177ab6a>] __inet_hash_connect+0x33a/0x3d0
[    6.372268]      [<ffffffff8177ac4f>] inet_hash_connect+0x4f/0x60
[    6.372268]      [<ffffffff81792522>] tcp_v4_connect+0x272/0x4f0
[    6.372268]      [<ffffffff817a3b8e>] inet_stream_connect+0x28e/0x320
[    6.372268]      [<ffffffff81737917>] sys_connect+0xa7/0xc0
[    6.372268]      [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]  }
[    6.372268]  ... key at: [<ffffffff8285ddf8>] __key.47027+0x0/0x8
[    6.372268]  ... acquired at:
[    6.372268]    [<ffffffff810b2940>] check_irq_usage+0x60/0xf0
[    6.372268]    [<ffffffff810b41ff>] __lock_acquire+0xc4f/0x1450
[    6.372268]    [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]    [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]    [<ffffffff81736f8c>] socket_get_id+0x3c/0x60
[    6.372268]    [<ffffffff8173b6c3>] sock_i_ino+0x43/0x70
[    6.372268]    [<ffffffff81790fc9>] tcp4_seq_show+0x1a9/0x520
[    6.372268]    [<ffffffff81172005>] seq_read+0x295/0x430
[    6.372268]    [<ffffffff811ad9f4>] proc_reg_read+0x84/0xc0
[    6.372268]    [<ffffffff81150165>] vfs_read+0xb5/0x170
[    6.372268]    [<ffffffff81150274>] sys_read+0x54/0x90
[    6.372268]    [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]
[    6.372268]
[    6.372268] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[    6.372268] -> (&sb->s_type->i_lock_key#6){+.+...} ops: 1185 {
[    6.372268]    HARDIRQ-ON-W at:
[    6.372268]      [<ffffffff810b3b47>] __lock_acquire+0x597/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8116af72>] new_inode+0x52/0xd0
[    6.372268]      [<ffffffff81174a40>] get_sb_pseudo+0xb0/0x180
[    6.372268]      [<ffffffff81735a41>] sockfs_get_sb+0x21/0x30
[    6.372268]      [<ffffffff81152dba>] vfs_kern_mount+0x8a/0x1e0
[    6.372268]      [<ffffffff81152f29>] kern_mount_data+0x19/0x20
[    6.372268]      [<ffffffff81e1c075>] sock_init+0x4e/0x59
[    6.372268]      [<ffffffff810001dc>] do_one_initcall+0x3c/0x1a0
[    6.372268]      [<ffffffff81de5767>] kernel_init+0x17a/0x204
[    6.372268]      [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[    6.372268]    SOFTIRQ-ON-W at:
[    6.372268]      [<ffffffff810b3b73>] __lock_acquire+0x5c3/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8116af72>] new_inode+0x52/0xd0
[    6.372268]      [<ffffffff81174a40>] get_sb_pseudo+0xb0/0x180
[    6.372268]      [<ffffffff81735a41>] sockfs_get_sb+0x21/0x30
[    6.372268]      [<ffffffff81152dba>] vfs_kern_mount+0x8a/0x1e0
[    6.372268]      [<ffffffff81152f29>] kern_mount_data+0x19/0x20
[    6.372268]      [<ffffffff81e1c075>] sock_init+0x4e/0x59
[    6.372268]      [<ffffffff810001dc>] do_one_initcall+0x3c/0x1a0
[    6.372268]      [<ffffffff81de5767>] kernel_init+0x17a/0x204
[    6.372268]      [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[    6.372268]    INITIAL USE at:
[    6.372268]      [<ffffffff810b37e2>] __lock_acquire+0x232/0x1450
[    6.372268]      [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]      [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]      [<ffffffff8116af72>] new_inode+0x52/0xd0
[    6.372268]      [<ffffffff81174a40>] get_sb_pseudo+0xb0/0x180
[    6.372268]      [<f [<ffffffff81152dba>] vfs_kern_mount+0x8a/0x1e0
[    6.372268]      [<ffffffff81152f29>] kern_mount_data+0x19/0x20
[    6.372268]      [<ffffffff81e1c075>] sock_init+0x4e/0x59
[    6.372268]      [<ffffffff810001dc>] do_one_initcall+0x3c/0x1a0
[    6.372268]      [<ffffffff81de5767>] kernel_init+0x17a/0x204
[    6.372268]      [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[    6.372268]  }
[    6.372268]  ... key at: [<ffffffff81bd5bd8>] sock_fs_type+0x58/0x80
[    6.372268]  ... acquired at:
[    6.372268]    [<ffffffff810b2940>] check_irq_usage+0x60/0xf0
[    6.372268]    [<ffffffff810b41ff>] __lock_acquire+0xc4f/0x1450
[    6.372268]    [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]    [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]    [<ffffffff81736f8c>] socket_get_id+0x3c/0x60
[    6.372268]    [<ffffffff8173b6c3>] sock_i_ino+0x43/0x70
[    6.372268]    [<ffffffff81790fc9>] tcp4_seq_show+0x1a9/0x520
[    6.372268]    [<ffffffff81172005>] seq_read+0x295/0x430
[    6.372268]    [<ffffffff811ad9f4>] proc_reg_read+0x84/0xc0
[    6.372268]    [<ffffffff81150165>] vfs_read+0xb5/0x170
[    6.372268]    [<ffffffff81150274>] sys_read+0x54/0x90
[    6.372268]    [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
[    6.372268]
[    6.372268]
[    6.372268] stack backtrace:
[    6.372268] Pid: 2124, comm: pmcd Not tainted 2.6.35-rc5-dgc+ #58
[    6.372268] Call Trace:
[    6.372268]  [<ffffffff810b28d9>] check_usage+0x499/0x4a0
[    6.372268]  [<ffffffff810b24c6>] ? check_usage+0x86/0x4a0
[    6.372268]  [<ffffffff810af729>] ? __bfs+0x129/0x260
[    6.372268]  [<ffffffff810b2940>] check_irq_usage+0x60/0xf0
[    6.372268]  [<ffffffff810b41ff>] __lock_acquire+0xc4f/0x1450
[    6.372268]  [<ffffffff810b4aa6>] lock_acquire+0xa6/0x160
[    6.372268]  [<ffffffff81736f8c>] ? socket_get_id+0x3c/0x60
[    6.372268]  [<ffffffff8182bb26>] _raw_spin_lock+0x36/0x70
[    6.372268]  [<ffffffff81736f8c>] ? socket_get_id+0x3c/0x60
[    6.372268]  [<ffffffff81736f8c>] socket_get_id+0x3c/0x60
[    6.372268]  [<ffffffff8173b6c3>] sock_i_ino+0x43/0x70
[    6.372268]  [<ffffffff81790fc9>] tcp4_seq_show+0x1a9/0x520
[    6.372268]  [<ffffffff81791750>] ? established_get_first+0x60/0x120
[    6.372268]  [<ffffffff8182beb7>] ? _raw_spin_lock_bh+0x67/0x70
[    6.372268]  [<ffffffff81172005>] seq_read+0x295/0x430
[    6.372268]  [<ffffffff81171d70>] ? seq_read+0x0/0x430
[    6.372268]  [<ffffffff811ad9f4>] proc_reg_read+0x84/0xc0
[    6.372268]  [<ffffffff81150165>] vfs_read+0xb5/0x170
[    6.372268]  [<ffffffff81150274>] sys_read+0x54/0x90
[    6.372268]  [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b

--
Dave Chinner
david(a)fromorbit.com
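The essence of the report above is a classic inversion: the TCP ehash bucket lock is SOFTIRQ-safe (it is taken in the receive softirq via tcp_v4_rcv), the sockfs inode i_lock is SOFTIRQ-unsafe (it is only ever taken with softirqs enabled, e.g. in new_inode()), and reading /proc/net/tcp takes the second while holding the first (tcp4_seq_show -> sock_i_ino -> socket_get_id). The toy module below is an illustrative sketch only - the lock names and the tasklet are stand-ins, not the real TCP or VFS code - but loading something like it on a lockdep-enabled kernel of that era should produce a report of the same shape:

/*
 * Illustrative-only sketch of the lock ordering lockdep is flagging above.
 * "ehash_lock" and "i_lock" are stand-in spinlocks, not the real ones.
 * Uses the pre-5.9 tasklet API to match the 2.6.35 timeframe.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(ehash_lock);	/* stand-in for hashinfo->ehash_locks[i] */
static DEFINE_SPINLOCK(i_lock);		/* stand-in for inode->i_lock */

/* Softirq side: acquiring ehash_lock here makes it SOFTIRQ-safe
 * (cf. the tcp_v4_rcv stack in the report). */
static void demo_tasklet_fn(unsigned long data)
{
	spin_lock(&ehash_lock);
	spin_unlock(&ehash_lock);
}
static DECLARE_TASKLET(demo_tasklet, demo_tasklet_fn, 0);

static int __init demo_init(void)
{
	/* Process context, softirqs enabled: i_lock becomes SOFTIRQ-unsafe
	 * (cf. new_inode() under sock_init()). */
	spin_lock(&i_lock);
	spin_unlock(&i_lock);

	tasklet_schedule(&demo_tasklet);

	/* The problem: hold the softirq-safe lock while taking the
	 * softirq-unsafe one (cf. tcp4_seq_show taking i_lock while the
	 * ehash bucket lock is held for /proc/net/tcp). */
	spin_lock_bh(&ehash_lock);
	spin_lock(&i_lock);
	spin_unlock(&i_lock);
	spin_unlock_bh(&ehash_lock);

	return 0;
}

static void __exit demo_exit(void)
{
	tasklet_kill(&demo_tasklet);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");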
From: Dave Chinner on 23 Jul 2010 10:00

On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> I'm pleased to announce I have a git tree up of my vfs scalability work.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
>
> Branch vfs-scale-working

Bugs I've noticed so far:

- Using XFS, the existing vfs inode count statistic does not decrease
  as inodes are freed.
- the existing vfs dentry count remains at zero
- the existing vfs free inode count remains at zero

$ pminfo -f vfs.inodes vfs.dentry
vfs.inodes.count
    value 7472612
vfs.inodes.free
    value 0
vfs.dentry.count
    value 0
vfs.dentry.free
    value 0

Performance Summary:

With lockdep and CONFIG_XFS_DEBUG enabled, a 16 thread parallel sequential
create/unlink workload on an 8p/4GB RAM VM with a virtio block device
sitting on a short-stroked 12x2TB SAS array w/ 512MB BBWC in RAID0 via dm
and using the noop elevator in the guest VM:

$ sudo mkfs.xfs -f -l size=128m -d agcount=16 /dev/vdb
meta-data=/dev/vdb               isize=256    agcount=16, agsize=1638400 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount -o delaylog,logbsize=262144,nobarrier /dev/vdb /mnt/scratch
$ sudo chmod 777 /mnt/scratch
$ cd ~/src/fs_mark-3.3/
$ ./fs_mark -S0 -n 500000 -s 0 \
    -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/3 -d /mnt/scratch/2 \
    -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 \
    -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 \
    -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15

                     files/s
2.6.34-rc4             12550
2.6.35-rc5+scale       12285

So the same within the error margins of the benchmark. Screenshots of the
monitoring graphs - you can see the effect of the broken stats:

http://userweb.kernel.org/~dgc/shrinker-2.6.36/fs_mark-2.6.35-rc4-16x500-xfs.png
http://userweb.kernel.org/~dgc/shrinker-2.6.36/fs_mark-2.6.35-rc5-npiggin-scale-lockdep-16x500-xfs.png

With a production build (i.e. no lockdep, no xfs debug), I'll run the same
fs_mark parallel create/unlink workload to show scalability as I ran here:

http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

The numbers can't be directly compared, but the test and the setup are the
same. The XFS numbers below are with delayed logging enabled. ext4 is using
default mkfs and mount parameters except for barrier=0. All numbers are
averages of three runs.

        fs_mark rate (thousands of files/second)
            2.6.35-rc5        2.6.35-rc5-scale
threads     xfs    ext4       xfs     ext4
   1         20     39         20      39
   2         35     55         35      57
   4         60     41         57      42
   8         79      9         75       9

ext4 is getting IO bound at more than 2 threads, so apart from pointing out
that XFS is 8-9x faster than ext4 at 8 threads, I'm going to ignore ext4 for
the purposes of testing scalability here.

For XFS w/ delayed logging, 2.6.35-rc5 is only getting to about 600% CPU and
with Nick's patches it's about 650% (10% higher) for slightly lower
throughput. So at this class of machine for this workload, the changes
result in a slight reduction in scalability.

I looked at dbench on XFS as well, but didn't see any significant change in
the numbers at up to 200 load threads, so there's not much to talk about
there.

Sometime over the weekend I'll build a 16p VM and see what I get from that...

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
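The vfs.* metrics above come from PCP's pminfo; presumably they are sourced from the kernel's /proc counters. For a quick cross-check that is independent of PCP, the kernel-side counters can be read directly: /proc/sys/fs/inode-nr reports nr_inodes and nr_free_inodes, and /proc/sys/fs/dentry-state begins with nr_dentry and nr_unused. A minimal reader follows; the mapping to the PCP metric names is an assumption on my part:

/*
 * Dump the kernel's own VFS object counters, independent of pminfo/PCP.
 * /proc/sys/fs/inode-nr:     nr_inodes nr_free_inodes
 * /proc/sys/fs/dentry-state: nr_dentry nr_unused age_limit want_pages ...
 */
#include <stdio.h>

static void dump(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%-26s %s", path, buf);
	fclose(f);
}

int main(void)
{
	dump("/proc/sys/fs/inode-nr");
	dump("/proc/sys/fs/dentry-state");
	return 0;
}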
From: Nick Piggin on 23 Jul 2010 11:40

On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> I'm pleased to announce I have a git tree up of my vfs scalability work.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
>
> Summary of a few numbers I've run. google's socket teardown workload
> runs 3-4x faster on my 2 socket Opteron. Single thread git diff runs 20%
> faster on same machine. 32 node Altix runs dbench on ramfs 150x faster
> (100MB/s up to 15GB/s).

The following post just contains some preliminary benchmark numbers on a
POWER7. Boring if you're not interested in this stuff.

IBM and Mikey kindly allowed me to do some test runs on a big POWER7 system
today. "Very" is the only word I'm authorized to use to describe how big it
is. We tested the vfs-scale-working and master branches from my git tree as
of today. I'll stick with relative numbers to be safe. All tests were run on
ramfs.

First, and very important, is single threaded performance of basic code.
POWER7 is obviously vastly different from a Barcelona or Nehalem, and
store-free path walk uses a lot of seqlocks, which are cheap on x86 and a
little more expensive on others.

Test case          time difference, vanilla to vfs-scale (negative is better)
stat()             -10.8% +/- 0.3%
close(open())        4.3% +/- 0.3%
unlink(creat())     36.8% +/- 0.3%

stat is significantly faster, which is really good. open/close is a bit
slower, which we didn't get time to analyse. There are one or two seqlock
checks which might be avoided, which could make up the difference. It's not
horrible, but I hope to get POWER7 open/close more competitive (on x86
open/close is even a bit faster). Note this is a worst case for
rcu-path-walk: lookup of "./file", because it has to take a refcount on the
final element. With more elements, rcu walk should gain the advantage.

creat/unlink is showing the big RCU penalty. However, I have penciled out a
working design with Linus of how to do SLAB_DESTROY_BY_RCU. It makes the
store-free path walking and some inode RCU list walking a little bit
trickier, so I prefer not to dump too much at once. There is something that
can be done if regressions show up. I don't anticipate many regressions
outside microbenchmarks, and this is about the absolute worst case.

On to parallel tests. Firstly, the google socket workload. Running with
"NR_THREADS" children, the vfs-scale patches do this:

root(a)p7ih06:~/google# time ./google --files_per_cpu 10000 > /dev/null
real    0m4.976s
user    8m38.925s
sys     6m45.236s

root(a)p7ih06:~/google# time ./google --files_per_cpu 20000 > /dev/null
real    0m7.816s
user    11m21.034s
sys     14m38.258s

root(a)p7ih06:~/google# time ./google --files_per_cpu 40000 > /dev/null
real    0m11.358s
user    11m37.955s
sys     28m44.911s

Reducing to NR_THREADS/4 children allows vanilla to complete:

root(a)p7ih06:~/google# time ./google --files_per_cpu 10000
real    1m23.118s
user    3m31.820s
sys     81m10.405s

I was actually surprised it did that well.

Dbench was an interesting one. We didn't manage to stretch the box's legs,
unfortunately! dbench with 1 proc gave about 500MB/s, 64 procs gave 21GB/s,
and at 128 procs throughput dropped dramatically. Turns out that weird
things start happening with the rename seqlock versus d_lookup, and d_move
contention (dbench does a sprinkle of renaming). That can be improved, I
think, but it's not worth bothering with for the time being. It's not really
worth testing vanilla at high dbench parallelism.

The parallel git diff workload looked OK. It seemed to be scaling fine in
the vfs, but it hit a bottleneck in powerpc's tlb invalidation, so the
numbers may not be so interesting.

Lastly, some parallel syscall microbenchmarks:

procs                       vanilla        vfs-scale
open-close, separate-cwd
  1                       384557.70        355923.82   op/s/proc
  NR_CORES                    86.63        164054.64   op/s/proc
  NR_THREADS                  18.68 (ouch!)

open-close, same-cwd
  1                       381074.32        339161.25
  NR_CORES                   104.16        107653.05

creat-unlink, separate-cwd
  1                       145891.05        104301.06
  NR_CORES                    29.81         10061.66

creat-unlink, same-cwd
  1                       129681.27        104301.06
  NR_CORES                    12.68           181.24

So we can see the single thread performance regressions here, but the
vanilla case really chokes at high CPU counts.
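For reference, these syscall microbenchmarks are essentially tight loops of the named operation, run as 1, NR_CORES or NR_THREADS processes in parallel. The sketch below is not Nick's actual harness - the file name, iteration count and timing are illustrative assumptions, and the fork/placement logic is omitted - but it shows the shape of the creat-unlink case (drop the unlink and open an existing file for the open-close case):

/* Minimal sketch of a creat-unlink microbenchmark loop (single process). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long i, iters = (argc > 1) ? atol(argv[1]) : 100000;
	const char *path = "testfile";		/* "same-cwd" variant */
	struct timespec t0, t1;
	double secs;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		int fd = open(path, O_CREAT | O_WRONLY, 0644);

		if (fd < 0) {
			perror("open");
			exit(1);
		}
		close(fd);
		unlink(path);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.2f op/s\n", iters / secs);
	return 0;
}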
From: Nick Piggin on 23 Jul 2010 11:50

On Fri, Jul 23, 2010 at 07:17:46AM -0400, Christoph Hellwig wrote:
> I might sound like a broken record, but if you want to make forward
> progress with this split it into smaller series.

No, I appreciate the advice. I put this tree up for people to fetch without
posting patches all the time. I think it is important to test and to see the
big picture when reviewing the patches, but you are right about how to
actually submit patches on the ML.

> What would be useful for example would be one series each to split
> the global inode_lock and dcache_lock, without introducing all the
> fancy new locking primitives, per-bucket locks and lru schemes for
> a start.

I've kept the series fairly well structured like that. Basically it is in
these parts:

1.  files lock
2.  vfsmount lock
3.  mnt refcount
4a. put several new global spinlocks around different parts of dcache
4b. remove dcache_lock after the above protect everything
4c. start doing fine grained locking of hash, inode alias, lru, etc etc
5a, 5b, 5c. same for inodes
6.  some further optimisations and cleanups
7.  store-free path walking

This kind of sequence. I will again try to submit a first couple of things
to Al soon.
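For readers following along, the "per-bucket locks" mentioned in step 4c refer to giving each hash chain its own lock instead of guarding the whole table with one global lock such as dcache_lock. The sketch below shows the general idea in isolation, using the 2.6.35-era hlist helpers; the names, table size and hash function are made up for illustration and this is not code from the series:

/* Generic per-bucket locked hash table, as a standalone illustration. */
#include <linux/init.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/hash.h>

#define DEMO_HASH_BITS	10
#define DEMO_HASH_SIZE	(1 << DEMO_HASH_BITS)

struct demo_bucket {
	spinlock_t		lock;	/* protects only this chain */
	struct hlist_head	head;
};

static struct demo_bucket demo_hash[DEMO_HASH_SIZE];

struct demo_entry {
	struct hlist_node	hash_node;
	unsigned long		key;
};

static void demo_hash_init(void)
{
	int i;

	for (i = 0; i < DEMO_HASH_SIZE; i++) {
		spin_lock_init(&demo_hash[i].lock);
		INIT_HLIST_HEAD(&demo_hash[i].head);
	}
}

static struct demo_bucket *demo_bucket_of(unsigned long key)
{
	return &demo_hash[hash_long(key, DEMO_HASH_BITS)];
}

/* Insert: only the target bucket is locked, not the whole table. */
static void demo_hash_insert(struct demo_entry *e)
{
	struct demo_bucket *b = demo_bucket_of(e->key);

	spin_lock(&b->lock);
	hlist_add_head(&e->hash_node, &b->head);
	spin_unlock(&b->lock);
}

/* Lookup: serialises only against users of the same bucket. */
static struct demo_entry *demo_hash_lookup(unsigned long key)
{
	struct demo_bucket *b = demo_bucket_of(key);
	struct demo_entry *e;
	struct hlist_node *n;

	spin_lock(&b->lock);
	hlist_for_each_entry(e, n, &b->head, hash_node) {
		if (e->key == key) {
			spin_unlock(&b->lock);
			return e;
		}
	}
	spin_unlock(&b->lock);
	return NULL;
}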