From: Nick Piggin on 23 Jul 2010 12:00

On Fri, Jul 23, 2010 at 09:13:10PM +1000, Dave Chinner wrote:
> On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> > I'm pleased to announce I have a git tree up of my vfs scalability work.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> > http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
> >
> > Branch vfs-scale-working
>
> I've got a couple of patches needed to build XFS - the shrinker
> merge left some bad fragments - I'll post them in a minute. This

OK cool.

> email is for the longest ever lockdep warning I've seen that
> occurred on boot.

Ah thanks. OK that was one of my attempts to keep sockets out of
hitting the vfs as much as possible (lazy inode number evaluation).
Not a big problem, but I'll drop the patch for now.

I have just got one for you too, btw :) (on vanilla kernel but it is
messing up my lockdep stress testing on xfs). Real or false?

[ INFO: possible circular locking dependency detected ]
2.6.35-rc5-00064-ga9f7f2e #334
-------------------------------------------------------
kswapd0/605 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){++++--}, at: [<ffffffff8125500c>] xfs_ilock+0x7c/0xa0

but task is already holding lock:
 (&xfs_mount_list_lock){++++.-}, at: [<ffffffff81281a76>] xfs_reclaim_inode_shrink+0xc6/0x140

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&xfs_mount_list_lock){++++.-}:
       [<ffffffff8106ef9a>] lock_acquire+0x5a/0x70
       [<ffffffff815aa646>] _raw_spin_lock+0x36/0x50
       [<ffffffff810fabf3>] try_to_free_buffers+0x43/0xb0
       [<ffffffff812763b2>] xfs_vm_releasepage+0x92/0xe0
       [<ffffffff810908ee>] try_to_release_page+0x2e/0x50
       [<ffffffff8109ef56>] shrink_page_list+0x486/0x5a0
       [<ffffffff8109f35d>] shrink_inactive_list+0x2ed/0x700
       [<ffffffff8109fda0>] shrink_zone+0x3b0/0x460
       [<ffffffff810a0f41>] try_to_free_pages+0x241/0x3a0
       [<ffffffff810999e2>] __alloc_pages_nodemask+0x4c2/0x6b0
       [<ffffffff810c52c6>] alloc_pages_current+0x76/0xf0
       [<ffffffff8109205b>] __page_cache_alloc+0xb/0x10
       [<ffffffff81092a2a>] find_or_create_page+0x4a/0xa0
       [<ffffffff812780cc>] _xfs_buf_lookup_pages+0x14c/0x360
       [<ffffffff81279122>] xfs_buf_get+0x72/0x160
       [<ffffffff8126eb68>] xfs_trans_get_buf+0xc8/0xf0
       [<ffffffff8124439f>] xfs_da_do_buf+0x3df/0x6d0
       [<ffffffff81244825>] xfs_da_get_buf+0x25/0x30
       [<ffffffff8124a076>] xfs_dir2_data_init+0x46/0xe0
       [<ffffffff81247f89>] xfs_dir2_sf_to_block+0xb9/0x5a0
       [<ffffffff812501c8>] xfs_dir2_sf_addname+0x418/0x5c0
       [<ffffffff81247d7c>] xfs_dir_createname+0x14c/0x1a0
       [<ffffffff81271d49>] xfs_create+0x449/0x5d0
       [<ffffffff8127d802>] xfs_vn_mknod+0xa2/0x1b0
       [<ffffffff8127d92b>] xfs_vn_create+0xb/0x10
       [<ffffffff810ddc81>] vfs_create+0x81/0xd0
       [<ffffffff810df1a5>] do_last+0x535/0x690
       [<ffffffff810e11fd>] do_filp_open+0x21d/0x660
       [<ffffffff810d16b4>] do_sys_open+0x64/0x140
       [<ffffffff810d17bb>] sys_open+0x1b/0x20
       [<ffffffff810023eb>] system_call_fastpath+0x16/0x1b

-> #0 (&(&ip->i_lock)->mr_lock){++++--}:
       [<ffffffff8106ef10>] __lock_acquire+0x1be0/0x1c10
       [<ffffffff8106ef9a>] lock_acquire+0x5a/0x70
       [<ffffffff8105dfba>] down_write_nested+0x4a/0x70
       [<ffffffff8125500c>] xfs_ilock+0x7c/0xa0
       [<ffffffff81280c98>] xfs_reclaim_inode+0x98/0x250
       [<ffffffff81281824>] xfs_inode_ag_walk+0x74/0x120
       [<ffffffff81281953>] xfs_inode_ag_iterator+0x83/0xe0
       [<ffffffff81281aa4>] xfs_reclaim_inode_shrink+0xf4/0x140
       [<ffffffff8109ff7d>] shrink_slab+0x12d/0x190
       [<ffffffff810a07ad>] balance_pgdat+0x43d/0x6f0
       [<ffffffff810a0b1e>] kswapd+0xbe/0x2a0
       [<ffffffff810592ae>] kthread+0x8e/0xa0
       [<ffffffff81003194>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

2 locks held by kswapd0/605:
 #0:  (shrinker_rwsem){++++..}, at: [<ffffffff8109fe88>] shrink_slab+0x38/0x190
 #1:  (&xfs_mount_list_lock){++++.-}, at: [<ffffffff81281a76>] xfs_reclaim_inode_shrink+0xc6/0x140

stack backtrace:
Pid: 605, comm: kswapd0 Not tainted 2.6.35-rc5-00064-ga9f7f2e #334
Call Trace:
 [<ffffffff8106c5d9>] print_circular_bug+0xe9/0xf0
 [<ffffffff8106ef10>] __lock_acquire+0x1be0/0x1c10
 [<ffffffff8106e3c2>] ? __lock_acquire+0x1092/0x1c10
 [<ffffffff8106ef9a>] lock_acquire+0x5a/0x70
 [<ffffffff8125500c>] ? xfs_ilock+0x7c/0xa0
 [<ffffffff8105dfba>] down_write_nested+0x4a/0x70
 [<ffffffff8125500c>] ? xfs_ilock+0x7c/0xa0
 [<ffffffff815ae795>] ? sub_preempt_count+0x95/0xd0
 [<ffffffff8125500c>] xfs_ilock+0x7c/0xa0
 [<ffffffff81280c98>] xfs_reclaim_inode+0x98/0x250
 [<ffffffff81281824>] xfs_inode_ag_walk+0x74/0x120
 [<ffffffff81280c00>] ? xfs_reclaim_inode+0x0/0x250
 [<ffffffff81281953>] xfs_inode_ag_iterator+0x83/0xe0
 [<ffffffff81280c00>] ? xfs_reclaim_inode+0x0/0x250
 [<ffffffff81281aa4>] xfs_reclaim_inode_shrink+0xf4/0x140
 [<ffffffff8109ff7d>] shrink_slab+0x12d/0x190
 [<ffffffff810a07ad>] balance_pgdat+0x43d/0x6f0
 [<ffffffff810a0b1e>] kswapd+0xbe/0x2a0
 [<ffffffff81059700>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff815aaf3d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
 [<ffffffff810a0a60>] ? kswapd+0x0/0x2a0
 [<ffffffff810592ae>] kthread+0x8e/0xa0
 [<ffffffff81003194>] kernel_thread_helper+0x4/0x10
 [<ffffffff815ab400>] ? restore_args+0x0/0x30
 [<ffffffff81059220>] ? kthread+0x0/0xa0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on 23 Jul 2010 12:20

On Fri, Jul 23, 2010 at 11:55:14PM +1000, Dave Chinner wrote:
> On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> > I'm pleased to announce I have a git tree up of my vfs scalability work.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> > http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
> >
> > Branch vfs-scale-working
>
> Bugs I've noticed so far:
>
> - Using XFS, the existing vfs inode count statistic does not decrease
>   as inodes are freed.
> - the existing vfs dentry count remains at zero
> - the existing vfs free inode count remains at zero
>
> $ pminfo -f vfs.inodes vfs.dentry
>
> vfs.inodes.count
>     value 7472612
>
> vfs.inodes.free
>     value 0
>
> vfs.dentry.count
>     value 0
>
> vfs.dentry.free
>     value 0

Hm, I must have broken it along the way and not noticed. Thanks for
pointing that out.

> With a production build (i.e. no lockdep, no xfs debug), I'll
> run the same fs_mark parallel create/unlink workload to show
> scalability as I ran here:
>
> http://oss.sgi.com/archives/xfs/2010-05/msg00329.html
>
> The numbers can't be directly compared, but the test and the setup
> are the same. The XFS numbers below are with delayed logging
> enabled. ext4 is using default mkfs and mount parameters except for
> barrier=0. All numbers are averages of three runs.
>
>         fs_mark rate (thousands of files/second)
>            2.6.35-rc5       2.6.35-rc5-scale
> threads    xfs   ext4       xfs   ext4
>    1        20     39        20     39
>    2        35     55        35     57
>    4        60     41        57     42
>    8        79      9        75      9
>
> ext4 is getting IO bound at more than 2 threads, so apart from
> pointing out that XFS is 8-9x faster than ext4 at 8 threads, I'm
> going to ignore ext4 for the purposes of testing scalability here.
>
> For XFS w/ delayed logging, 2.6.35-rc5 is only getting to about 600%
> CPU and with Nick's patches it's about 650% (10% higher) for
> slightly lower throughput. So at this class of machine for this
> workload, the changes result in a slight reduction in scalability.

That's a good test case, thanks. I'll see if I can find where this
is coming from. I will suspect RCU-inodes I suppose. Hm, may have to
make them DESTROY_BY_RCU after all.

Thanks,
Nick
From: Dave Chinner on 23 Jul 2010 20:30

On Sat, Jul 24, 2010 at 01:51:18AM +1000, Nick Piggin wrote:
> On Fri, Jul 23, 2010 at 09:13:10PM +1000, Dave Chinner wrote:
> > On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> > > I'm pleased to announce I have a git tree up of my vfs scalability work.
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> > > http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
> > >
> > > Branch vfs-scale-working
> >
> > I've got a couple of patches needed to build XFS - the shrinker
> > merge left some bad fragments - I'll post them in a minute. This
>
> OK cool.
>
> > email is for the longest ever lockdep warning I've seen that
> > occurred on boot.
>
> Ah thanks. OK that was one of my attempts to keep sockets out of
> hitting the vfs as much as possible (lazy inode number evaluation).
> Not a big problem, but I'll drop the patch for now.
>
> I have just got one for you too, btw :) (on vanilla kernel but it is
> messing up my lockdep stress testing on xfs). Real or false?
>
> [ INFO: possible circular locking dependency detected ]
> 2.6.35-rc5-00064-ga9f7f2e #334
> -------------------------------------------------------
> kswapd0/605 is trying to acquire lock:
>  (&(&ip->i_lock)->mr_lock){++++--}, at: [<ffffffff8125500c>]
> xfs_ilock+0x7c/0xa0
>
> but task is already holding lock:
>  (&xfs_mount_list_lock){++++.-}, at: [<ffffffff81281a76>]
> xfs_reclaim_inode_shrink+0xc6/0x140

False positive, but the xfs_mount_list_lock is gone in 2.6.35-rc6 -
the shrinker context change has fixed that - so you can ignore it
anyway.

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
From: KOSAKI Motohiro on 24 Jul 2010 04:50

> At this point, I would be very interested in reviewing, correctness
> testing on different configurations, and of course benchmarking.

I haven't reviewed this series in full yet, but I've found one mysterious
shrink_slab() usage. Can you please look at my patch? (I will send it as
another mail.)
From: KOSAKI Motohiro on 24 Jul 2010 07:00

> > At this point, I would be very interested in reviewing, correctness
> > testing on different configurations, and of course benchmarking.
>
> I haven't reviewed this series in full yet, but I've found one mysterious
> shrink_slab() usage. Can you please look at my patch? (I will send it as
> another mail.)

Plus, I have one question. The difference between the upstream
shrink_slab() calculation and yours is bigger than your patch
description explains.

upstream:

  shrink_slab()
    basic_scan_objects = 4 x (lru_scanned / lru_pages) x (max_pass / shrinker->seeks)
                         (shrinker->seeks defaults to 2)
    scan_objects = min(basic_scan_objects, max_pass * 2)

  shrink_icache_memory()
    max_pass = inodes_stat.nr_unused x (sysctl_vfs_cache_pressure / 100)

That is, a higher sysctl_vfs_cache_pressure causes more slab reclaim.

On the other hand, your code:

  shrinker_add_scan()
    scan_objects = 4 x (scanned / total) x (objects / ratio) x SHRINK_FACTOR x SHRINK_FACTOR

  shrink_icache_memory()
    ratio = DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100

That is, a higher sysctl_vfs_cache_pressure causes less slab reclaim.

So I guess the following change faithfully reflects your original intention.
The new calculation is:

  shrinker_add_scan()
    scan_objects = (scanned / total) x objects x ratio

  shrink_icache_memory()
    ratio = DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100

This has the same behavior as upstream, because upstream's
4 / shrinker->seeks = 2, and the above has DEFAULT_SEEKS = SHRINK_FACTOR x 2.
===============
 o move 'ratio' from denominator to numerator
 o adapt kvm/mmu_shrink
 o SHRINK_FACTOR / 2 (default seek) x 4 (unknown shrink slab modifier)
     -> (SHRINK_FACTOR*2) == DEFAULT_SEEKS

---
 arch/x86/kvm/mmu.c |    2 +-
 mm/vmscan.c        |   10 ++--------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ae5a038..cea1e92 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2942,7 +2942,7 @@ static int mmu_shrink(struct shrinker *shrink,
 	}
 
 	shrinker_add_scan(&nr_to_scan, scanned, global, cache_count,
-			DEFAULT_SEEKS*10);
+			DEFAULT_SEEKS/10);
 
 done:
 	cache_count = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 89b593e..2d8e9ab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -208,14 +208,8 @@ void shrinker_add_scan(unsigned long *dst,
 {
 	unsigned long long delta;
 
-	/*
-	 * The constant 4 comes from old code. Who knows why.
-	 * This could all use a good tune up with some decent
-	 * benchmarks and numbers.
-	 */
-	delta = (unsigned long long)scanned * objects
-			* SHRINK_FACTOR * SHRINK_FACTOR * 4UL;
-	do_div(delta, (ratio * total + 1));
+	delta = (unsigned long long)scanned * objects * ratio;
+	do_div(delta, total+ 1);
 
 	/*
 	 * Avoid risking looping forever due to too large nr value:
-- 
1.6.5.2