From: Tejun Heo on 1 Apr 2010 01:10

Hello,

On 04/01/2010 01:28 PM, Cong Wang wrote:
>> Hmmm... can you please try to see whether this circular locking
>> warning involving wq->lockdep_map is reproducible w/ the bonding
>> locking fixed?  I still can't see where the wq -> cpu_add_remove_lock
>> dependency is created.
>>
>
> I thought this is obvious.
>
> Here it is:
>
> void destroy_workqueue(struct workqueue_struct *wq)
> {
>         const struct cpumask *cpu_map = wq_cpu_map(wq);
>         int cpu;
>
>         cpu_maps_update_begin();    <----------------- Hold cpu_add_remove_lock here
>         spin_lock(&workqueue_lock);
>         list_del(&wq->list);
>         spin_unlock(&workqueue_lock);
>
>         for_each_cpu(cpu, cpu_map)
>                 cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));    <------ See below
>         cpu_maps_update_done();    <----------------- Release cpu_add_remove_lock here
>
> ...
> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
> {
>         /*
>          * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
>          * cpu_add_remove_lock protects cwq->thread.
>          */
>         if (cwq->thread == NULL)
>                 return;
>
>         lock_map_acquire(&cwq->wq->lockdep_map);    <-------------- Lockdep complains here.
>         lock_map_release(&cwq->wq->lockdep_map);
> ...

Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
I can see that, but I'm failing to see where the dependency in the other
direction is created.

Thanks.

--
tejun
From: Cong Wang on 1 Apr 2010 01:20

Tejun Heo wrote:
> Hello,
>
> On 04/01/2010 01:28 PM, Cong Wang wrote:
>>> Hmmm... can you please try to see whether this circular locking
>>> warning involving wq->lockdep_map is reproducible w/ the bonding
>>> locking fixed?  I still can't see where the wq -> cpu_add_remove_lock
>>> dependency is created.
>>>
>> I thought this is obvious.
>>
>> Here it is:
>>
>> void destroy_workqueue(struct workqueue_struct *wq)
>> {
>>         const struct cpumask *cpu_map = wq_cpu_map(wq);
>>         int cpu;
>>
>>         cpu_maps_update_begin();    <----------------- Hold cpu_add_remove_lock here
>>         spin_lock(&workqueue_lock);
>>         list_del(&wq->list);
>>         spin_unlock(&workqueue_lock);
>>
>>         for_each_cpu(cpu, cpu_map)
>>                 cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));    <------ See below
>>         cpu_maps_update_done();    <----------------- Release cpu_add_remove_lock here
>>
>> ...
>> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
>> {
>>         /*
>>          * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
>>          * cpu_add_remove_lock protects cwq->thread.
>>          */
>>         if (cwq->thread == NULL)
>>                 return;
>>
>>         lock_map_acquire(&cwq->wq->lockdep_map);    <-------------- Lockdep complains here.
>>         lock_map_release(&cwq->wq->lockdep_map);
>> ...
>
> Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
> I can see that, but I'm failing to see where the dependency in the other
> direction is created.
>

Hmm, it looks like I misunderstood lock_map_acquire()? From the changelog,
I thought it was added to complain when its caller is holding a lock while
invoking it, so cpu_add_remove_lock is not an exception.

Thanks!
From: Cong Wang on 1 Apr 2010 02:10

Cong Wang wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> On 04/01/2010 01:28 PM, Cong Wang wrote:
>>>> Hmmm... can you please try to see whether this circular locking
>>>> warning involving wq->lockdep_map is reproducible w/ the bonding
>>>> locking fixed?  I still can't see where the wq -> cpu_add_remove_lock
>>>> dependency is created.
>>>>
>>> I thought this is obvious.
>>>
>>> Here it is:
>>>
>>> void destroy_workqueue(struct workqueue_struct *wq)
>>> {
>>>         const struct cpumask *cpu_map = wq_cpu_map(wq);
>>>         int cpu;
>>>
>>>         cpu_maps_update_begin();    <----------------- Hold cpu_add_remove_lock here
>>>         spin_lock(&workqueue_lock);
>>>         list_del(&wq->list);
>>>         spin_unlock(&workqueue_lock);
>>>
>>>         for_each_cpu(cpu, cpu_map)
>>>                 cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));    <------ See below
>>>         cpu_maps_update_done();    <----------------- Release cpu_add_remove_lock here
>>>
>>> ...
>>> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
>>> {
>>>         /*
>>>          * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
>>>          * cpu_add_remove_lock protects cwq->thread.
>>>          */
>>>         if (cwq->thread == NULL)
>>>                 return;
>>>
>>>         lock_map_acquire(&cwq->wq->lockdep_map);    <-------------- Lockdep complains here.
>>>         lock_map_release(&cwq->wq->lockdep_map);
>>> ...
>>
>> Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
>> I can see that, but I'm failing to see where the dependency in the other
>> direction is created.
>>
>
> Hmm, it looks like I misunderstood lock_map_acquire()? From the changelog,
> I thought it was added to complain when its caller is holding a lock while
> invoking it, so cpu_add_remove_lock is not an exception.
>

Oh, I see, wq->lockdep_map is acquired again in run_workqueue(), so I was
wrong. :) I think you and Oleg are right, the lockdep warning is not
irrelevant.

Sorry for the noise, please ignore this patch.

Thanks.
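For context, the edge in the other direction comes from the workqueue execution path: run_workqueue() pseudo-acquires wq->lockdep_map around each work function, so every lock the callback takes (e.g. rtnl_mutex in bond_mii_monitor()) is recorded as nested inside it. Below is a simplified paraphrase of run_workqueue() from kernel/workqueue.c of that era, not an exact copy; the per-work lockdep_map, work_clear_pending() and other bookkeeping are omitted.

/*
 * Simplified paraphrase of run_workqueue() (kernel/workqueue.c, ~2.6.33);
 * illustrative only, several details are omitted.
 */
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
	spin_lock_irq(&cwq->lock);
	while (!list_empty(&cwq->worklist)) {
		struct work_struct *work = list_entry(cwq->worklist.next,
						      struct work_struct, entry);
		work_func_t f = work->func;

		list_del_init(cwq->worklist.next);
		spin_unlock_irq(&cwq->lock);

		/*
		 * Pseudo-acquire of the workqueue's lockdep_map around the
		 * callback: every lock taken inside f() is recorded as being
		 * taken while wq->lockdep_map is held.  With bond_mii_monitor()
		 * as f(), this is where wq->lockdep_map -> rtnl_mutex comes from.
		 */
		lock_map_acquire(&cwq->wq->lockdep_map);
		f(work);
		lock_map_release(&cwq->wq->lockdep_map);

		spin_lock_irq(&cwq->lock);
	}
	spin_unlock_irq(&cwq->lock);
}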
From: Tejun Heo on 1 Apr 2010 02:30

Hello,

On 04/01/2010 03:05 PM, Cong Wang wrote:
>> Hmm, it looks like I misunderstood lock_map_acquire()? From the
>> changelog, I thought it was added to complain when its caller is
>> holding a lock while invoking it, so cpu_add_remove_lock is not an
>> exception.

Oh, that just tells lockdep that the code is trying to grab a pseudo lock.
It's not really a lock, but to lockdep it looks like one, and lockdep can
use it to compute problem cases.

> Oh, I see, wq->lockdep_map is acquired again in run_workqueue(), so
> I was wrong. :) I think you and Oleg are right, the lockdep warning
> is not irrelevant.

Yeah, I think the circular dependency you reported on wq->lockdep_map is
completed only through the dependency on rtnl_mutex. If you fix the
rtnl_mutex locking, it should go away too.

Thanks.

--
tejun
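The pseudo-lock annotation Tejun describes follows a general lockdep pattern; a hypothetical, illustrative sketch is below (flush_key, flush_map and my_flush_something() are made-up names, and the macro usage is from the lockdep API as I recall it for that era, not code from this thread).

/*
 * Hypothetical illustration of a lockdep pseudo-lock annotation.
 * A lockdep_map is not a real lock; acquiring/releasing it only feeds
 * lockdep's dependency graph.
 */
#include <linux/lockdep.h>

static struct lock_class_key flush_key;
static struct lockdep_map flush_map =
	STATIC_LOCKDEP_MAP_INIT("my_flush_pseudo_lock", &flush_key);

static void my_flush_something(void)
{
	/* lockdep: pretend "my_flush_pseudo_lock" is now held */
	lock_map_acquire(&flush_map);

	/*
	 * Any real lock taken here becomes my_flush_pseudo_lock -> that-lock;
	 * waiting here while holding a lock that the flushed work also needs
	 * shows up as a circular dependency.
	 */

	lock_map_release(&flush_map);
}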
From: Cong Wang on 2 Apr 2010 01:00
Oleg Nesterov wrote:
> On 04/01, Cong Wang wrote:
>>> I must have missed something, but it seems to me this patch tries to
>>> suppress the valid warning.
>>>
>>> Could you please clarify?
>>
>> Sure, below is the whole warning. Please teach me how this is valid.
>
> Oh, I can never understand the output from lockdep, it is much more
> clever than me ;)
>
> But at first glance,
>
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: -> #2 (rtnl_mutex){+.+.+.}:
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a6bc1>] validate_chain+0x1019/0x1540
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a7e75>] __lock_acquire+0xd8d/0xe55
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810aa3a4>] lock_acquire+0x160/0x1af
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff815523f8>] mutex_lock_nested+0x64/0x4e9
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8147af16>] rtnl_lock+0x1e/0x27
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffffa0836779>] bond_mii_monitor+0x39f/0x74b [bonding]
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8108654f>] worker_thread+0x2da/0x46c
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8108b1ea>] kthread+0xdd/0xec
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff81004894>] kernel_thread_helper+0x4/0x10
>
> OK, so work->func() takes rtnl_mutex.
>
> This means it is not safe to do flush_workqueue() or destroy_workqueue()
> under rtnl_lock(). This is a known fact.
>
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: -> #0 ((bond_dev->name)){+.+...}:
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810a6696>] validate_chain+0xaee/0x1540
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810a7e75>] __lock_acquire+0xd8d/0xe55
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810aa3a4>] lock_acquire+0x160/0x1af
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81085278>] cleanup_workqueue_thread+0x59/0x10b
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81085428>] destroy_workqueue+0x9c/0x107
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffffa0839d32>] bond_uninit+0x524/0x58a [bonding]
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8146967b>] rollback_registered_many+0x205/0x2e3
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81469783>] unregister_netdevice_many+0x2a/0x75
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147ada3>] __rtnl_kill_links+0x8b/0x9d
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147adea>] __rtnl_link_unregister+0x35/0x72
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147b293>] rtnl_link_unregister+0x2c/0x43
>
> However, rtnl_link_unregister() takes rtnl_mutex and then bond_uninit()
> does cleanup_workqueue_thread().
>
> So, looks like this warning is valid, this path can deadlock if
> destroy_workqueue() is called when bond->mii_work is queued.

Yeah, this is right.

>
> Lockdep decided to blame cpu_add_remove_lock in this chain.
>

Yes, this is what makes me confused. ;)

Thanks!
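Stripped of the bonding specifics, the pattern Oleg points out has the following shape. This is a minimal hypothetical sketch, not the bonding driver: cfg_mutex stands in for rtnl_mutex, monitor_fn() for bond_mii_monitor(), and demo_exit() for the rtnl_lock() -> bond_uninit() -> destroy_workqueue() path.

/*
 * Hypothetical illustration of the deadlock-prone pattern; all names are
 * made up and this deliberately shows the buggy ordering.
 */
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(cfg_mutex);			/* plays the role of rtnl_mutex */
static struct workqueue_struct *wq;		/* plays the role of bond->wq */
static struct delayed_work monitor_work;	/* plays the role of bond->mii_work */

static void monitor_fn(struct work_struct *work)
{
	mutex_lock(&cfg_mutex);		/* like bond_mii_monitor() -> rtnl_lock() */
	/* ... inspect/reconfigure; periodically re-armed in the real driver ... */
	mutex_unlock(&cfg_mutex);
}

static int __init demo_init(void)
{
	wq = create_singlethread_workqueue("demo_monitor");
	if (!wq)
		return -ENOMEM;
	INIT_DELAYED_WORK(&monitor_work, monitor_fn);
	queue_delayed_work(wq, &monitor_work, HZ);
	return 0;
}

static void __exit demo_exit(void)
{
	mutex_lock(&cfg_mutex);		/* like rtnl_lock() around unregistration */
	/*
	 * Can deadlock: destroy_workqueue() waits for monitor_fn() to finish,
	 * but monitor_fn() may be blocked on cfg_mutex, which we hold.
	 * This is the cycle lockdep warned about.
	 */
	destroy_workqueue(wq);
	mutex_unlock(&cfg_mutex);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

The usual way out of this shape is to cancel the work and destroy the workqueue without holding the lock that the work function itself takes.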