Weird deadlock when shutting down [Kernel]

Prev: acpi: Fix regression where _PPC is not read at boot even when ignore_ppc=0
Next: input/touchscreen: Synaptics Touchscreen Driver

From: Dave Young on 20 Feb 2010 02:50

On Fri, Feb 19, 2010 at 2:45 AM, Johannes Berg
<johannes(a)sipsolutions.net> wrote:
> On Thu, 2010-02-18 at 08:31 -0800, Linus Torvalds wrote:
>
>> > > halt/4071 is trying to acquire lock:
>> > > (s_active){++++.+}, at: [<c0000000001ef868>]
>> > > .sysfs_addrm_finish+0x58/0xc0
>> > >
>> > > but task is already holding lock:
>> > > (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at:
>> [<c0000000004cd6ac>]
>> > > .lock_policy_rwsem_write+0x84/0xf4
>> > >
>> > > which lock already depends on the new lock.
>>
>> You don't have a full backtrace for these things?
>
> No, it deadlocks right there, unfortunately.
>
>> We've had lots of trouble with the cpu governors, and I suspect the
>> problem isn't new, but the lockdep warning is likely new (see commit
>> 846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep
>> annotations
>> for the sysfs active reference").
>>
>> So it is likely to be an old issue that (a) now gets warned about and
>> (b) might have had timing changes enough to trigger it.
>
> Well, it used to not deadlock and actually shut down the machine :) So
> in that sense it's definitely new. It might have printed a lockdep
> warning before, which you wouldn't normally see since the machine turns
> off right after this.

before shutdown, you can:
echo N > /proc/sys/kernel/printk_delay
to see the printk messages, N is 0-10000 in milliseconds

>
>> I suspect it is G5-specific (or specific to whatever CPU frequency
>> code
>> that gets used there), since I think we'd have had lots of reports if
>> this
>> happened on x86.
>
> Yeah, that's puzzling me as well.
>
> johannes
>

--
Regards
dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Johannes Berg on 20 Feb 2010 03:50

On Sat, 2010-02-20 at 12:46 +0800, AmÃ©rico Wang wrote:

> This is a real deadlock case found by lockdep added to s_active.
>
> The problem is that we did kobject_put(&data->kobj) while holding
> policy_rwsem
> which is used to protect 'data'. It is not so easy to fix this,
> probably we need to do more work on cpufreq code.

But it doesn't make sense that it's just an existing real deadlock that
is now found -- it never occurred previously!

Anyway, I'll try your patch, thanks for that.

johannes

From: Johannes Berg on 20 Feb 2010 04:00

On Sat, 2010-02-20 at 15:13 +0800, AmÃ©rico Wang wrote:

> Does my following untested patch help?

Sorry, no. I'll hook up a screen to the box after I return from the
fresh market.

johannes

From: Américo Wang on 20 Feb 2010 04:10

On Sat, Feb 20, 2010 at 4:56 PM, Johannes Berg
<johannes(a)sipsolutions.net> wrote:
> On Sat, 2010-02-20 at 15:13 +0800, Américo Wang wrote:
>
>> Does my following untested patch help?
>
> Sorry, no. I'll hook up a screen to the box after I return from the
> fresh market.
>

Are you sure there is no difference? :-/

Also, could you please also apply the 4 patches from Eric?

You can get them here:
http://lkml.org/lkml/2010/2/11/334

Thanks much for your testing!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Américo Wang on 20 Feb 2010 04:40

On Thu, Feb 18, 2010 at 5:36 PM, Johannes Berg
<johannes(a)sipsolutions.net> wrote:
> On Fri, 2010-01-29 at 15:41 +1100, Benjamin Herrenschmidt wrote:
>> Hi Ingo !
>>
>> Johannes and I see this on our quad G5s... it -could- be similar to
>> one reported a short while ago by Xiaotian Feng <xtfeng(a)gmail.com>
>> under the subject [2.6.33-rc4] sysfs lockdep warnings on cpu hotplug.
>>
>> Basically, the machine deadlocks right after printing the following
>> when doing a shutdown:
>>
>> halt/4071 is trying to acquire lock:
>> (s_active){++++.+}, at: [<c0000000001ef868>]
>> .sysfs_addrm_finish+0x58/0xc0
>>
>> but task is already holding lock:
>> (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>]
>> .lock_policy_rwsem_write+0x84/0xf4
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>
> This is still happening with -rc8. Any news?
>

Hey, johannes

Not sure if you made some mistake here, the one you report here [1]
is _not_ the same with this one reported by Benjamin.

Please make sure what you are talking about here is the same one.

Thanks.

1. http://lkml.org/lkml/2010/2/18/33
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7
Prev: acpi: Fix regression where _PPC is not read at boot even when ignore_ppc=0
Next: input/touchscreen: Synaptics Touchscreen Driver