From: Jack Steiner on

We see an X86_64 regression that started a few days ago. The kernel is booted
via EFI & panics in the pat.c code trying to deref a NULL pointer.

I didn't debug the problem but am suspicious of
x86, pat: Migrate to rbtree only backend for pat memtype management x86/pat
author Pallipadi, Venkatesh <venkatesh.pallipadi(a)intel.com>
Wed, 10 Feb 2010 23:26:07 +0000 (15:26 -0800)
committer H. Peter Anvin <hpa(a)zytor.com>
Thu, 18 Feb 2010 23:41:36 +0000 (15:41 -0800)



Has anyone seen this? If not, I can debug further....

Problem is in the git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git tree.



Pid: 0, comm: swapper Not tainted 2.6.33-rc8-tip-medusa+ #2 /
RIP: 0010:[<ffffffff810304b0>] [<ffffffff810304b0>] rbt_memtype_check_insert+0x1b2/0x232
RSP: 0000:ffffffff81601df8 EFLAGS: 00000256
RAX: 00000000000b0000 RBX: ffff88000f840100 RCX: 00000000000001c1
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88000f8244c0
RBP: ffffffff81601e38 R08: 0000000000000001 R09: ffffffff8152cd79
R10: ffffffff8152cd79 R11: 0000000000018620 R12: ffff88000f8244c0
R13: 0000000000000010 R14: 0000000000000000 R15: 00000000fffffff4
FS: 0000000000000000(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005007b
CR2: 0000000000000000 CR3: 0000000001604000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffffffff81600000, task ffffffff8160c020)
Stack:
ffffffff81601e38 00000000000b0000 0000000000006000 ffff88000f840100
<0> ffff88000f8244c0 0000000000000000 0000000000000010 00000000fffffff4
<0> ffffffff81601e88 ffffffff8102edff 00000000000b0000 0000000000006000
Call Trace:
[<ffffffff8102edff>] reserve_memtype+0x2ce/0x4c9
[<ffffffff8102e0d0>] set_memory_uc+0x41/0x89
[<ffffffff816b92be>] efi_enter_virtual_mode+0xc9/0x269
[<ffffffff816aada0>] start_kernel+0x3b8/0x42b
[<ffffffff816aa140>] ? early_idt_handler+0x0/0x71
[<ffffffff816aa29e>] x86_64_start_reservations+0xa5/0xa9
[<ffffffff816aa3ed>] x86_64_start_kernel+0x14b/0x15a



Source of the NULL pointer is:

int set_memory_uc(unsigned long addr, int numpages)
{
	...
	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
			      _PAGE_CACHE_UC_MINUS, NULL);



--- jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pallipadi, Venkatesh on

On Wed, 2010-02-24 at 12:22 -0800, Jack Steiner wrote:
> We see an X86_64 regression that started a few days ago. The kernel is booted
> via EFI & panics in the pat.c code trying to deref a NULL pointer.
>
> I didn't debug the problem but am suspicious of
> x86, pat: Migrate to rbtree only backend for pat memtype management x86/pat
> author Pallipadi, Venkatesh <venkatesh.pallipadi(a)intel.com>
> Wed, 10 Feb 2010 23:26:07 +0000 (15:26 -0800)
> committer H. Peter Anvin <hpa(a)zytor.com>
> Thu, 18 Feb 2010 23:41:36 +0000 (15:41 -0800)
>
>
>
> Has anyone seen this? If not, I can debug further....
>


Haven't seen this on my test systems here, but I haven't tested with EFI
boot either.

I assume this is repeatable, and you always see this panic. I am looking
at the code right now. Can you rollback this particular patch and see
whether it goes away?

Thanks,
Venki

From: Jack Steiner on
On Wed, Feb 24, 2010 at 01:09:24PM -0800, Pallipadi, Venkatesh wrote:
>
> On Wed, 2010-02-24 at 12:22 -0800, Jack Steiner wrote:
> > We see an X86_64 regression that started a few days ago. The kernel is booted
> > via EFI & panics in the pat.c code trying to deref a NULL pointer.
> >
> > I didn't debug the problem but am suspicious of
> > x86, pat: Migrate to rbtree only backend for pat memtype management x86/pat
> > author Pallipadi, Venkatesh <venkatesh.pallipadi(a)intel.com>
> > Wed, 10 Feb 2010 23:26:07 +0000 (15:26 -0800)
> > committer H. Peter Anvin <hpa(a)zytor.com>
> > Thu, 18 Feb 2010 23:41:36 +0000 (15:41 -0800)
> >
> >
> >
> > Has anyone seen this? If not, I can debug further....
> >
>
>
> Haven't seen this on my test systems here, but I haven't tested with EFI
> boot either.
>
> I assume this is repeatable, and you always see this panic. I am looking
> at the code right now. Can you rollback this particular patch and see
> whether it goes away?

The problem is very repeatable.

FWIW, we have a nightly regression test that builds/tests the x86 tree
every night at 1 AM. The failure started on the morning of Feb 22.

The build on Feb 21 (& all of Feb before then) passed w/o errors.
I can't rule out other errors but I don't see anything else that changed.

The linux-next tree appears to have the same problem.



--- jack
From: Pallipadi, Venkatesh on
On Wed, Feb 24, 2010 at 01:37:29PM -0800, Jack Steiner wrote:
> On Wed, Feb 24, 2010 at 01:09:24PM -0800, Pallipadi, Venkatesh wrote:
> >
> > On Wed, 2010-02-24 at 12:22 -0800, Jack Steiner wrote:
> > > We see an X86_64 regression that started a few days ago. The kernel is booted
> > > via EFI & panics in the pat.c code trying to deref a NULL pointer.
> > >
> > > I didn't debug the problem but am suspicious of
> > > x86, pat: Migrate to rbtree only backend for pat memtype management x86/pat
> > > author Pallipadi, Venkatesh <venkatesh.pallipadi(a)intel.com>
> > > Wed, 10 Feb 2010 23:26:07 +0000 (15:26 -0800)
> > > committer H. Peter Anvin <hpa(a)zytor.com>
> > > Thu, 18 Feb 2010 23:41:36 +0000 (15:41 -0800)
> > >
> > >
> > >
> > > Has anyone seen this? If not, I can debug further....
> > >
> >
> >
> > Haven't seen this on my test systems here, but I haven't tested with EFI
> > boot either.
> >
> > I assume this is repeatable, and you always see this panic. I am looking
> > at the code right now. Can you rollback this particular patch and see
> > whether it goes away?
>
> The problem is very repeatable.
>
> FWIW, we have a nightly regression test that builds/tests the x86 tree
> every night at 1 AM. The failure started on the morning of Feb 22.
>
> The build on Feb 21 (& all of Feb before then) passed w/o errors.
> I can't rule out other errors but I don't see anything else that changed.
>
> The linux-next tree appears to have the same problem.
>

I guess I found an obvious problem in the code. Can you check whether the
below patch resolves the panic you are seeing.

Thanks,
Venki


new->type should only change when there is a valid ret_type. Otherwise the
requested type and the return type should be the same.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi(a)intel.com>
---
arch/x86/mm/pat_rbtree.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index e4cd229..58b6de1 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -223,7 +223,9 @@ int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
 						new->type, ret_type);

 	if (!err) {
-		new->type = *ret_type;
+		if (ret_type)
+			new->type = *ret_type;
+
 		memtype_rb_insert(&memtype_rbroot, new);
 	}
 	return err;
--
1.6.0.6
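
For anyone reading along, here is a minimal user-space sketch of the pattern the
patch restores: an optional out-parameter that callers such as set_memory_uc()
may pass as NULL. It is not the kernel code. The struct layout, the stub
check_conflict_stub() and the name check_insert_sketch() are assumptions made
up for illustration; only the "if (ret_type)" guard comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct memtype (illustration only). */
struct memtype {
	unsigned long start;
	unsigned long end;
	unsigned long type;
};

/*
 * Hypothetical stub for the conflict check: it always reports "no conflict"
 * and, if the caller supplied ret_type, echoes the requested type back.
 */
static int check_conflict_stub(const struct memtype *new,
			       unsigned long *ret_type)
{
	if (ret_type)
		*ret_type = new->type;
	return 0;
}

/* Mirrors the patched logic: only touch new->type when ret_type is valid. */
static int check_insert_sketch(struct memtype *new, unsigned long *ret_type)
{
	int err;

	assert(new != NULL);

	err = check_conflict_stub(new, ret_type);
	if (!err) {
		if (ret_type)
			new->type = *ret_type;
		/* The real function inserts 'new' into the rbtree here. */
	}
	return err;
}
```

Calling check_insert_sketch(&m, NULL) returns 0 and leaves m.type at the
requested value, which is the case the unpatched code dereferenced.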

From: Jack Steiner on
>
> I guess I found an obvious problem in the code. Can you check whether the
> below patch resolves the panic you are seeing.
>
> Thanks,
> Venki


Works great!! Thanks...


>
>
> new->type should only change when there is a valid ret_type. Otherwise the
> requested type and the return type should be the same.
>
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi(a)intel.com>
> ---
> arch/x86/mm/pat_rbtree.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
> index e4cd229..58b6de1 100644
> --- a/arch/x86/mm/pat_rbtree.c
> +++ b/arch/x86/mm/pat_rbtree.c
> @@ -223,7 +223,9 @@ int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
>  						new->type, ret_type);
>
>  	if (!err) {
> -		new->type = *ret_type;
> +		if (ret_type)
> +			new->type = *ret_type;
> +
>  		memtype_rb_insert(&memtype_rbroot, new);
>  	}
>  	return err;
> --
> 1.6.0.6