From: H. Peter Anvin on 28 Jul 2010 14:40

On 07/28/2010 11:10 AM, James Bottomley wrote:
>
> So I don't understand the problem. Proper shutdown of the old kernel
> will halt all the DMA engines (by design ... we can't have DMA ongoing
> if the next action might be power off). The only case I know where DMA
> engines may be active is the crash kernel case.
>

I'm not sure I fully understand the exact problem, either; not being
familiar with this putative "logging" facility of the Qlogic devices.
My point was largely that if a device causes failures because of the
choice of the allocation order, then we have a much bigger problem and
papering over it by trying to muck with the allocation order is just wrong.

This logging facility of Qlogic is DMA, no more, no less. It needs to
be shut down on a "overwrite" kexec, where we replace one kernel with
another, as opposed to a crash dump kexec, where we use a reserved chunk
of virgin memory. What I don't know/understand at the moment is if
there is something "special" about this particular logging facility,
e.g. if the Qlogic card ignore the bus mastering control bit -- which
would be reckless but I can see someone having the bright idea to do that.

Yinghai, do you have any more detail, or know who would? Also copying
the Qlogic Infinipath maintainer email...

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
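[Editorial aside] The "bus mastering control bit" mentioned above is the Bus Master Enable bit (bit 2) of the PCI command register at config-space offset 0x04. The following is a minimal user-space sketch of what clearing that bit amounts to. The `fake_pci_dev` structure and the `fake_*` helpers are made up for illustration and are not kernel types; in the kernel the corresponding helper is `pci_clear_master()`. Whether the qla firmware honours the bit is exactly the open question in this thread, and the sketch says nothing about that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the PCI specification (the kernel uses the same names). */
#define PCI_COMMAND        0x04  /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER 0x4   /* bit 2: Bus Master Enable */

/* Hypothetical stand-in for a device's config space; not a kernel type. */
struct fake_pci_dev {
	uint8_t config[256];
};

/* Read the command register; config space is little-endian. */
static uint16_t fake_read_command(const struct fake_pci_dev *dev)
{
	assert(dev != NULL);
	return (uint16_t)(dev->config[PCI_COMMAND] |
			  (dev->config[PCI_COMMAND + 1] << 8));
}

/* Write the command register back, low byte first. */
static void fake_write_command(struct fake_pci_dev *dev, uint16_t cmd)
{
	assert(dev != NULL);
	dev->config[PCI_COMMAND] = (uint8_t)(cmd & 0xff);
	dev->config[PCI_COMMAND + 1] = (uint8_t)(cmd >> 8);
}

/* Clear Bus Master Enable and leave every other command bit untouched. */
static void fake_clear_master(struct fake_pci_dev *dev)
{
	uint16_t cmd = fake_read_command(dev);

	fake_write_command(dev, (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
}

/* Non-zero if the device is still allowed to initiate DMA. */
static int fake_is_master(const struct fake_pci_dev *dev)
{
	return (fake_read_command(dev) & PCI_COMMAND_MASTER) != 0;
}
```

A device that respects the bit stops issuing DMA once it is cleared; a device that ignores it would keep writing into whatever memory the next kernel places at the old buffer address.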
From: Yinghai Lu on 28 Jul 2010 15:30

On 07/28/2010 11:30 AM, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
>>
>> So I don't understand the problem. Proper shutdown of the old kernel
>> will halt all the DMA engines (by design ... we can't have DMA ongoing
>> if the next action might be power off). The only case I know where DMA
>> engines may be active is the crash kernel case.
>>
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on a "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignore the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...

when I was debug memblock with x86, found the strange crash when high/low.
then use kexec with "memtest" in command line, and the early memtest does find
some bad memory.

then I add more print about EPT physical address for first kernel,
it does show that range is used by qla driver in first kernel.
I built all needed drivers in kernel so can pxeboot the kernel on all test platforms easily.

Thanks

Yinghai

---
 drivers/scsi/qla2xxx/qla_init.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/scsi/qla2xxx/qla_init.c
===================================================================
--- linux-2.6.orig/drivers/scsi/qla2xxx/qla_init.c
+++ linux-2.6/drivers/scsi/qla2xxx/qla_init.c
@@ -1327,8 +1327,8 @@ qla2x00_alloc_fw_dump(scsi_qla_host_t *v
 			goto try_eft;
 		}
 
-		qla_printk(KERN_INFO, ha, "Allocated (%d KB) for FCE...\n",
-		    FCE_SIZE / 1024);
+		qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for FCE...\n",
+		    FCE_SIZE / 1024, tc);
 
 		fce_size = sizeof(struct qla2xxx_fce_chain) + FCE_SIZE;
 		ha->flags.fce_enabled = 1;
@@ -1354,8 +1354,8 @@ try_eft:
 			goto cont_alloc;
 		}
 
-		qla_printk(KERN_INFO, ha, "Allocated (%d KB) for EFT...\n",
-		    EFT_SIZE / 1024);
+		qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for EFT...\n",
+		    EFT_SIZE / 1024, tc);
 
 		eft_size = EFT_SIZE;
 		ha->eft_dma = tc_dma;
@@ -1383,8 +1383,8 @@ cont_alloc:
 		}
 		return;
 	}
-	qla_printk(KERN_INFO, ha, "Allocated (%d KB) for firmware dump...\n",
-	    dump_size / 1024);
+	qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for firmware dump...\n",
+	    dump_size / 1024, ha->fw_dump);
 
 	ha->fw_dump_len = dump_size;
 	ha->fw_dump->signature[0] = 'Q';
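
[Editorial aside] The detection step described in this message (boot the second kernel with "memtest" and find "bad" memory that is really a live DMA target) follows a write-then-verify pattern. The following is a minimal user-space sketch of that idea. `fill_pattern()` and `scan_pattern()` are hypothetical helpers, not the kernel's early memtest code, and a stray DMA write is simulated by a plain store into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill n 64-bit words with a known pattern. */
static void fill_pattern(uint64_t *buf, size_t n, uint64_t pattern)
{
	assert(buf != NULL);
	for (size_t i = 0; i < n; i++)
		buf[i] = pattern;
}

/*
 * Count the words that no longer hold the pattern. When at least one
 * word differs, *first_bad and *last_bad receive the indices of the
 * first and last mismatching word; otherwise they are left unchanged.
 */
static size_t scan_pattern(const uint64_t *buf, size_t n, uint64_t pattern,
			   size_t *first_bad, size_t *last_bad)
{
	size_t bad = 0;

	assert(buf != NULL && first_bad != NULL && last_bad != NULL);
	for (size_t i = 0; i < n; i++) {
		if (buf[i] == pattern)
			continue;
		if (bad == 0)
			*first_bad = i;
		*last_bad = i;
		bad++;
	}
	return bad;
}
```

The reported index range can then be compared with the buffer addresses printed by the patched `qla_printk()` calls in the first kernel, which is the correlation described above.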

From: H. Peter Anvin on 28 Jul 2010 16:10

On 07/28/2010 12:27 PM, Yinghai Lu wrote:
>>
>> Yinghai, do you have any more detail, or know who would? Also copying
>> the Qlogic Infinipath maintainer email...
>
> when I was debug memblock with x86, found the strange crash when high/low.
> then use kexec with "memtest" in command line, and the early memtest does find
> some bad memory.
>
> then I add more print about EPT physical address for first kernel,
> it does show that range is used by qla driver in first kernel.
> I built all needed drivers in kernel so can pxeboot the kernel on all test platforms easily.
>

[Cc: Andrew Vasquez, who seems to have written the offending code,
checkin df613b96077cee826b14089ae6e75eeabf71faa3.]

The question is still open why this particular DMA activity was not shut
down before the kexec. I'm not familiar with how non-crashdump kexec
idles the hardware, but it obviously better do so.

	-hpa

From: Ralph Campbell on 28 Jul 2010 19:00

On Wed, 2010-07-28 at 11:30 -0700, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
> >
> > So I don't understand the problem. Proper shutdown of the old kernel
> > will halt all the DMA engines (by design ... we can't have DMA ongoing
> > if the next action might be power off). The only case I know where DMA
> > engines may be active is the crash kernel case.
> >
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on a "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignore the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...
>
> 	-hpa

I read the messages in this thread but I don't understand what the
problem is. Something to do with logging, DMA and crash dumps but
it also sounds like the original discussion may be confused about
how the Infiniband HCA cards work.

Can someone summarize what is going on...

From: H. Peter Anvin on 28 Jul 2010 19:50

On 07/28/2010 03:58 PM, Ralph Campbell wrote:
>
> I read the messages in this thread but I don't understand what the
> problem is. Something to do with logging, DMA and crash dumps but
> it also sounds like the original discussion may be confused about
> how the Infiniband HCA cards work.
>
> Can someone summarize what is going on...
>

Sorry, I was confused... this had to do with the qla driver, not Infinipath.

	-hpa