Prev: cpuidle: add cpuidle_unregister_driver() error check
Next: [PATCH 2/2] Staging: crystalhd: removed kfree(NULL) checks
From: Huang Ying on 26 May 2010 23:30 On Thu, 2010-05-27 at 10:40 +0800, Jin Dongming wrote: > This patch fixes do_machine_check() failure caused by DIE_NMI. > > I do MCE tests on my machine. When I inject Uncorrected Error(UE) into > kernel, the messages of test failure are always gotten. This problem > is caused by the notification of DIE_NMI in the front of do_machine_check(). > Because there are some notifications used DIE_NMI, and when they finish their > own work and return NOTIFY_STOP as a result. The result makes > do_machine_check() return at that time. > > So we decide to delete the notification of DIE_NMI. It is because when UE error > happens, if one of the cpu is down caused by the error of hook function of > DIE_NMI, the error type of UE may be different with the real one. For example, > > CPU0 CPU1 > UE do_machine_check() do_machine_check() > | | > cpu down(hook error of DIE_NMI) cpu OK(no hook error of DIE_NMI) > | > wait CPU0 timeout > | > Fatal Error > (Timeout synchronizing machine > check over CPUs) Fatal error will only occur if tolerant = 0, which is not the common case. But I think the notify_die can be an issue here. For example UE is on CPU0, and the MCE is consumed by notify_die; MCE on CPU1 will detect nothing. I have heard about that on some machine, some hardware error output pin of chipset may be linked with some input pin of CPU which can cause MCE. That is, MCE is used to report some chipset errors too. I think that is why notify_die is called in do_machine_check. Simply removing notify_die is not good for these machines. Maybe we should fix the notifier user instead. Which notifier user consumes the DIE_NMI notification? Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Hidetoshi Seto on 27 May 2010 02:10 (2010/05/27 12:21), Huang Ying wrote: > I have heard about that on some machine, some hardware error output pin > of chipset may be linked with some input pin of CPU which can cause MCE. > That is, MCE is used to report some chipset errors too. I think that is > why notify_die is called in do_machine_check. Simply removing notify_die > is not good for these machines. Hum, it sounds like "notify_die here is hook for proprietary chipset driver". Anyone who have such machine and driver in real? But if my understanding is correct the notify_die here will call all registered callbacks and let them process if "DIE_NMI" is an event what the callback really interested in. Problems are (1) many callbacks will behave wrongly since they don't aware that DIE_NMI event can be posted from Machine Check, and (2) if the machine is not such special hardware it is just waste of time in critical context where quick page-poisoning might be required. > Maybe we should fix the notifier user instead. Which notifier user > consumes the DIE_NMI notification? What I found at a glance is: [arch/x86/kernel/cpu/perf_event.c] 1183 static int __kprobes 1184 perf_event_nmi_handler(struct notifier_block *self, 1185 unsigned long cmd, void *__args) 1186 { 1187 struct die_args *args = __args; 1188 struct pt_regs *regs; 1189 1190 if (!atomic_read(&active_events)) 1191 return NOTIFY_DONE; 1192 1193 switch (cmd) { 1194 case DIE_NMI: 1195 case DIE_NMI_IPI: 1196 break; 1197 1198 default: 1199 return NOTIFY_DONE; 1200 } 1201 1202 regs = args->regs; 1203 1204 apic_write(APIC_LVTPC, APIC_DM_NMI); 1205 /* 1206 * Can't rely on the handled return value to say it was our NMI, two 1207 * events could trigger 'simultaneously' raising two back-to-back NMIs. 1208 * 1209 * If the first NMI handles both, the latter will be empty and daze 1210 * the CPU. 1211 */ 1212 x86_pmu.handle_irq(regs); 1213 1214 return NOTIFY_STOP; 1215 } However I think fixing the notifier users is wrong direction. (At least I have no idea how many ISVs will be affected) One quick alternative is define "DIE_MCE" and use it instead, but if special hook like this is really required, I suppose we should invent some special interface for external plug-in like a chipset's LLHEH (low-level hardware error handler) etc., to allow additional platform-specific error handling in critical context. So I think simply removing it is good to start. If there are no complaints and no users in these days, we are done. Otherwise we will get fresh real requirement and will be able to do proper things. Thanks, H.Seto -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on 27 May 2010 03:00 > And I test this patch on x86_64. It works well. > > Signed-off-by: Jin Dongming<jin.dongming(a)np.css.fujitsu.com> > Signed-off-by: Hidetoshi Seto<seto.hidetoshi(a)jp.fujitsu.com> Reviewed-by: Andi Kleen <ak(a)linux.intel.com> -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on 27 May 2010 03:00 > I have heard about that on some machine, some hardware error output pin > of chipset may be linked with some input pin of CPU which can cause MCE. Yes that happens. > That is, MCE is used to report some chipset errors too. I think that is > why notify_die is called in do_machine_check. Simply removing notify_die > is not good for these machines. In general deciding what to do on a MCE is rather complicated and probably too much for any die handler. > Maybe we should fix the notifier user instead. Which notifier user > consumes the DIE_NMI notification? Yes. It would be good to find out which user it is. Perhaps gdb? One approach would be to give it a different type (DIE_MCE) But today we don't really need it. notify_die() is primarily for debuggers of all kinds, and I never liked the idea to call a debugger on a machine check. So I would be ok with just removing the call. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Alan Cox on 27 May 2010 07:00
> In general deciding what to do on a MCE is rather complicated > and probably too much for any die handler. True enough > But today we don't really need it. notify_die() is primarily for debuggers > of all kinds, and I never liked the idea to call a debugger on a machine > check. That would be because you don't do driver work I suspect. If you are doing driver work then its extremely useful ending up in the debugger when you get an MCE because some random bit of hardware on the bus decided to throw a tantrum. This is particularly the case with AMD/ATI and AMD/Nvidia chipset systems which tend to throw this kind of error if you prod some of the chipset controllers (eg the Nvidia SATA) in them in just the wrong way. So NAK simply removing it. As a driver writer I want to end up in the debugger when this happens so I can work out what led up to the MCE. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |