From: Andi Kleen on 27 May 2010 09:30

Hi Alan,

> That would be because you don't do driver work I suspect. If you are
> doing driver work then its extremely useful ending up in the debugger
> when you get an MCE because some random bit of hardware on the bus
> decided to throw a tantrum.
>
> This is particularly the case with AMD/ATI and AMD/Nvidia chipset systems
> which tend to throw this kind of error if you prod some of the chipset
> controllers (eg the Nvidia SATA) in them in just the wrong way.
>
> So NAK simply removing it. As a driver writer I want to end up in the
> debugger when this happens so I can work out what led up to the MCE.

Have you ever tried that? It does not sound like it, to be honest :)

You have no chance of figuring out why the MCE happened unless you run
through the handler first. Otherwise you would have to do all the work
the MCE handler does by hand in the debugger (reading all banks on all
CPUs, parsing all the bits, doing all the other work). I wrote the code
to do that, and even I am a bit scared of doing all that manually.

Also, if the MCE is recoverable you'll just get a log entry with all the
information, and if it's not recoverable you get a panic, which ends up
entering the debugger anyway.

In addition, you won't get a single debugger entry but a parallel entry
on all CPUs, because an MCE is broadcast.

So overall I still think handling MCEs in debuggers does not make sense.

-Andi
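To give a rough idea of what "parsing all the bits" means for just one bank on one CPU, here is a minimal sketch that decodes the architectural flag bits of a single bank status value (IA32_MCi_STATUS). It is an illustration only, not the kernel's handler: the struct and function names are hypothetical, the raw value is assumed to have been read from the MSR already, and only the top-level fields are covered. A real handler also has to walk every bank on every CPU, read the ADDR and MISC registers, and interpret the error codes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one IA32_MCi_STATUS value. */
struct mce_status {
    bool valid;          /* bit 63: bank holds a logged error */
    bool overflow;       /* bit 62: an earlier error was overwritten */
    bool uncorrected;    /* bit 61: error was not corrected by hardware */
    bool enabled;        /* bit 60: error reporting was enabled */
    bool miscv;          /* bit 59: the bank's MISC register is valid */
    bool addrv;          /* bit 58: the bank's ADDR register is valid */
    bool pcc;            /* bit 57: processor context corrupt */
    uint16_t model_code; /* bits 31:16: model-specific error code */
    uint16_t mca_code;   /* bits 15:0: architectural MCA error code */
};

/* Return bit n of v as a boolean. */
static inline bool status_bit(uint64_t v, unsigned n)
{
    return (v >> n) & 1u;
}

/* Split a raw status value into its top-level fields. */
static struct mce_status decode_mce_status(uint64_t raw)
{
    struct mce_status s;

    s.valid       = status_bit(raw, 63);
    s.overflow    = status_bit(raw, 62);
    s.uncorrected = status_bit(raw, 61);
    s.enabled     = status_bit(raw, 60);
    s.miscv       = status_bit(raw, 59);
    s.addrv       = status_bit(raw, 58);
    s.pcc         = status_bit(raw, 57);
    s.model_code  = (uint16_t)((raw >> 16) & 0xffff);
    s.mca_code    = (uint16_t)(raw & 0xffff);
    return s;
}
```

Doing even this much by hand in a debugger, for every bank on every CPU that entered it at once, is the manual work the handler exists to avoid.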