From: Andi Kleen on 28 Oct 2009 01:30

Hidetoshi Seto wrote:
> Mike Travis wrote:
>> Mike Travis wrote:
>>> Hi Roland,
>>>
>>> I've found that I'm getting one of these lines for every cpu:
>>>
>>> mce: CPU supports 0 MCE banks

That message can be just removed I think. I don't see much value in it
because the value is in sysfs and when you see the CPU type you can easily
determine it anyways.

I don't think the patch below really solves the problem because they
would have the same noise problem back once they switch from the simulator
to a real box which has banks.

> Hum, I suppose the line for CPU 0 was slightly different from others,
> because SHD means "this bank is shared bank and controlled by other".
> Maybe:
> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>
> But I agree that we could some work for this messages...
> Is it better to change the message level to debug from info?

Can be made INFO yes, but I would prefer not removing them
from the dmesg for now.

Perhaps they could be also compressed a bit like SRAT.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Hidetoshi Seto on 28 Oct 2009 02:30

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Mike Travis wrote:
>>> Mike Travis wrote:
>>>> Hi Roland,
>>>>
>>>> I've found that I'm getting one of these lines for every cpu:
>>>>
>>>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.
>
> I don't think the patch below really solves the problem because they
> would have the same noise problem back once they switch from the simulator
> to a real box which has banks.

If box has any banks more than 0, then the line above will be appeared
only once for CPU 0. Only on the simulator, with MCE-capable processor
with no bank, this message becomes unacceptable noise because it appears
for every cpu.

Anyway I think my patch is nice to have, to avoid unexpected behavior
on uncertain environment.

Without disabling, what can we do on MCE with no bank?
I found that do_machine_check() does nothing if banks==0 ... it is better
to let system to panic with "Machine check from unknown source"?

>> Hum, I suppose the line for CPU 0 was slightly different from others,
>> because SHD means "this bank is shared bank and controlled by other".
>> Maybe:
>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>
>> But I agree that we could some work for this messages...
>> Is it better to change the message level to debug from info?
>
> Can be made INFO yes, but I would prefer not removing them
> from the dmesg for now.
>
> Perhaps they could be also compressed a bit like SRAT.

Like SRAT? I could not catch the meaning ... For example?

Thanks,
H.Seto

From: Andi Kleen on 28 Oct 2009 02:50

Hidetoshi Seto wrote:
>
> Without disabling, what can we do on MCE with no bank?

Nothing, but is it really worth adding a special case?

> I found that do_machine_check() does nothing if banks==0 ... it is better
> to let system to panic with "Machine check from unknown source"?

IMHO yes. In this case the system must be very confused and panic is the
best you can do. Otherwise it won't do anything interesting anyways.

>
>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>> because SHD means "this bank is shared bank and controlled by other".
>>> Maybe:
>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>
>>> But I agree that we could some work for this messages...
>>> Is it better to change the message level to debug from info?
>> Can be made INFO yes, but I would prefer not removing them
>> from the dmesg for now.
>>
>> Perhaps they could be also compressed a bit like SRAT.
>
> Like SRAT? I could not catch the meaning ... For example?

See the recent patches from David Rientjes in the same original thread.

-Andi

From: Hidetoshi Seto on 28 Oct 2009 04:20

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Without disabling, what can we do on MCE with no bank?
>
> Nothing, but is it really worth adding a special case?

If question were:
 - is it really worth to support this special environment,
   "MCE-capable but no MCE banks" ?
then I'd like to say no.

So I suggested to disable MCE on this uncertain environment.
Or we will end up adding more codes for special cases...

>> I found that do_machine_check() does nothing if banks==0 ... it is better
>> to let system to panic with "Machine check from unknown source"?
>
> IMHO yes. In this case the system must be very confused and panic is the
> best you can do. Otherwise it won't do anything interesting anyways.

Agreed, but this is also a special case.
Not depending on the real number of banks, confused system could fail to
get the value from memory...
Humm, in theory MCE handler must be implemented carefully, but I bet the
confused value will not be always 0, .... is it worth to do?

>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>> because SHD means "this bank is shared bank and controlled by other".
>>>> Maybe:
>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>
>>>> But I agree that we could some work for this messages...
>>>> Is it better to change the message level to debug from info?
>>> Can be made INFO yes, but I would prefer not removing them
>>> from the dmesg for now.
>>>
>>> Perhaps they could be also compressed a bit like SRAT.
>>
>> Like SRAT? I could not catch the meaning ... For example?
>
> See the recent patches from David Rientjes in the same original thread.

I found it, thanks.
So I suppose your idea is like:

 CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
 CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}

right?

IMHO the format I suggested is better to read, as far as banks is not
so big number.

 CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
 CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss

Thanks,
H.Seto

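For comparison, the two output formats discussed above can be sketched in plain userspace C. This is only an illustration and not the kernel code or any posted patch: the enum, the helper names (append, format_ranges, format_map) and the buffer handling are all assumptions made for the example. The letters C, P and s are taken to mean CMCI, POLL and SHD, which is inferred from the two sample lines for CPU 1 agreeing with each other. Runs of two banks are printed as "0,1" rather than "0-1" to match the sample lines.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-bank states, named after the tags used in the thread. */
enum bank_state { BANK_CMCI, BANK_POLL, BANK_SHD };

/* Append formatted text at buf[pos], never writing past len bytes. */
static size_t append(char *buf, size_t len, size_t pos, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (pos >= len)
        return pos;
    va_start(ap, fmt);
    n = vsnprintf(buf + pos, len - pos, fmt, ap);
    va_end(ap);
    if (n < 0)
        return pos;
    pos += (size_t)n;
    return pos < len ? pos : len - 1;   /* clamp on truncation */
}

/* Range-compressed list of the banks in `state`, e.g. "{0,1,6-9,12-21}". */
static void format_ranges(const enum bank_state *banks, int nbanks,
                          enum bank_state state, char *buf, size_t len)
{
    const char *sep = "";
    size_t pos;

    assert(len > 0);
    buf[0] = '\0';
    pos = append(buf, len, 0, "{");
    for (int i = 0; i < nbanks; i++) {
        int start = i;

        if (banks[i] != state)
            continue;
        /* Extend the run while the next bank has the same state. */
        while (i + 1 < nbanks && banks[i + 1] == state)
            i++;
        if (i == start)
            pos = append(buf, len, pos, "%s%d", sep, start);
        else if (i == start + 1)
            pos = append(buf, len, pos, "%s%d,%d", sep, start, i);
        else
            pos = append(buf, len, pos, "%s%d-%d", sep, start, i);
        sep = ",";
    }
    append(buf, len, pos, "}");
}

/* One letter per bank, grouped in fours, e.g. "ssCC PCss ssPP ...". */
static void format_map(const enum bank_state *banks, int nbanks,
                       char *buf, size_t len)
{
    static const char letter[] = {
        [BANK_CMCI] = 'C', [BANK_POLL] = 'P', [BANK_SHD] = 's',
    };
    size_t pos = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < nbanks; i++) {
        if (i > 0 && i % 4 == 0)
            pos = append(buf, len, pos, " ");
        pos = append(buf, len, pos, "%c", letter[banks[i]]);
    }
}
```

Calling format_ranges() once per state and format_map() once per CPU on the same bank array gives both variants from the same data. The range form stays short as the bank count grows, while the map keeps a fixed column per bank.
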
From: Valdis.Kletnieks on 28 Oct 2009 08:10

On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:
>
>>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.

Maybe it should only print a message if it finds an unexpected number of banks?

"Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
hardware says there's only 4. What's up with that?"
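
The "only print on mismatch" idea could look roughly like the userspace sketch below. Everything in it is hypothetical: the struct, the function name mce_check_banks, the message text and the expected count of 6 (borrowed from the joke above) are placeholders, and no real table of per-model bank counts is implied.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry: model name and bank count are made up. */
struct mce_expect {
    const char *model;
    int expected_banks;
};

/*
 * Fill `msg` only when the detected bank count differs from the expected
 * one. Returns 1 if a message was produced, 0 if the count matches (in
 * which case nothing needs to be logged).
 */
static int mce_check_banks(const struct mce_expect *e, int found,
                           char *msg, size_t len)
{
    if (found == e->expected_banks) {
        if (len > 0)
            msg[0] = '\0';
        return 0;
    }
    snprintf(msg, len, "mce: %s: expected %d MCE banks, found %d",
             e->model, e->expected_banks, found);
    return 1;
}
```

With this shape a box that reports the expected count stays silent on every CPU, and only the odd case (such as the simulator reporting 0 banks) produces a line. The cost is that an expected-count table would have to be kept up to date for each CPU model.
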