From: Mathieu Desnoyers
Hi,

There seems to have been some churn regarding Perf problems with per-cpu memory
allocation, which uses vmalloc. Long story short: a fault taken inside an NMI
handler re-enables NMIs earlier than intended, because x86 re-enables NMIs at
the first iret executed, which can lead to nested NMIs.
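
To make the mechanism concrete, here is a minimal user-space sketch of the NMI
latch. It is a toy model and not kernel code: the struct, the function names
and the single boolean latch are all assumptions made for illustration. For
contrast it also models a fault return that does not execute iret.

```c
#include <stdbool.h>

/* Toy model of the x86 NMI latch. Every name here is illustrative. */
struct cpu_model {
	bool nmi_masked; /* set when an NMI is delivered, cleared by any iret */
	int depth;       /* NMI handlers entered and not yet finished */
	int max_depth;   /* deepest NMI nesting seen so far */
};

/* Try to deliver an NMI; it is held back while the latch is set. */
static bool deliver_nmi(struct cpu_model *cpu)
{
	if (cpu->nmi_masked)
		return false;
	cpu->nmi_masked = true;
	cpu->depth++;
	if (cpu->depth > cpu->max_depth)
		cpu->max_depth = cpu->depth;
	return true;
}

/* Return from a fault taken inside the NMI handler. */
static void fault_return(struct cpu_model *cpu, bool use_iret)
{
	if (use_iret)
		cpu->nmi_masked = false; /* iret unmasks NMIs as a side effect */
	/* A return that avoids iret leaves the latch untouched. */
}

/*
 * One NMI arrives, its handler faults, the fault returns, and a second
 * NMI becomes pending before the first handler has finished.
 * Returns the deepest nesting reached.
 */
static int max_nmi_nesting(bool fault_uses_iret)
{
	struct cpu_model cpu = { false, 0, 0 };

	deliver_nmi(&cpu);
	fault_return(&cpu, fault_uses_iret);
	deliver_nmi(&cpu); /* accepted only if the latch was cleared */
	return cpu.max_depth;
}
```

In this model, max_nmi_nesting(true) reaches a depth of 2 (the second NMI nests
over the first), while max_nmi_nesting(false) stays at 1.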

A first option would be vmalloc_sync_all(), but x86_32 cannot use it to
synchronize the TLBs of every process, because the vmalloc area is mapped in a
different address space for each process on this architecture. A second
alternative is to duplicate the per-cpu allocation API to provide a variant
using kmalloc only. This would lead to code and API duplication and should
probably be kept as a last resort. A third solution is to make the page fault
handler aware of NMIs and ensure it can be called from NMI context. This
patchset proposes the third solution.

So I'm respinning this patchset, which has been sitting for a while. It has been
used for about 1-2 years in the LTTng tree without problems and was already
tested in a -tip sub-branch in the past. It uses a ret/popf instruction pair
instead of iret when it detects that a trap handler is nested over an NMI. A
second patch makes the page fault handler NMI-safe by using the cr3 register
rather than accessing current, which could be in the middle of being changed by
a context switch.
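
As a rough illustration of why the cr3 register is the safer source, here is a
small user-space sketch of a context switch observed halfway through. It is not
the patch itself: the structs, the function names, the address values and the
assumed ordering (page directory loaded before the task pointer is updated) are
all hypothetical and chosen only to show the window.

```c
#include <stdint.h>

/* Toy model of a context switch. Every name here is illustrative. */
struct task_model {
	uintptr_t pgd; /* page directory owned by this task */
};

struct cpu_state {
	const struct task_model *current; /* software view of the running task */
	uintptr_t cr3;                    /* page directory the MMU walks */
};

/* First half of the switch (assumed order): load the new page directory. */
static void switch_load_cr3(struct cpu_state *cpu,
			    const struct task_model *next)
{
	cpu->cr3 = next->pgd;
}

/* Second half: update the task pointer. */
static void switch_set_current(struct cpu_state *cpu,
			       const struct task_model *next)
{
	cpu->current = next;
}

/* Page directory picked by a handler that trusts the task pointer. */
static uintptr_t pgd_from_current(const struct cpu_state *cpu)
{
	return cpu->current->pgd;
}

/* Page directory picked by a handler that reads the hardware register. */
static uintptr_t pgd_from_cr3(const struct cpu_state *cpu)
{
	return cpu->cr3;
}

/* Returns 1 if the two views disagree between the two halves. */
static int current_is_stale_mid_switch(void)
{
	struct task_model prev = { 0x1000 };
	struct task_model next = { 0x2000 };
	struct cpu_state cpu = { &prev, prev.pgd };
	int stale;

	switch_load_cr3(&cpu, &next);
	/* An NMI that faults here sees the old task but the new cr3. */
	stale = pgd_from_current(&cpu) != pgd_from_cr3(&cpu);
	switch_set_current(&cpu, &next);
	return stale;
}
```

In this model the two views disagree only inside the window between the two
halves of the switch. A handler that reads cr3 always works on the page
directory the hardware is actually using at the time of the fault.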

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com