From: Naoya Horiguchi
This patchset enables error handling for hugepages by containing the
error within the affected hugepage.

Until now, a memory error (classified as SRAO in MCA terminology) on a
hugepage was simply ignored, which means that if someone accesses the
error page later, a second MCE (more severe than the first one) occurs
and the system panics.

It's useful for some aggressive hugepage users if only the affected
processes are killed. Other unrelated processes then aren't disturbed by
the error and can continue operation.

Moreover, for other extensive hugetlb users which have their own
"pagecache" on hugepages, the most valued feature would be the ability to
receive the early kill signal BUS_MCEERR_AO, because the cache pages have
a good chance of being dropped without side effects on BUS_MCEERR_AO.
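
As a rough userspace illustration of the point above (not part of this
patchset), a process with its own cache could catch SIGBUS and look at
si_code. The names on_sigbus(), install_sigbus_handler() and pending_drop*
are hypothetical; BUS_MCEERR_AO and si_addr are the standard Linux siginfo
fields. This is a minimal sketch, and a real program would need a more
careful hand-off between the handler and the main loop:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical application state: which cache address should be dropped. */
static void *volatile pending_drop_addr;
static volatile sig_atomic_t pending_drop;

static void on_sigbus(int sig, siginfo_t *si, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (si->si_code == BUS_MCEERR_AO) {
		/* Action optional: record the address and keep running. */
		pending_drop_addr = si->si_addr;
		pending_drop = 1;
		return;
	}

	/* Anything else (including BUS_MCEERR_AR) is treated as fatal here. */
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

The main loop of such a program would check pending_drop and discard the
cache entry covering pending_drop_addr instead of touching it again.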


The design of hugepage error handling is based on that of non-hugepage
error handling, where we:
1. mark the error page as hwpoison,
2. unmap the hwpoisoned page from processes using it,
3. invalidate the error page, and
4. block later accesses to the hwpoisoned pages.
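
To make the four steps concrete, here is a toy userspace model. It is
only a sketch: struct toy_page, toy_memory_failure() and toy_fault() are
made-up names and do not correspond to the kernel's data structures or to
the code in mm/memory-failure.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-page kernel state; not a real kernel type. */
struct toy_page {
	bool hwpoison;	/* models PG_hwpoison */
	int mapcount;	/* number of processes mapping the page */
	bool cached;	/* page is still reachable via a cache or freelist */
};

/*
 * Steps 1-3: mark, unmap, invalidate.
 * Returns how many mappings were torn down.
 */
static int toy_memory_failure(struct toy_page *p)
{
	int unmapped;

	if (p->hwpoison)
		return 0;		/* already handled */

	p->hwpoison = true;		/* 1. mark the page as hwpoison */
	unmapped = p->mapcount;		/* 2. unmap it from every user */
	p->mapcount = 0;
	p->cached = false;		/* 3. invalidate the page */
	return unmapped;
}

/* Step 4: a later fault must fail instead of mapping the page again. */
static int toy_fault(struct toy_page *p)
{
	if (p->hwpoison)
		return -1;
	p->mapcount++;
	return 0;
}
```

In this model, a page with some mappings is unmapped from all of them by
toy_memory_failure(), and every later toy_fault() on it returns -1.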

Similarities and differences between the huge and non-huge cases are
summarized below:

1. (Difference) when an error occurs on a hugepage, the PG_hwpoison bits on
all pages in the hugepage are set, because we have no simple way to break
up a hugepage into individual pages for now. This means there is some
risk of a process being killed by touching non-guilty pages within the
error hugepage.

2. (Similarity) the hugetlb entry for the error hugepage is replaced by a
hwpoison swap entry, with which we can detect hwpoisoned memory in VM code.
This is accomplished by adding rmapping code for hugepage, which enables
the use of try_to_unmap() for hugepage.

3. (Difference) since a hugepage is not linked to the LRU list and is
unswappable, there is not much to do for page invalidation (only dequeuing
a free/reserved hugepage from the freelist; see patch 5/7).
If we want to contain the error into one page, there may be more to do.

4. (Similarity) we block later accesses by forcing page requests for a
hwpoisoned hugepage to fail, as is done in the non-hugepage case in
do_wp_page().
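
The hugepage-specific points can be pictured with another toy userspace
model. Again this is only a sketch under stated assumptions: the 2 MB /
4 KB page sizes, struct toy_hugepage, toy_bad_pages and both functions
are made up for illustration and are not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 2 MB hugepage made of 4 KB raw pages (512 of them). */
#define TOY_HPAGE_ORDER	9
#define TOY_HPAGE_NR	(1UL << TOY_HPAGE_ORDER)

struct toy_hugepage {
	bool hwpoison[TOY_HPAGE_NR];	/* one PG_hwpoison bit per raw page */
	bool pte_is_hwpoison_entry;	/* models the hwpoison swap entry */
};

static unsigned long toy_bad_pages;	/* models mce_bad_pages */

/*
 * Whichever raw page actually failed, every raw page in the hugepage
 * is marked, and the bad page counter grows by the whole hugepage.
 */
static void toy_poison_hugepage(struct toy_hugepage *hp)
{
	size_t i;

	if (hp->hwpoison[0])
		return;			/* already poisoned */

	for (i = 0; i < TOY_HPAGE_NR; i++)
		hp->hwpoison[i] = true;
	toy_bad_pages += TOY_HPAGE_NR;
	hp->pte_is_hwpoison_entry = true;
}

/* A later fault on any raw page of the hugepage fails. */
static int toy_hugetlb_fault(const struct toy_hugepage *hp, size_t index)
{
	if (index >= TOY_HPAGE_NR)
		return -1;		/* outside this hugepage */
	if (hp->pte_is_hwpoison_entry || hp->hwpoison[index])
		return -1;
	return 0;
}
```

This also shows the risk mentioned in item 1: after poisoning, a fault on
a raw page that never had an error fails just like one on the bad page.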

ToDo:
- Narrow down the containment region into one raw page.
- Soft-offlining for hugepage is not supported due to the lack of migration
for hugepage.
- Counting file-mapped/anonymous hugepage in NR_FILE_MAPPED/NR_ANON_PAGES.

[PATCH 1/7] hugetlb, rmap: add reverse mapping for hugepage
[PATCH 2/7] HWPOISON, hugetlb: enable error handling path for hugepage
[PATCH 3/7] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
[PATCH 4/7] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
[PATCH 5/7] HWPOISON, hugetlb: isolate corrupted hugepage
[PATCH 6/7] HWPOISON, hugetlb: detect hwpoison in hugetlb code
[PATCH 7/7] HWPOISON, hugetlb: support hwpoison injection for hugepage

Dependency:
- patch 2 depends on patch 1.
- patch 3 to patch 6 depend on patch 2.

 include/linux/hugetlb.h |    3 +
 mm/hugetlb.c            |   98 ++++++++++++++++++++++++++++++++++++++-
 mm/hwpoison-inject.c    |   15 ++++--
 mm/memory-failure.c     |  120 +++++++++++++++++++++++++++++++++++------------
 mm/rmap.c               |   16 ++++++
 5 files changed, 215 insertions(+), 37 deletions(-)

ChangeLog from v4:
- rebased to 2.6.34-rc7
- add isolation code for free/reserved hugepage in me_huge_page()
- set/clear PG_hwpoison bits of all pages in hugepage.
- mce_bad_pages counts all pages in hugepage.
- rename __hugepage_set_anon_rmap() to hugepage_add_anon_rmap()
- add huge_pte_offset() dummy function in header file on !CONFIG_HUGETLBFS

ChangeLog from v3:
- rebased to 2.6.34-rc5
- support for privately mapped hugepage

ChangeLog from v2:
- rebase to 2.6.34-rc3
- consider mapcount of hugepage
- rename pointer "head" into "hpage"

ChangeLog from v1:
- rebase to 2.6.34-rc1
- add comment from Wu Fengguang

Thanks,
Naoya Horiguchi