From: Balbir Singh on
* KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-04-13 13:45:53]:

> Thank you very much for your kindly helps!.
> -Kame
> =
> Documentation update.
>
> Some information are old, and I think current document doesn't work
> as "a guide for users".
> We need summary of all of our controls, at least.
>
> Changelog: 2010/04/12
> * applied feedback
>
> Changelog: 2010/04/09
> * replace 'lru' with 'LRU' and 'oom' with 'OOM'
> * fixed double-space breakage
> * applied all comments and fixed wrong parts pointed out.
> * fixed cgroup.procs
>
> Changelog: 2009/04/07
> * fixed tons of typos.
> * replaced "memcg" with "memory cgroup" AMAP.
> * replaced "mem+swap" with "memory+swap"
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> ---
> Documentation/cgroups/memory.txt | 276 ++++++++++++++++++++++++++-------------
> 1 file changed, 188 insertions(+), 88 deletions(-)
>
> Index: mmotm-temp/Documentation/cgroups/memory.txt
> ===================================================================
> --- mmotm-temp.orig/Documentation/cgroups/memory.txt
> +++ mmotm-temp/Documentation/cgroups/memory.txt
> @@ -4,16 +4,6 @@ NOTE: The Memory Resource Controller has
> to as the memory controller in this document. Do not confuse memory controller
> used here with the memory controller that is used in hardware.
>
> -Salient features
> -
> -a. Enable control of Anonymous, Page Cache (mapped and unmapped) and
> - Swap Cache memory pages.
> -b. The infrastructure allows easy addition of other types of memory to control
> -c. Provides *zero overhead* for non memory controller users
> -d. Provides a double LRU: global memory pressure causes reclaim from the
> - global LRU; a cgroup on hitting a limit, reclaims from the per
> - cgroup LRU
> -
> Benefits and Purpose of the memory controller
>
> The memory controller isolates the memory behaviour of a group of tasks
> @@ -33,6 +23,45 @@ d. A CD/DVD burner could control the amo
> e. There are several other use cases, find one or use the controller just
> for fun (to learn and hack on the VM subsystem).
>
> +Current Status: linux-2.6.34-mmotm(development version of 2010/April)
> +
> +Features:
> + - accounting anonymous pages, file caches, swap caches usage and limit them.
> + - private LRU and reclaim routine. (system's global LRU and private LRU
> + work independently from each other)
> + - optionally, memory+swap usage can be accounted and limited.
> + - hierarchical accounting
> + - soft limit
> + - moving(recharging) account at moving a task is selectable.
> + - usage threshold notifier
> + - oom-killer disable knob and oom-notifier
> + - Root cgroup has no limit controls.
> +
> + Kernel memory and Hugepages are not under control yet. We just manage
> + pages on LRU. To add more controls, we have to take care of performance.
> +
> +Brief summary of control files.
> +
> + tasks # attach a task(thread) and show list of threads
> + cgroup.procs # show list of processes
> + cgroup.event_control # an interface for event_fd()
> + memory.usage_in_bytes # show current memory(RSS+Cache) usage.
> + memory.memsw.usage_in_bytes # show current memory+Swap usage
> + memory.limit_in_bytes # set/show limit of memory usage
> + memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
> + memory.failcnt # show the number of memory usage hits limits
> + memory.memsw.failcnt # show the number of memory+Swap hits limits
> + memory.max_usage_in_bytes # show max memory usage recorded
> + memory.memsw.usage_in_bytes # show max memory+Swap usage recorded
> + memory.soft_limit_in_bytes # set/show soft limit of memory usage
> + memory.stat # show various statistics
> + memory.use_hierarchy # set/show hierarchical account enabled
> + memory.force_empty # trigger forced move charge to parent
> + memory.swappiness # set/show swappiness parameter of vmscan
> + (See sysctl's vm.swappiness)
> + memory.move_charge_at_immigrate # set/show controls of moving charges
> + memory.oom_control # set/show oom controls.
> +

Can we align the "#" comments, please!

> 1. History
>
> The memory controller has a long history. A request for comments for the memory
> @@ -106,14 +135,14 @@ the necessary data structures and check
> is over its limit. If it is then reclaim is invoked on the cgroup.
> More details can be found in the reclaim section of this document.
> If everything goes well, a page meta-data-structure called page_cgroup is
> -allocated and associated with the page. This routine also adds the page to
> -the per cgroup LRU.
> +updated. page_cgroup has its own LRU on cgroup.
> +(*) page_cgroup structure is allocated at boot/memory-hotplug time.
>
> 2.2.1 Accounting details
>
> All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
> -(some pages which never be reclaimable and will not be on global LRU
> - are not accounted. we just accounts pages under usual vm management.)
> +Some pages which are never reclaimable and will not be on the global LRU
> +are not accounted. We just account pages under usual VM management.
>
> RSS pages are accounted at page_fault unless they've already been accounted
> for earlier. A file page will be accounted for as Page Cache when it's
> @@ -121,12 +150,19 @@ inserted into inode (radix-tree). While
> processes, duplicate accounting is carefully avoided.
>
> A RSS page is unaccounted when it's fully unmapped. A PageCache page is
> -unaccounted when it's removed from radix-tree.
> +unaccounted when it's removed from radix-tree. Even if RSS pages are fully
> +unmapped (by kswapd), they may exist as SwapCache in the system until they
> +are really freed. Such SwapCaches also also accounted.
> +A swapped-in page is not accounted until it's mapped.
> +
> +Note: The kernel does swapin-readahead and read multiple swaps at once.
> +This means swapped-in pages may contain pages for other tasks than a task
> +causing page fault. So, we avoid accounting at swap-in I/O.
>
> At page migration, accounting information is kept.
>
> -Note: we just account pages-on-lru because our purpose is to control amount
> -of used pages. not-on-lru pages are tend to be out-of-control from vm view.
> +Note: we just account pages-on-LRU because our purpose is to control amount
> +of used pages. not-on-LRU pages tend to be out-of-control from VM view.

^^ period might not be appropriate, if it is, n (not) should
be caps
>
> 2.3 Shared Page Accounting
>
> @@ -143,6 +179,7 @@ caller of swapoff rather than the users
>
>
> 2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
> +
> Swap Extension allows you to record charge for swap. A swapped-in page is
> charged back to original page allocator if possible.
>
> @@ -150,13 +187,20 @@ When swap is accounted, following files
> - memory.memsw.usage_in_bytes.
> - memory.memsw.limit_in_bytes.
>
> -usage of mem+swap is limited by memsw.limit_in_bytes.
> +memsw means memory+swap. Usage of memory+swap is limited by
> +memsw.limit_in_bytes.
>
> -* why 'mem+swap' rather than swap.
> +Example: Assume a system with 4G of swap. A task which allocates 6G of memory
> +(by mistake) under 2G memory limitation will use all swap.
> +In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
> +By using memsw limit, you can avoid system OOM which can be caused by swap
> +shortage.
> +
> +* why 'memory+swap' rather than swap.
> The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
> to move account from memory to swap...there is no change in usage of
> -mem+swap. In other words, when we want to limit the usage of swap without
> -affecting global LRU, mem+swap limit is better than just limiting swap from
> +memory+swap. In other words, when we want to limit the usage of swap without
> +affecting global LRU, memory+swap limit is better than just limiting swap from
> OS point of view.
>
> * What happens when a cgroup hits memory.memsw.limit_in_bytes
> @@ -168,12 +212,12 @@ it by cgroup.
>
> 2.5 Reclaim
>
> -Each cgroup maintains a per cgroup LRU that consists of an active
> -and inactive list. When a cgroup goes over its limit, we first try
> +Each cgroup maintains a per cgroup LRU which has the same structure as
> +global VM. When a cgroup goes over its limit, we first try
> to reclaim memory from the cgroup so as to make space for the new
> pages that the cgroup has touched. If the reclaim is unsuccessful,
> an OOM routine is invoked to select and kill the bulkiest task in the
> -cgroup.
> +cgroup. (See 10. OOM Control below.)
>
> The reclaim algorithm has not been modified for cgroups, except that
> pages that are selected for reclaiming come from the per cgroup LRU
> @@ -187,13 +231,19 @@ Note2: When panic_on_oom is set to "2",
> When oom event notifier is registered, event will be delivered.
> (See oom_control section)
>
> -2. Locking
> +2.6 Locking
>
> -The memory controller uses the following hierarchy
> + lock_page_cgroup()/unlock_page_cgroup() should not be called under
> + mapping->tree_lock.
>
> -1. zone->lru_lock is used for selecting pages to be isolated
> -2. mem->per_zone->lru_lock protects the per cgroup LRU (per zone)
> -3. lock_page_cgroup() is used to protect page->page_cgroup
> + Other lock order is following:
> + PG_locked.
> + mm->page_table_lock
> + zone->lru_lock
> + lock_page_cgroup.
> + In many cases, just lock_page_cgroup() is called.
> + per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
> + zone->lru_lock, it has no lock of its own.
>
> 3. User Interface
>
> @@ -202,6 +252,7 @@ The memory controller uses the following
> a. Enable CONFIG_CGROUPS
> b. Enable CONFIG_RESOURCE_COUNTERS
> c. Enable CONFIG_CGROUP_MEM_RES_CTLR
> +d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
>
> 1. Prepare the cgroups
> # mkdir -p /cgroups
> @@ -209,31 +260,29 @@ c. Enable CONFIG_CGROUP_MEM_RES_CTLR
>
> 2. Make the new group and move bash into it
> # mkdir /cgroups/0
> -# echo $$ > /cgroups/0/tasks
> +# echo $$ > /cgroups/0/tasks
>
> Since now we're in the 0 cgroup,
> We can alter the memory limit:
> # echo 4M > /cgroups/0/memory.limit_in_bytes
>
> NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
> -mega or gigabytes.
> +mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
> +
> NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
> NOTE: We cannot set limits on the root cgroup any more.
>
> # cat /cgroups/0/memory.limit_in_bytes
> 4194304
>
> -NOTE: The interface has now changed to display the usage in bytes
> -instead of pages
> -
> We can check the usage:
> # cat /cgroups/0/memory.usage_in_bytes
> 1216512
>
> A successful write to this file does not guarantee a successful set of
> -this limit to the value written into the file. This can be due to a
> +this limit to the value written into the file. This can be due to a
> number of factors, such as rounding up to page boundaries or the total
> -availability of memory on the system. The user is required to re-read
> +availability of memory on the system. The user is required to re-read
> this file after a write to guarantee the value committed by the kernel.
>
> # echo 1 > memory.limit_in_bytes
> @@ -248,15 +297,25 @@ caches, RSS and Active pages/Inactive pa
>
> 4. Testing
>
> -Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
> -Apart from that v6 has been tested with several applications and regular
> -daily use. The controller has also been tested on the PPC64, x86_64 and
> -UML platforms.
> +For testing features and implementation, see memcg_test.txt.
> +
> +Performance test is also important. To see pure memory cgroup's overhead,
> +testing on tmpfs will give you good numbers of small overheads.
> +Example: do kernel make on tmpfs.
> +
> +Page-fault scalability is also important. At measuring parallel
> +page fault test, multi-process test may be better than multi-thread
> +test because it has noise of shared objects/status.
> +
> +But the above two are testing extreme situations.
> +Trying usual test under memory cgroup is always helpful.
> +
> +

Extra newline.

>
> 4.1 Troubleshooting
>
> Sometimes a user might find that the application under a cgroup is
> -terminated. There are several causes for this:
> +terminated by OOM killer. There are several causes for this:
>
> 1. The cgroup limit is too low (just too low to do anything useful)
> 2. The user is using anonymous memory and swap is turned off or too low
> @@ -264,6 +323,9 @@ terminated. There are several causes for
> A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
> some of the pages cached in the cgroup (page cache pages).
>
> +To know what happens, disable OOM_Kill by 10. OOM Control(see below) and
> +seeing what happens will be helpful.
> +
> 4.2 Task migration
>
> When a task migrates from one cgroup to another, it's charge is not
> @@ -271,16 +333,19 @@ carried forward by default. The pages al
> remain charged to it, the charge is dropped when the page is freed or
> reclaimed.
>
> -Note: You can move charges of a task along with task migration. See 8.
> +You can move charges of a task along with task migration.
> +See 8. "Move charges at task migration"
>
> 4.3 Removing a cgroup
>
> A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
> cgroup might have some charge associated with it, even though all
> -tasks have migrated away from it.
> -Such charges are freed(at default) or moved to its parent. When moved,
> -both of RSS and CACHES are moved to parent.
> -If both of them are busy, rmdir() returns -EBUSY. See 5.1 Also.
> +tasks have migrated away from it. (because we charge against pages, not
> +against tasks.)
> +
> +Such charges are freed or moved to their parent. At moving, both of RSS
> +and CACHES are moved to parent.
> +rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.
>
> Charges recorded in swap information is not updated at removal of cgroup.
> Recorded information is discarded and a cgroup which uses swap (swapcache)
> @@ -296,10 +361,10 @@ will be charged as a new owner of it.
>
> # echo 0 > memory.force_empty
>
> - Almost all pages tracked by this memcg will be unmapped and freed. Some of
> - pages cannot be freed because it's locked or in-use. Such pages are moved
> - to parent and this cgroup will be empty. But this may return -EBUSY in
> - some too busy case.
> + Almost all pages tracked by this memory cgroup will be unmapped and freed.
> + Some pages cannot be freed because they are locked or in-use. Such pages are
> + moved to parent and this cgroup will be empty. This may return -EBUSY if
> + VM is too busy to free/move all pages immediately.
>
> Typical use case of this interface is that calling this before rmdir().
> Because rmdir() moves all pages to parent, some out-of-use page caches can be
> @@ -309,19 +374,41 @@ will be charged as a new owner of it.
>
> memory.stat file includes following statistics
>
> +# per-memory cgroup local status
> cache - # of bytes of page cache memory.
> rss - # of bytes of anonymous and swap cache memory.
> +mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
> pgpgin - # of pages paged in (equivalent to # of charging events).
> pgpgout - # of pages paged out (equivalent to # of uncharging events).
> -active_anon - # of bytes of anonymous and swap cache memory on active
> - lru list.
> +swap - # of bytes of swap usage
> inactive_anon - # of bytes of anonymous memory and swap cache memory on
> - inactive lru list.
> -active_file - # of bytes of file-backed memory on active lru list.
> -inactive_file - # of bytes of file-backed memory on inactive lru list.
> + LRU list.
> +active_anon - # of bytes of anonymous and swap cache memory on active
> + inactive LRU list.
> +inactive_file - # of bytes of file-backed memory on inactive LRU list.
> +active_file - # of bytes of file-backed memory on active LRU list.
> unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
>
> -The following additional stats are dependent on CONFIG_DEBUG_VM.
> +# status considering hierarchy (see memory.use_hierarchy settings)
> +
> +hierarchical_memory_limit - # of bytes of memory limit with regard to hierarchy
> + under which the memory cgroup is
> +hierarchical_memsw_limit - # of bytes of memory+swap limit with regard to
> + hierarchy under which memory cgroup is.
> +
> +total_cache - sum of all children's "cache"
> +total_rss - sum of all children's "rss"
> +total_mapped_file - sum of all children's "cache"
> +total_pgpgin - sum of all children's "pgpgin"
> +total_pgpgout - sum of all children's "pgpgout"
> +total_swap - sum of all children's "swap"
> +total_inactive_anon - sum of all children's "inactive_anon"
> +total_active_anon - sum of all children's "active_anon"
> +total_inactive_file - sum of all children's "inactive_file"
> +total_active_file - sum of all children's "active_file"
> +total_unevictable - sum of all children's "unevictable"
> +
> +# The following additional stats are dependent on CONFIG_DEBUG_VM.
>
> inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
> recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
> @@ -330,24 +417,37 @@ recent_scanned_anon - VM internal parame
> recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
>

Can we align the data on the right, like for total_* data earlier.

> Memo:
> - recent_rotated means recent frequency of lru rotation.
> - recent_scanned means recent # of scans to lru.
> + recent_rotated means recent frequency of LRU rotation.
> + recent_scanned means recent # of scans to LRU.
> showing for better debug please see the code for meanings.
>
> Note:
> Only anonymous and swap cache memory is listed as part of 'rss' stat.
> This should not be confused with the true 'resident set size' or the
> - amount of physical memory used by the cgroup. Per-cgroup rss
> - accounting is not done yet.
> + amount of physical memory used by the cgroup.
> + 'rss + file_mapped" will give you resident set size of cgroup.
> + (Note: file and shmem may be shared among other cgroups. In that case,
> + file_mapped is accounted only when the memory cgroup is owner of page
> + cache.)
>
> 5.3 swappiness
> - Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
>
> - Following cgroups' swappiness can't be changed.
> - - root cgroup (uses /proc/sys/vm/swappiness).
> - - a cgroup which uses hierarchy and it has child cgroup.
> - - a cgroup which uses hierarchy and not the root of hierarchy.
> +Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
> +
> +Following cgroups' swappiness can't be changed.
> +- root cgroup (uses /proc/sys/vm/swappiness).
> +- a cgroup which uses hierarchy and it has other cgroup(s) below it.
> +- a cgroup which uses hierarchy and not the root of hierarchy.
> +
> +5.4 failcnt
> +
> +A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
> +This failcnt(== failure count) shows the number of times that a usage counter
> +hit its limit. When a memory cgroup hits a limit, failcnt increases and
> +memory under it will be reclaimed.
>
> +You can reset failcnt by writing 0 to failcnt file.
> +# echo 0 > .../memory.failcnt
>
> 6. Hierarchy support
>
> @@ -366,13 +466,13 @@ hierarchy
>
> In the diagram above, with hierarchical accounting enabled, all memory
> usage of e, is accounted to its ancestors up until the root (i.e, c and root),
> -that has memory.use_hierarchy enabled. If one of the ancestors goes over its
> +that has memory.use_hierarchy enabled. If one of the ancestors goes over its
> limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
> children of the ancestor.
>
> 6.1 Enabling hierarchical accounting and reclaim
>
> -The memory controller by default disables the hierarchy feature. Support
> +A memory cgroup by default disables the hierarchy feature. Support
> can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup
>
> # echo 1 > memory.use_hierarchy
> @@ -382,10 +482,10 @@ The feature can be disabled by
> # echo 0 > memory.use_hierarchy
>
> NOTE1: Enabling/disabling will fail if the cgroup already has other
> -cgroups created below it.
> + cgroups created below it.
>
> NOTE2: When panic_on_oom is set to "2", the whole system will panic in
> -case of an oom event in any cgroup.
> + case of an OOM event in any cgroup.
>
> 7. Soft limits
>
> @@ -395,7 +495,7 @@ is to allow control groups to use as muc
> a. There is no memory contention
> b. They do not exceed their hard limit
>
> -When the system detects memory contention or low memory control groups
> +When the system detects memory contention or low memory, control groups
> are pushed back to their soft limits. If the soft limit of each control
> group is very high, they are pushed back as much as possible to make
> sure that one control group does not starve the others of memory.
> @@ -409,7 +509,7 @@ it gets invoked from balance_pgdat (kswa
> 7.1 Interface
>
> Soft limits can be setup by using the following commands (in this example we
> -assume a soft limit of 256 megabytes)
> +assume a soft limit of 256 MiB)
>
> # echo 256M > memory.soft_limit_in_bytes
>
> @@ -418,7 +518,7 @@ If we want to change this to 1G, we can
> # echo 1G > memory.soft_limit_in_bytes
>
> NOTE1: Soft limits take effect over a long period of time, since they involve
> - reclaiming memory for balancing between memory cgroups
> +reclaiming memory for balancing between memory cgroups
> NOTE2: It is recommended to set the soft limit always below the hard limit,
> otherwise the hard limit will take precedence.
>
> @@ -445,7 +545,7 @@ Note: Charges are moved only when you mo
> Note: If we cannot find enough space for the task in the destination cgroup, we
> try to make space by reclaiming memory. Task migration may fail if we
> cannot make enough space.
> -Note: It can take several seconds if you move charges in giga bytes order.
> +Note: It can take several seconds if you move charges much.
>
> And if you want disable it again:
>
> @@ -476,15 +576,15 @@ Note: More type of pages(e.g. file cache
>
> 9. Memory thresholds
>
> -Memory controler implements memory thresholds using cgroups notification
> +Memory cgroup implements memory thresholds using cgroups notification
> API (see cgroups.txt). It allows to register multiple memory and memsw
> thresholds and gets notifications when it crosses.
>
> To register a threshold application need:
> - - create an eventfd using eventfd(2);
> - - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
> - - write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to
> - cgroup.event_control.
> +- create an eventfd using eventfd(2);
> +- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
> +- write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to

Do we need the <> around memory.usage_in_bytes

> + cgroup.event_control.
>
> Application will be notified through eventfd when memory usage crosses
> threshold in any direction.
> @@ -495,27 +595,27 @@ It's applicable for root and non-root cg
>
> memory.oom_control file is for OOM notification and other controls.
>
> -Memory controler implements oom notifier using cgroup notification
> -API (See cgroups.txt). It allows to register multiple oom notification
> -delivery and gets notification when oom happens.
> +Memory cgroup implements OOM notifier using cgroup notification
> +API (See cgroups.txt). It allows to register multiple OOM notification
> +delivery and gets notification when OOM happens.
>
> To register a notifier, application need:
> - create an eventfd using eventfd(2)
> - open memory.oom_control file
> - write string like "<event_fd> <memory.oom_control>" to cgroup.event_control
>
> -Application will be notifier through eventfd when oom happens.
> +Application will be notified through eventfd when OOM happens.
> OOM notification doesn't work for root cgroup.
>
> -You can disable oom-killer by writing "1" to memory.oom_control file.
> +You can disable OOM-killer by writing "1" to memory.oom_control file.
> As.
> #echo 1 > memory.oom_control
>
> -This operation is only allowed to the top cgroup of subhierarchy.
> -If oom-killer is disabled, tasks under cgroup will hang/sleep
> -in memcg's oom-waitq when they request accountable memory.
> +This operation is only allowed to the top cgroup of sub-hierarchy.
> +If OOM-killer is disabled, tasks under cgroup will hang/sleep
> +in memory cgroup's OOM-waitqueue when they request accountable memory.
>
> -For running them, you have to relax the memcg's oom sitaution by
> +For running them, you have to relax the memory cgroup's OOM status by
> * enlarge limit or reduce usage.
> To reduce usage,
> * kill some tasks.
> @@ -526,7 +626,7 @@ Then, stopped tasks will work again.
>
> At reading, current status of OOM is shown.
> oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)
> - under_oom 0 or 1 (if 1, the memcg is under OOM,tasks may
> + under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may
> be stopped.)
>
> 11. TODO
>

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Tue, 13 Apr 2010 11:34:06 +0530
Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-04-13 13:45:53]:
>
> > Thank you very much for your kindly helps!.
> > -Kame
> > =
> > Documentation update.
> >
> > Some information are old, and I think current document doesn't work
> > as "a guide for users".
> > We need summary of all of our controls, at least.
> >
> > Changelog: 2010/04/12
> > * applied feedback
> >
> > Changelog: 2010/04/09
> > * replace 'lru' with 'LRU' and 'oom' with 'OOM'
> > * fixed double-space breakage
> > * applied all comments and fixed wrong parts pointed out.
> > * fixed cgroup.procs
> >
> > Changelog: 2009/04/07
> > * fixed tons of typos.
> > * replaced "memcg" with "memory cgroup" AMAP.
> > * replaced "mem+swap" with "memory+swap"
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> > ---
> > Documentation/cgroups/memory.txt | 276 ++++++++++++++++++++++++++-------------
> > 1 file changed, 188 insertions(+), 88 deletions(-)
> >
> > Index: mmotm-temp/Documentation/cgroups/memory.txt
> > ===================================================================
> > --- mmotm-temp.orig/Documentation/cgroups/memory.txt
> > +++ mmotm-temp/Documentation/cgroups/memory.txt
> > @@ -4,16 +4,6 @@ NOTE: The Memory Resource Controller has
> > to as the memory controller in this document. Do not confuse memory controller
> > used here with the memory controller that is used in hardware.
> >
> > -Salient features
> > -
> > -a. Enable control of Anonymous, Page Cache (mapped and unmapped) and
> > - Swap Cache memory pages.
> > -b. The infrastructure allows easy addition of other types of memory to control
> > -c. Provides *zero overhead* for non memory controller users
> > -d. Provides a double LRU: global memory pressure causes reclaim from the
> > - global LRU; a cgroup on hitting a limit, reclaims from the per
> > - cgroup LRU
> > -
> > Benefits and Purpose of the memory controller
> >
> > The memory controller isolates the memory behaviour of a group of tasks
> > @@ -33,6 +23,45 @@ d. A CD/DVD burner could control the amo
> > e. There are several other use cases, find one or use the controller just
> > for fun (to learn and hack on the VM subsystem).
> >
> > +Current Status: linux-2.6.34-mmotm(development version of 2010/April)
> > +
> > +Features:
> > + - accounting anonymous pages, file caches, swap caches usage and limit them.
> > + - private LRU and reclaim routine. (system's global LRU and private LRU
> > + work independently from each other)
> > + - optionally, memory+swap usage can be accounted and limited.
> > + - hierarchical accounting
> > + - soft limit
> > + - moving(recharging) account at moving a task is selectable.
> > + - usage threshold notifier
> > + - oom-killer disable knob and oom-notifier
> > + - Root cgroup has no limit controls.
> > +
> > + Kernel memory and Hugepages are not under control yet. We just manage
> > + pages on LRU. To add more controls, we have to take care of performance.
> > +
> > +Brief summary of control files.
> > +
> > + tasks # attach a task(thread) and show list of threads
> > + cgroup.procs # show list of processes
> > + cgroup.event_control # an interface for event_fd()
> > + memory.usage_in_bytes # show current memory(RSS+Cache) usage.
> > + memory.memsw.usage_in_bytes # show current memory+Swap usage
> > + memory.limit_in_bytes # set/show limit of memory usage
> > + memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
> > + memory.failcnt # show the number of memory usage hits limits
> > + memory.memsw.failcnt # show the number of memory+Swap hits limits
> > + memory.max_usage_in_bytes # show max memory usage recorded
> > + memory.memsw.usage_in_bytes # show max memory+Swap usage recorded
> > + memory.soft_limit_in_bytes # set/show soft limit of memory usage
> > + memory.stat # show various statistics
> > + memory.use_hierarchy # set/show hierarchical account enabled
> > + memory.force_empty # trigger forced move charge to parent
> > + memory.swappiness # set/show swappiness parameter of vmscan
> > + (See sysctl's vm.swappiness)
> > + memory.move_charge_at_immigrate # set/show controls of moving charges
> > + memory.oom_control # set/show oom controls.
> > +
>
> Can we align the "#" comments, please!
>
It's aligned.


> > 1. History
> >
> > The memory controller has a long history. A request for comments for the memory
> > @@ -106,14 +135,14 @@ the necessary data structures and check
> > is over its limit. If it is then reclaim is invoked on the cgroup.
> > More details can be found in the reclaim section of this document.
> > If everything goes well, a page meta-data-structure called page_cgroup is
> > -allocated and associated with the page. This routine also adds the page to
> > -the per cgroup LRU.
> > +updated. page_cgroup has its own LRU on cgroup.
> > +(*) page_cgroup structure is allocated at boot/memory-hotplug time.
> >
> > 2.2.1 Accounting details
> >
> > All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
> > -(some pages which never be reclaimable and will not be on global LRU
> > - are not accounted. we just accounts pages under usual vm management.)
> > +Some pages which are never reclaimable and will not be on the global LRU
> > +are not accounted. We just account pages under usual VM management.
> >
> > RSS pages are accounted at page_fault unless they've already been accounted
> > for earlier. A file page will be accounted for as Page Cache when it's
> > @@ -121,12 +150,19 @@ inserted into inode (radix-tree). While
> > processes, duplicate accounting is carefully avoided.
> >
> > A RSS page is unaccounted when it's fully unmapped. A PageCache page is
> > -unaccounted when it's removed from radix-tree.
> > +unaccounted when it's removed from radix-tree. Even if RSS pages are fully
> > +unmapped (by kswapd), they may exist as SwapCache in the system until they
> > +are really freed. Such SwapCaches also also accounted.
> > +A swapped-in page is not accounted until it's mapped.
> > +
> > +Note: The kernel does swapin-readahead and read multiple swaps at once.
> > +This means swapped-in pages may contain pages for other tasks than a task
> > +causing page fault. So, we avoid accounting at swap-in I/O.
> >
> > At page migration, accounting information is kept.
> >
> > -Note: we just account pages-on-lru because our purpose is to control amount
> > -of used pages. not-on-lru pages are tend to be out-of-control from vm view.
> > +Note: we just account pages-on-LRU because our purpose is to control amount
> > +of used pages. not-on-LRU pages tend to be out-of-control from VM view.
>
> ^^ period might not be appropriate, if it is, n (not) should
> be caps

replaced with ,

> >
> > 2.3 Shared Page Accounting
> >
> > @@ -143,6 +179,7 @@ caller of swapoff rather than the users
> >
> >
> > 2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
> > +
> > Swap Extension allows you to record charge for swap. A swapped-in page is
> > charged back to original page allocator if possible.
> >
> > @@ -150,13 +187,20 @@ When swap is accounted, following files
> > - memory.memsw.usage_in_bytes.
> > - memory.memsw.limit_in_bytes.
> >
> > -usage of mem+swap is limited by memsw.limit_in_bytes.
> > +memsw means memory+swap. Usage of memory+swap is limited by
> > +memsw.limit_in_bytes.
> >
> > -* why 'mem+swap' rather than swap.
> > +Example: Assume a system with 4G of swap. A task which allocates 6G of memory
> > +(by mistake) under 2G memory limitation will use all swap.
> > +In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
> > +By using memsw limit, you can avoid system OOM which can be caused by swap
> > +shortage.
> > +
> > +* why 'memory+swap' rather than swap.
> > The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
> > to move account from memory to swap...there is no change in usage of
> > -mem+swap. In other words, when we want to limit the usage of swap without
> > -affecting global LRU, mem+swap limit is better than just limiting swap from
> > +memory+swap. In other words, when we want to limit the usage of swap without
> > +affecting global LRU, memory+swap limit is better than just limiting swap from
> > OS point of view.
> >
> > * What happens when a cgroup hits memory.memsw.limit_in_bytes
> > @@ -168,12 +212,12 @@ it by cgroup.
> >
> > 2.5 Reclaim
> >
> > -Each cgroup maintains a per cgroup LRU that consists of an active
> > -and inactive list. When a cgroup goes over its limit, we first try
> > +Each cgroup maintains a per cgroup LRU which has the same structure as
> > +global VM. When a cgroup goes over its limit, we first try
> > to reclaim memory from the cgroup so as to make space for the new
> > pages that the cgroup has touched. If the reclaim is unsuccessful,
> > an OOM routine is invoked to select and kill the bulkiest task in the
> > -cgroup.
> > +cgroup. (See 10. OOM Control below.)
> >
> > The reclaim algorithm has not been modified for cgroups, except that
> > pages that are selected for reclaiming come from the per cgroup LRU
> > @@ -187,13 +231,19 @@ Note2: When panic_on_oom is set to "2",
> > When oom event notifier is registered, event will be delivered.
> > (See oom_control section)
> >
> > -2. Locking
> > +2.6 Locking
> >
> > -The memory controller uses the following hierarchy
> > + lock_page_cgroup()/unlock_page_cgroup() should not be called under
> > + mapping->tree_lock.
> >
> > -1. zone->lru_lock is used for selecting pages to be isolated
> > -2. mem->per_zone->lru_lock protects the per cgroup LRU (per zone)
> > -3. lock_page_cgroup() is used to protect page->page_cgroup
> > + Other lock order is following:
> > + PG_locked.
> > + mm->page_table_lock
> > + zone->lru_lock
> > + lock_page_cgroup.
> > + In many cases, just lock_page_cgroup() is called.
> > + per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
> > + zone->lru_lock, it has no lock of its own.
> >
> > 3. User Interface
> >
> > @@ -202,6 +252,7 @@ The memory controller uses the following
> > a. Enable CONFIG_CGROUPS
> > b. Enable CONFIG_RESOURCE_COUNTERS
> > c. Enable CONFIG_CGROUP_MEM_RES_CTLR
> > +d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
> >
> > 1. Prepare the cgroups
> > # mkdir -p /cgroups
> > @@ -209,31 +260,29 @@ c. Enable CONFIG_CGROUP_MEM_RES_CTLR
> >
> > 2. Make the new group and move bash into it
> > # mkdir /cgroups/0
> > -# echo $$ > /cgroups/0/tasks
> > +# echo $$ > /cgroups/0/tasks
> >
> > Since now we're in the 0 cgroup,
> > We can alter the memory limit:
> > # echo 4M > /cgroups/0/memory.limit_in_bytes
> >
> > NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
> > -mega or gigabytes.
> > +mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
> > +
> > NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
> > NOTE: We cannot set limits on the root cgroup any more.
> >
> > # cat /cgroups/0/memory.limit_in_bytes
> > 4194304
> >
> > -NOTE: The interface has now changed to display the usage in bytes
> > -instead of pages
> > -
> > We can check the usage:
> > # cat /cgroups/0/memory.usage_in_bytes
> > 1216512
> >
> > A successful write to this file does not guarantee a successful set of
> > -this limit to the value written into the file. This can be due to a
> > +this limit to the value written into the file. This can be due to a
> > number of factors, such as rounding up to page boundaries or the total
> > -availability of memory on the system. The user is required to re-read
> > +availability of memory on the system. The user is required to re-read
> > this file after a write to guarantee the value committed by the kernel.
> >
> > # echo 1 > memory.limit_in_bytes
> > @@ -248,15 +297,25 @@ caches, RSS and Active pages/Inactive pa
> >
> > 4. Testing
> >
> > -Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
> > -Apart from that v6 has been tested with several applications and regular
> > -daily use. The controller has also been tested on the PPC64, x86_64 and
> > -UML platforms.
> > +For testing features and implementation, see memcg_test.txt.
> > +
> > +Performance test is also important. To see pure memory cgroup's overhead,
> > +testing on tmpfs will give you good numbers of small overheads.
> > +Example: do kernel make on tmpfs.
> > +
> > +Page-fault scalability is also important. At measuring parallel
> > +page fault test, multi-process test may be better than multi-thread
> > +test because it has noise of shared objects/status.
> > +
> > +But the above two are testing extreme situations.
> > +Trying usual test under memory cgroup is always helpful.
> > +
> > +
>
> Extra newline.
>
will remove

> >
> > 4.1 Troubleshooting
> >
> > Sometimes a user might find that the application under a cgroup is
> > -terminated. There are several causes for this:
> > +terminated by OOM killer. There are several causes for this:
> >
> > 1. The cgroup limit is too low (just too low to do anything useful)
> > 2. The user is using anonymous memory and swap is turned off or too low
> > @@ -264,6 +323,9 @@ terminated. There are several causes for
> > A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
> > some of the pages cached in the cgroup (page cache pages).
> >
> > +To know what happens, disable OOM_Kill by 10. OOM Control(see below) and
> > +seeing what happens will be helpful.
> > +
> > 4.2 Task migration
> >
> > When a task migrates from one cgroup to another, it's charge is not
> > @@ -271,16 +333,19 @@ carried forward by default. The pages al
> > remain charged to it, the charge is dropped when the page is freed or
> > reclaimed.
> >
> > -Note: You can move charges of a task along with task migration. See 8.
> > +You can move charges of a task along with task migration.
> > +See 8. "Move charges at task migration"
> >
> > 4.3 Removing a cgroup
> >
> > A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
> > cgroup might have some charge associated with it, even though all
> > -tasks have migrated away from it.
> > -Such charges are freed(at default) or moved to its parent. When moved,
> > -both of RSS and CACHES are moved to parent.
> > -If both of them are busy, rmdir() returns -EBUSY. See 5.1 Also.
> > +tasks have migrated away from it. (because we charge against pages, not
> > +against tasks.)
> > +
> > +Such charges are freed or moved to their parent. At moving, both of RSS
> > +and CACHES are moved to parent.
> > +rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.
> >
> > Charges recorded in swap information is not updated at removal of cgroup.
> > Recorded information is discarded and a cgroup which uses swap (swapcache)
> > @@ -296,10 +361,10 @@ will be charged as a new owner of it.
> >
> > # echo 0 > memory.force_empty
> >
> > - Almost all pages tracked by this memcg will be unmapped and freed. Some of
> > - pages cannot be freed because it's locked or in-use. Such pages are moved
> > - to parent and this cgroup will be empty. But this may return -EBUSY in
> > - some too busy case.
> > + Almost all pages tracked by this memory cgroup will be unmapped and freed.
> > + Some pages cannot be freed because they are locked or in-use. Such pages are
> > + moved to parent and this cgroup will be empty. This may return -EBUSY if
> > + VM is too busy to free/move all pages immediately.
> >
> > Typical use case of this interface is that calling this before rmdir().
> > Because rmdir() moves all pages to parent, some out-of-use page caches can be
> > @@ -309,19 +374,41 @@ will be charged as a new owner of it.
> >
> > memory.stat file includes following statistics
> >
> > +# per-memory cgroup local status
> > cache - # of bytes of page cache memory.
> > rss - # of bytes of anonymous and swap cache memory.
> > +mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
> > pgpgin - # of pages paged in (equivalent to # of charging events).
> > pgpgout - # of pages paged out (equivalent to # of uncharging events).
> > -active_anon - # of bytes of anonymous and swap cache memory on active
> > - lru list.
> > +swap - # of bytes of swap usage
> > inactive_anon - # of bytes of anonymous memory and swap cache memory on
> > - inactive lru list.
> > -active_file - # of bytes of file-backed memory on active lru list.
> > -inactive_file - # of bytes of file-backed memory on inactive lru list.
> > + LRU list.
> > +active_anon - # of bytes of anonymous and swap cache memory on active
> > + inactive LRU list.
> > +inactive_file - # of bytes of file-backed memory on inactive LRU list.
> > +active_file - # of bytes of file-backed memory on active LRU list.
> > unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
> >
> > -The following additional stats are dependent on CONFIG_DEBUG_VM.
> > +# status considering hierarchy (see memory.use_hierarchy settings)
> > +
> > +hierarchical_memory_limit - # of bytes of memory limit with regard to hierarchy
> > + under which the memory cgroup is
> > +hierarchical_memsw_limit - # of bytes of memory+swap limit with regard to
> > + hierarchy under which memory cgroup is.
> > +
> > +total_cache - sum of all children's "cache"
> > +total_rss - sum of all children's "rss"
> > +total_mapped_file - sum of all children's "cache"
> > +total_pgpgin - sum of all children's "pgpgin"
> > +total_pgpgout - sum of all children's "pgpgout"
> > +total_swap - sum of all children's "swap"
> > +total_inactive_anon - sum of all children's "inactive_anon"
> > +total_active_anon - sum of all children's "active_anon"
> > +total_inactive_file - sum of all children's "inactive_file"
> > +total_active_file - sum of all children's "active_file"
> > +total_unevictable - sum of all children's "unevictable"
> > +
> > +# The following additional stats are dependent on CONFIG_DEBUG_VM.
> >
> > inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
> > recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
> > @@ -330,24 +417,37 @@ recent_scanned_anon - VM internal parame
> > recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
> >
>
> Can we align the data on the right, like for total_* data earlier.
>
It's aligned.


> > Memo:
> > - recent_rotated means recent frequency of lru rotation.
> > - recent_scanned means recent # of scans to lru.
> > + recent_rotated means recent frequency of LRU rotation.
> > + recent_scanned means recent # of scans to LRU.
> > showing for better debug please see the code for meanings.
> >
> > Note:
> > Only anonymous and swap cache memory is listed as part of 'rss' stat.
> > This should not be confused with the true 'resident set size' or the
> > - amount of physical memory used by the cgroup. Per-cgroup rss
> > - accounting is not done yet.
> > + amount of physical memory used by the cgroup.
> > + 'rss + file_mapped" will give you resident set size of cgroup.
> > + (Note: file and shmem may be shared among other cgroups. In that case,
> > + file_mapped is accounted only when the memory cgroup is owner of page
> > + cache.)
> >
> > 5.3 swappiness
> > - Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
> >
> > - Following cgroups' swappiness can't be changed.
> > - - root cgroup (uses /proc/sys/vm/swappiness).
> > - - a cgroup which uses hierarchy and it has child cgroup.
> > - - a cgroup which uses hierarchy and not the root of hierarchy.
> > +Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
> > +
> > +Following cgroups' swappiness can't be changed.
> > +- root cgroup (uses /proc/sys/vm/swappiness).
> > +- a cgroup which uses hierarchy and it has other cgroup(s) below it.
> > +- a cgroup which uses hierarchy and not the root of hierarchy.
> > +
> > +5.4 failcnt
> > +
> > +A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
> > +This failcnt(== failure count) shows the number of times that a usage counter
> > +hit its limit. When a memory cgroup hits a limit, failcnt increases and
> > +memory under it will be reclaimed.
> >
> > +You can reset failcnt by writing 0 to failcnt file.
> > +# echo 0 > .../memory.failcnt
> >
> > 6. Hierarchy support
> >
> > @@ -366,13 +466,13 @@ hierarchy
> >
> > In the diagram above, with hierarchical accounting enabled, all memory
> > usage of e, is accounted to its ancestors up until the root (i.e, c and root),
> > -that has memory.use_hierarchy enabled. If one of the ancestors goes over its
> > +that has memory.use_hierarchy enabled. If one of the ancestors goes over its
> > limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
> > children of the ancestor.
> >
> > 6.1 Enabling hierarchical accounting and reclaim
> >
> > -The memory controller by default disables the hierarchy feature. Support
> > +A memory cgroup by default disables the hierarchy feature. Support
> > can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup
> >
> > # echo 1 > memory.use_hierarchy
> > @@ -382,10 +482,10 @@ The feature can be disabled by
> > # echo 0 > memory.use_hierarchy
> >
> > NOTE1: Enabling/disabling will fail if the cgroup already has other
> > -cgroups created below it.
> > + cgroups created below it.
> >
> > NOTE2: When panic_on_oom is set to "2", the whole system will panic in
> > -case of an oom event in any cgroup.
> > + case of an OOM event in any cgroup.
> >
> > 7. Soft limits
> >
> > @@ -395,7 +495,7 @@ is to allow control groups to use as muc
> > a. There is no memory contention
> > b. They do not exceed their hard limit
> >
> > -When the system detects memory contention or low memory control groups
> > +When the system detects memory contention or low memory, control groups
> > are pushed back to their soft limits. If the soft limit of each control
> > group is very high, they are pushed back as much as possible to make
> > sure that one control group does not starve the others of memory.
> > @@ -409,7 +509,7 @@ it gets invoked from balance_pgdat (kswa
> > 7.1 Interface
> >
> > Soft limits can be setup by using the following commands (in this example we
> > -assume a soft limit of 256 megabytes)
> > +assume a soft limit of 256 MiB)
> >
> > # echo 256M > memory.soft_limit_in_bytes
> >
> > @@ -418,7 +518,7 @@ If we want to change this to 1G, we can
> > # echo 1G > memory.soft_limit_in_bytes
> >
> > NOTE1: Soft limits take effect over a long period of time, since they involve
> > - reclaiming memory for balancing between memory cgroups
> > +reclaiming memory for balancing between memory cgroups
> > NOTE2: It is recommended to set the soft limit always below the hard limit,
> > otherwise the hard limit will take precedence.
> >
> > @@ -445,7 +545,7 @@ Note: Charges are moved only when you mo
> > Note: If we cannot find enough space for the task in the destination cgroup, we
> > try to make space by reclaiming memory. Task migration may fail if we
> > cannot make enough space.
> > -Note: It can take several seconds if you move charges in giga bytes order.
> > +Note: It can take several seconds if you move charges much.
> >
> > And if you want disable it again:
> >
> > @@ -476,15 +576,15 @@ Note: More type of pages(e.g. file cache
> >
> > 9. Memory thresholds
> >
> > -Memory controler implements memory thresholds using cgroups notification
> > +Memory cgroup implements memory thresholds using cgroups notification
> > API (see cgroups.txt). It allows to register multiple memory and memsw
> > thresholds and gets notifications when it crosses.
> >
> > To register a threshold application need:
> > - - create an eventfd using eventfd(2);
> > - - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
> > - - write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to
> > - cgroup.event_control.
> > +- create an eventfd using eventfd(2);
> > +- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
> > +- write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to
>
> Do we need the <> around memory.usage_in_bytes
>

Hmm ? I'm not sure.

Could you explain why you think removing <> is better ?

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Balbir Singh on
* KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-04-13 15:20:48]:

[snip]

The alignment does not show up in the patches, hence the comments

> > Do we need the <> around memory.usage_in_bytes
> >
>
> Hmm ? I'm not sure.
>
> Could you explain why you think removing <> is better ?
>

Because it is an actual name as compared to using <> for to be
replaced with the right thing.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Vivek Goyal on
On Tue, Apr 13, 2010 at 01:45:53PM +0900, KAMEZAWA Hiroyuki wrote:

[..]
> -2. Locking
> +2.6 Locking
>
> -The memory controller uses the following hierarchy
> + lock_page_cgroup()/unlock_page_cgroup() should not be called under
> + mapping->tree_lock.
>

Because I never understood very well, I will ask. Why lock_page_cgroup()
should not be called under mapping->tree_lock?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Vivek Goyal on
On Tue, Apr 13, 2010 at 09:57:18AM -0400, Vivek Goyal wrote:
> On Tue, Apr 13, 2010 at 01:45:53PM +0900, KAMEZAWA Hiroyuki wrote:
>

Typed wrong email id last time and mail bounced. So here is another
attempt.

> [..]
> > -2. Locking
> > +2.6 Locking
> >
> > -The memory controller uses the following hierarchy
> > + lock_page_cgroup()/unlock_page_cgroup() should not be called under
> > + mapping->tree_lock.
> >
>
> Because I never understood very well, I will ask. Why lock_page_cgroup()
> should not be called under mapping->tree_lock?
>
> Thanks
> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/