From: Balbir Singh on 2 Aug 2010 23:30

* KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-08-02 19:13:04]:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
>
> Now, memory cgroup has an ID per cgroup and make use of it at
> - hierarchy walk,
> - swap recording.
>
> This patch is for making more use of it. The final purpose is
> to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
>
> This patch caches a pointer of memcg in an array. By this, we
> don't have to call css_lookup() which requires radix-hash walk.
> This saves some amount of memory footprint at lookup memcg via id.

It is a memory versus speed tradeoff, but if the number of created
cgroups is low, it might not be all that slow; besides, we do that for
swap_cgroup anyway - no?

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on 2 Aug 2010 23:30

On Tue, 3 Aug 2010 08:52:16 +0530
Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-08-02 19:13:04]:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> >
> > Now, memory cgroup has an ID per cgroup and make use of it at
> > - hierarchy walk,
> > - swap recording.
> >
> > This patch is for making more use of it. The final purpose is
> > to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
> >
> > This patch caches a pointer of memcg in an array. By this, we
> > don't have to call css_lookup() which requires radix-hash walk.
> > This saves some amount of memory footprint at lookup memcg via id.
>
> It is a memory versus speed tradeoff, but if the number of created
> cgroups is low, it might not be all that slow, besides we do that for
> swap_cgroup anyway - no?

In the following patch, pc->mem_cgroup is changed from a pointer to an ID,
so this lookup happens in lru_add/del, for example. By doing it this way,
we can place all lookup-related data in __read_mostly. With css_lookup(),
we can't do that and have to worry about cache behavior.

I hear there are users who create 2000+ cgroups, so designing only for the
"low number" case is not important here. This patch helps get _stable_
performance even when there are many cgroups.

Thanks,
-Kame
From: Balbir Singh on 2 Aug 2010 23:40

* KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-08-03 12:21:58]:

> On Tue, 3 Aug 2010 08:52:16 +0530
> Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-08-02 19:13:04]:
> >
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> > >
> > > Now, memory cgroup has an ID per cgroup and make use of it at
> > > - hierarchy walk,
> > > - swap recording.
> > >
> > > This patch is for making more use of it. The final purpose is
> > > to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
> > >
> > > This patch caches a pointer of memcg in an array. By this, we
> > > don't have to call css_lookup() which requires radix-hash walk.
> > > This saves some amount of memory footprint at lookup memcg via id.
> >
> > It is a memory versus speed tradeoff, but if the number of created
> > cgroups is low, it might not be all that slow, besides we do that for
> > swap_cgroup anyway - no?
>
> In following patch, pc->page_cgroup is changed from pointer to ID.
> Then, this lookup happens in lru_add/del, for example.
> And, by this, we can place all lookup related things to __read_mostly.
> With css_lookup(), we can't do it and have to be afraid of cache
> behavior.

OK, fair enough.

> I hear there are a users who create 2000+ cgroups and considering
> about "low number" user here is not important.
> This patch is a help for getting _stable_ performance even when there are
> many cgroups.

I've heard of one such user on the libcgroup mailing list.

--
Three Cheers,
Balbir
From: Daisuke Nishimura on 3 Aug 2010 00:40

Hi.

Thank you for all of your work. Several comments are inlined.

On Mon, 2 Aug 2010 19:13:04 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
>
> Now, memory cgroup has an ID per cgroup and make use of it at
> - hierarchy walk,
> - swap recording.
>
> This patch is for making more use of it. The final purpose is
> to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
>
> This patch caches a pointer of memcg in an array. By this, we
> don't have to call css_lookup() which requires radix-hash walk.
> This saves some amount of memory footprint at lookup memcg via id.
>
> Changelog: 20100730
>  - fixed rcu_read_unlock() placement.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> ---
>  init/Kconfig    | 11 +++++++++++
>  mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 45 insertions(+), 14 deletions(-)
>
> Index: mmotm-0727/mm/memcontrol.c
> ===================================================================
> --- mmotm-0727.orig/mm/memcontrol.c
> +++ mmotm-0727/mm/memcontrol.c
> @@ -292,6 +292,30 @@ static bool move_file(void)
>  			&mc.to->move_charge_at_immigrate);
>  }
>
> +/* 0 is unused */
> +static atomic_t mem_cgroup_num;
> +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> +
> +static struct mem_cgroup *id_to_memcg(unsigned short id)
> +{
> +	/*
> +	 * This array is set to NULL when mem_cgroup is freed.
> +	 * IOW, there are no more references && rcu_synchronized().
> +	 * This lookup-caching is safe.
> +	 */
> +	if (unlikely(!mem_cgroups[id])) {
> +		struct cgroup_subsys_state *css;
> +
> +		rcu_read_lock();
> +		css = css_lookup(&mem_cgroup_subsys, id);
> +		rcu_read_unlock();
> +		if (!css)
> +			return NULL;
> +		mem_cgroups[id] = container_of(css, struct mem_cgroup, css);
> +	}
> +	return mem_cgroups[id];
> +}

id_to_memcg() seems to be called under rcu_read_lock() already, so I think
the rcu_read_lock()/unlock() pair would be unnecessary.

> Index: mmotm-0727/init/Kconfig
> ===================================================================
> --- mmotm-0727.orig/init/Kconfig
> +++ mmotm-0727/init/Kconfig
> @@ -594,6 +594,17 @@ config CGROUP_MEM_RES_CTLR_SWAP
>  	  Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page
>  	  size is 4096bytes, 512k per 1Gbytes of swap.
>
> +config MEM_CGROUP_MAX_GROUPS
> +	int "Maximum number of memory cgroups on a system"
> +	range 1 65535
> +	default 8192 if 64BIT
> +	default 2048 if 32BIT
> +	help
> +	  Memory cgroup has limitation of the number of groups created.
> +	  Please select your favorite value. The more you allow, the more
> +	  memory will be consumed. This consumes vmalloc() area, so,
> +	  this should be small on 32bit arch.
> +

We don't use vmalloc() area in this version :)

Thanks,
Daisuke Nishimura.
From: KAMEZAWA Hiroyuki on 3 Aug 2010 00:50

On Tue, 3 Aug 2010 13:31:09 +0900
Daisuke Nishimura <nishimura(a)mxp.nes.nec.co.jp> wrote:

> Hi.
>
> Thank you for all of your works.
>
> Several comments are inlined.
>
> On Mon, 2 Aug 2010 19:13:04 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> >
> > Now, memory cgroup has an ID per cgroup and make use of it at
> > - hierarchy walk,
> > - swap recording.
> >
> > This patch is for making more use of it. The final purpose is
> > to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
> >
> > This patch caches a pointer of memcg in an array. By this, we
> > don't have to call css_lookup() which requires radix-hash walk.
> > This saves some amount of memory footprint at lookup memcg via id.
> >
> > Changelog: 20100730
> >  - fixed rcu_read_unlock() placement.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> > ---
> >  init/Kconfig    | 11 +++++++++++
> >  mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++--------------
> >  2 files changed, 45 insertions(+), 14 deletions(-)
> >
> > Index: mmotm-0727/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0727.orig/mm/memcontrol.c
> > +++ mmotm-0727/mm/memcontrol.c
> > @@ -292,6 +292,30 @@ static bool move_file(void)
> >  			&mc.to->move_charge_at_immigrate);
> >  }
> >
> > +/* 0 is unused */
> > +static atomic_t mem_cgroup_num;
> > +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> > +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> > +
> > +static struct mem_cgroup *id_to_memcg(unsigned short id)
> > +{
> > +	/*
> > +	 * This array is set to NULL when mem_cgroup is freed.
> > +	 * IOW, there are no more references && rcu_synchronized().
> > +	 * This lookup-caching is safe.
> > +	 */
> > +	if (unlikely(!mem_cgroups[id])) {
> > +		struct cgroup_subsys_state *css;
> > +
> > +		rcu_read_lock();
> > +		css = css_lookup(&mem_cgroup_subsys, id);
> > +		rcu_read_unlock();
> > +		if (!css)
> > +			return NULL;
> > +		mem_cgroups[id] = container_of(css, struct mem_cgroup, css);
> > +	}
> > +	return mem_cgroups[id];
> > +}
>
> id_to_memcg() seems to be called under rcu_read_lock() already, so I think
> rcu_read_lock()/unlock() would be unnecessary.

Maybe. I was wondering which is better: adding

	VM_BUG_ON(!rcu_read_lock_held());

or keeping the rcu_read_lock() .. rcu_read_unlock() pair. Do you prefer
the former? If so, it's OK to remove the rcu_read_lock().

> > Index: mmotm-0727/init/Kconfig
> > ===================================================================
> > --- mmotm-0727.orig/init/Kconfig
> > +++ mmotm-0727/init/Kconfig
> > @@ -594,6 +594,17 @@ config CGROUP_MEM_RES_CTLR_SWAP
> >  	  Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page
> >  	  size is 4096bytes, 512k per 1Gbytes of swap.
> >
> > +config MEM_CGROUP_MAX_GROUPS
> > +	int "Maximum number of memory cgroups on a system"
> > +	range 1 65535
> > +	default 8192 if 64BIT
> > +	default 2048 if 32BIT
> > +	help
> > +	  Memory cgroup has limitation of the number of groups created.
> > +	  Please select your favorite value. The more you allow, the more
> > +	  memory will be consumed. This consumes vmalloc() area, so,
> > +	  this should be small on 32bit arch.
> > +
>
> We don't use vmalloc() area in this version :)

Oh, yes. Thank you, I'll fix it.

Thanks,
-Kame