From: KAMEZAWA Hiroyuki on
On Fri, 09 Jul 2010 19:06:28 -0400
Munehiro Ikeda <m-ikeda(a)ds.jp.nec.com> wrote:

> OK, we can do it like:
>
> do {
> old = pc->flags;
> new = old & ((1UL << PAGE_TRACKING_ID_SHIFT) - 1);
> new |= (unsigned long)(id << PAGE_TRACKING_ID_SHIFT);
> } while (cmpxchg(&pc->flags, old, new) != old);
>
>
> > IIUC, there is an generic version now even if it's heavy.
>
> I couldn't find it out...is there already? Or you mean we should
> have generic one?
>

generic cmpxchg in /include/asm-generic/cmpxchg.h
But...ahh...in some arch, you can't use cmpxchg, sorry.

How about start from "a new field" for I/O cgroup in page_cgroup ?

struct page_cgroup {
unsigned long flags;
struct mem_cgroup *mem_cgroup;
unsigned short blkio_cgroup_id;
struct page *page;
struct list_head lru; /* per cgroup LRU list */
};

We can consider how we optimize it out, later.
(And, it's easier to debug and make development smooth.)

For example, If we can create a vmalloced-array of mem_cgroup,
id->mem_cgroup lookup can be very fast and we can replace
pc->mem_cgroup link with pc->mem_cgroup_id.


>
> >> +unsigned long page_cgroup_get_owner(struct page *page);
> >> +int page_cgroup_set_owner(struct page *page, unsigned long id);
> >> +int page_cgroup_copy_owner(struct page *npage, struct page *opage);
> >> +
> >> void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
> >>
> >> #ifdef CONFIG_SPARSEMEM
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 2e40f2f..337ee01 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -650,7 +650,7 @@ endif # CGROUPS
> >>
> >> config CGROUP_PAGE
> >> def_bool y
> >> - depends on CGROUP_MEM_RES_CTLR
> >> + depends on CGROUP_MEM_RES_CTLR || GROUP_IOSCHED_ASYNC
> >>
> >> config MM_OWNER
> >> bool
> >> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> >> index 6c00814..69e080c 100644
> >> --- a/mm/page_cgroup.c
> >> +++ b/mm/page_cgroup.c
> >> @@ -9,6 +9,7 @@
> >> #include<linux/vmalloc.h>
> >> #include<linux/cgroup.h>
> >> #include<linux/swapops.h>
> >> +#include<linux/blk-iotrack.h>
> >>
> >> static void __meminit
> >> __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
> >> @@ -74,7 +75,7 @@ void __init page_cgroup_init_flatmem(void)
> >>
> >> int nid, fail;
> >>
> >> - if (mem_cgroup_disabled())
> >> + if (mem_cgroup_disabled()&& blk_iotrack_disabled())
> >> return;
> >>
> >> for_each_online_node(nid) {
> >> @@ -83,12 +84,13 @@ void __init page_cgroup_init_flatmem(void)
> >> goto fail;
> >> }
> >> printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
> >> - printk(KERN_INFO "please try 'cgroup_disable=memory' option if you"
> >> - " don't want memory cgroups\n");
> >> + printk(KERN_INFO "please try 'cgroup_disable=memory,blkio' option"
> >> + " if you don't want memory and blkio cgroups\n");
> >> return;
> >> fail:
> >> printk(KERN_CRIT "allocation of page_cgroup failed.\n");
> >> - printk(KERN_CRIT "please try 'cgroup_disable=memory' boot option\n");
> >> + printk(KERN_CRIT
> >> + "please try 'cgroup_disable=memory,blkio' boot option\n");
> >> panic("Out of memory");
> >> }
> >
> > Hmm, io-track is always done if blkio-cgroup is used, Right ?
>
> No, iotrack is enabled only when CONFIG_GROUP_IOSCHED_ASYNC=y.
> If =n, iotrack is disabled even if blkio-cgroup is enabled.
>


>
> > Then, why you have extra config as CONFIG_GROUP_IOSCHED_ASYNC ?
> > Is it necessary ?
>
> Current purpose of the option is only for debug because it is
> experimental functionality.
> It can be removed if this work is completed, or dynamic switch
> might be useful.
>
> In fact, just "cgroup_disable=memory" is enough for the failure
> case. Let me think right messages.
>

IMHO, once you add boot-option or sysctl, it's very hard to remove it, lator.
So, if you think you'll remove it lator, don't add it or just add CONFIG.

Regards,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Munehiro IKEDA on
KAMEZAWA Hiroyuki wrote:
> On Fri, 09 Jul 2010 19:06:28 -0400
> Munehiro Ikeda <m-ikeda(a)ds.jp.nec.com> wrote:
>
>> OK, we can do it like:
>>
>> do {
>> old = pc->flags;
>> new = old & ((1UL << PAGE_TRACKING_ID_SHIFT) - 1);
>> new |= (unsigned long)(id << PAGE_TRACKING_ID_SHIFT);
>> } while (cmpxchg(&pc->flags, old, new) != old);
>>
>>
>>> IIUC, there is an generic version now even if it's heavy.
>> I couldn't find it out...is there already? Or you mean we should
>> have generic one?
>>
>
> generic cmpxchg in /include/asm-generic/cmpxchg.h
> But...ahh...in some arch, you can't use cmpxchg, sorry.
>
> How about start from "a new field" for I/O cgroup in page_cgroup ?
>
> struct page_cgroup {
> unsigned long flags;
> struct mem_cgroup *mem_cgroup;
> unsigned short blkio_cgroup_id;
> struct page *page;
> struct list_head lru; /* per cgroup LRU list */
> };
>
> We can consider how we optimize it out, later.
> (And, it's easier to debug and make development smooth.)
>
> For example, If we can create a vmalloced-array of mem_cgroup,
> id->mem_cgroup lookup can be very fast and we can replace
> pc->mem_cgroup link with pc->mem_cgroup_id.

Nice suggestion. Thanks Kame-san.


>>>> +unsigned long page_cgroup_get_owner(struct page *page);
>>>> +int page_cgroup_set_owner(struct page *page, unsigned long id);
>>>> +int page_cgroup_copy_owner(struct page *npage, struct page *opage);
>>>> +
>>>> void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
>>>>
>>>> #ifdef CONFIG_SPARSEMEM
>>>> diff --git a/init/Kconfig b/init/Kconfig
>>>> index 2e40f2f..337ee01 100644
>>>> --- a/init/Kconfig
>>>> +++ b/init/Kconfig
>>>> @@ -650,7 +650,7 @@ endif # CGROUPS
>>>>
>>>> config CGROUP_PAGE
>>>> def_bool y
>>>> - depends on CGROUP_MEM_RES_CTLR
>>>> + depends on CGROUP_MEM_RES_CTLR || GROUP_IOSCHED_ASYNC
>>>>
>>>> config MM_OWNER
>>>> bool
>>>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>>>> index 6c00814..69e080c 100644
>>>> --- a/mm/page_cgroup.c
>>>> +++ b/mm/page_cgroup.c
>>>> @@ -9,6 +9,7 @@
>>>> #include<linux/vmalloc.h>
>>>> #include<linux/cgroup.h>
>>>> #include<linux/swapops.h>
>>>> +#include<linux/blk-iotrack.h>
>>>>
>>>> static void __meminit
>>>> __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
>>>> @@ -74,7 +75,7 @@ void __init page_cgroup_init_flatmem(void)
>>>>
>>>> int nid, fail;
>>>>
>>>> - if (mem_cgroup_disabled())
>>>> + if (mem_cgroup_disabled()&& blk_iotrack_disabled())
>>>> return;
>>>>
>>>> for_each_online_node(nid) {
>>>> @@ -83,12 +84,13 @@ void __init page_cgroup_init_flatmem(void)
>>>> goto fail;
>>>> }
>>>> printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
>>>> - printk(KERN_INFO "please try 'cgroup_disable=memory' option if you"
>>>> - " don't want memory cgroups\n");
>>>> + printk(KERN_INFO "please try 'cgroup_disable=memory,blkio' option"
>>>> + " if you don't want memory and blkio cgroups\n");
>>>> return;
>>>> fail:
>>>> printk(KERN_CRIT "allocation of page_cgroup failed.\n");
>>>> - printk(KERN_CRIT "please try 'cgroup_disable=memory' boot option\n");
>>>> + printk(KERN_CRIT
>>>> + "please try 'cgroup_disable=memory,blkio' boot option\n");
>>>> panic("Out of memory");
>>>> }
>>> Hmm, io-track is always done if blkio-cgroup is used, Right ?
>> No, iotrack is enabled only when CONFIG_GROUP_IOSCHED_ASYNC=y.
>> If =n, iotrack is disabled even if blkio-cgroup is enabled.
>>
>
>
>>> Then, why you have extra config as CONFIG_GROUP_IOSCHED_ASYNC ?
>>> Is it necessary ?
>> Current purpose of the option is only for debug because it is
>> experimental functionality.
>> It can be removed if this work is completed, or dynamic switch
>> might be useful.
>>
>> In fact, just "cgroup_disable=memory" is enough for the failure
>> case. Let me think right messages.
>>
>
> IMHO, once you add boot-option or sysctl, it's very hard to remove it, lator.
> So, if you think you'll remove it lator, don't add it or just add CONFIG.

OK. I understand we need to seriously think over to add a new boot option.


Thanks,
Muuhh


--
IKEDA, Munehiro
NEC Corporation of America
m-ikeda(a)ds.jp.nec.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/