From: Vivek Goyal on 16 Jul 2010 09:50

On Thu, Jul 15, 2010 at 09:00:48AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 14 Jul 2010 10:29:19 -0400
> Vivek Goyal <vgoyal(a)redhat.com> wrote:
>
> > > Cgroup's feature as mounting several subsystems at a mount point at once
> > > is very useful in many case.
> >
> > I agree that it is useful but if some controllers are not supporting
> > hierarchy, it just adds to more confusion. And later when hierarchy
> > support comes in, there will be additional issue of keeping this file
> > "use_hierarchy" like memory controller.
> >
> > So at this point of time, I am not too inclined towards allowing hierarchical
> > cgroup creation but treating them as flat in CFQ. I think it adds to the
> > confusion and user space should handle this situation.
> >
>
> Hmm.
>
> Could you fix error code in create blkio cgroup ? It returns -EINVAL now.
> IIUC, mkdir(2) doesn't return -EINVAL as error code (from man.)
> Then, it's very confusing. I think -EPERM or -ENOMEM will be much better.

Hm..., Probably -EPERM is somewhat close to what we are doing. File system
does support creation of directories but not after certain level.

I will trace more instances of mkdir error values.

> Anyway, I need to see source code of blk-cgroup.c to know why libvirt fails
> to create cgroup.

[CCing daniel berrange]

AFAIK, libvirt does not have support for blkio controller yet. Are you
trying to introduce that?

libvirt creates a directory tree. I think /cgroup/libvirt/qemu/kvm-dirs.
So actual virtual machine directories are 2-3 levels below and that would
explain that if you try to use blkio controller with libvirt, it will fail
because it will not be able to create directories at that level.

I think libvirt needs to special case blkio here to create directories in
top level. It is odd but really there are no easy answers. Will we not
support a controller in libvirt till the controller supports hierarchy?

> Where is the user-visible information (in RHEL or Fedora)
> about "you can't use blkio-cgroup via libvirt or libcgroup" ?

[CCing balbir]

I think with libcgroup you can use blkio controller. I know somebody
who was using cgexec command to launch some jobs in blkio cgroups. AFAIK,
libcgroup does not have too much controller specific state and should
not require any modifications for blkio controller.

Balbir can tell us more.

libvirt will require modification to support blkio controller. I also
noticed that libvirt by default puts every virtual machine into its
own cgroup. I think it might not be a very good strategy for blkio
controller because putting every virtual machine in its own cgroup
will kill overall throughput if each virtual machine is not driving
enough IO.

I am also trying to come up with some additional logic of letting go
fairness if a group is not doing sufficient IO.

Daniel, do you know where the documentation is which says what controllers
are currently supported by libvirt?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
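
The message above says user space should handle the case where the blkio controller refuses to create a nested group. A minimal sketch of what that handling might look like, assuming the kernel reports the refusal as either -EINVAL (the behaviour described above) or -EPERM (the proposed replacement). The function name `create_group` and its return strings are hypothetical and are not taken from libvirt or libcgroup.

```python
import errno
import os


def create_group(path, mkdir=os.mkdir):
    """Try to create one cgroup directory and classify the outcome.

    `mkdir` is injectable so the refusal path can be exercised without a
    real blkio mount. Returns "created", "exists" or "nesting-refused".
    """
    try:
        mkdir(path)
        return "created"
    except FileExistsError:
        return "exists"
    except OSError as exc:
        # Assumption: a controller without hierarchy support rejects nested
        # directories with EINVAL today, or EPERM if the error code changes.
        if exc.errno in (errno.EINVAL, errno.EPERM):
            return "nesting-refused"
        raise
```

A caller that gets "nesting-refused" could then decide to fall back to a directory directly under the controller's mount point, which is the special case discussed in this thread.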
From: Daniel P. Berrange on 16 Jul 2010 10:20

On Fri, Jul 16, 2010 at 09:43:53AM -0400, Vivek Goyal wrote:
> On Thu, Jul 15, 2010 at 09:00:48AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 14 Jul 2010 10:29:19 -0400
> > Vivek Goyal <vgoyal(a)redhat.com> wrote:
> >
> > > > Cgroup's feature as mounting several subsystems at a mount point at once
> > > > is very useful in many case.
> > >
> > > I agree that it is useful but if some controllers are not supporting
> > > hierarchy, it just adds to more confusion. And later when hierarchy
> > > support comes in, there will be additional issue of keeping this file
> > > "use_hierarchy" like memory controller.
> > >
> > > So at this point of time, I am not too inclined towards allowing hierarchical
> > > cgroup creation but treating them as flat in CFQ. I think it adds to the
> > > confusion and user space should handle this situation.
> > >
> >
> > Hmm.
> >
> > Could you fix error code in create blkio cgroup ? It returns -EINVAL now.
> > IIUC, mkdir(2) doesn't return -EINVAL as error code (from man.)
> > Then, it's very confusing. I think -EPERM or -ENOMEM will be much better.
>
> Hm..., Probably -EPERM is somewhat close to what we are doing. File system
> does support creation of directories but not after certain level.
>
> I will trace more instances of mkdir error values.
>
> > Anyway, I need to see source code of blk-cgroup.c to know why libvirt fails
> > to create cgroup.
>
> [CCing daniel berrange]
>
> AFAIK, libvirt does not have support for blkio controller yet. Are you
> trying to introduce that?
>
> libvirt creates a directory tree. I think /cgroup/libvirt/qemu/kvm-dirs.
> So actual virtual machine directories are 2-3 levels below and that would
> explain that if you try to use blkio controller with libvirt, it will fail
> because it will not be able to create directories at that level.

Yes, we use a hierarchy to deal with namespace uniqueness. The
first step is to determine where the libvirtd process is placed. This
may be the root cgroup, but it may already be one or more levels
down due to the init system (sysv-init, upstart, systemd etc)
startup policy. Once that's determined we create a 'libvirt'
cgroup which acts as a container for everything run by libvirtd.
At the next level is the driver name (qemu, lxc, uml). This allows
confinement of all guests for a particular driver and gives us
a unique namespace for the next level where we have a directory
per guest. This last level is where libvirt actually sets tunables
normally. The higher levels are for administrator use.

 $ROOT  (where libvirtd process is, not the root mount point)
  |
  +- libvirt
      |
      +- qemu
      |   |
      |   +- guest1
      |   +- guest2
      |   +- guest3
      |   ...
      |
      +- lxc
          +- guest1
          +- guest2
          +- guest3
          ...

> I think libvirt needs to special case blkio here to create directories in
> top level. It is odd but really there are no easy answers. Will we not
> support a controller in libvirt till the controller supports hierarchy?

We explicitly avoided creating anything at the top level. We always
detect where the libvirtd process has been placed & only ever create
stuff below that point. This ensures the host admin can set overall
limits for virt on a host, and not have libvirt side-step these limits
by jumping back up to the root cgroup.

> > Where is the user-visible information (in RHEL or Fedora)
> > about "you can't use blkio-cgroup via libvirt or libcgroup" ?
>
> [CCing balbir]
>
> I think with libcgroup you can use blkio controller. I know somebody
> who was using cgexec command to launch some jobs in blkio cgroups. AFAIK,
> libcgroup does not have too much controller specific state and should
> not require any modifications for blkio controller.
>
> Balbir can tell us more.
>
> libvirt will require modification to support blkio controller. I also
> noticed that libvirt by default puts every virtual machine into its
> own cgroup. I think it might not be a very good strategy for blkio
> controller because putting every virtual machine in its own cgroup
> will kill overall throughput if each virtual machine is not driving
> enough IO.

A requirement to do everything in the top level and not use a hierarchy
for blkio makes this a pretty unfriendly controller to use. It seriously
limits flexibility of what libvirt and host administrators can do and
means we can't effectively split policy between them. It also means
that if the blkio controller were ever mounted at the same point as another
controller, you'd lose the hierarchy support for that other controller.

IMHO use of the cgroups hierarchy is key to making cgroups manageable for
applications. We can't have many different applications on a system
all having to create many directories at the top level.

> I am also trying to come up with some additional logic of letting go
> fairness if a group is not doing sufficient IO.
>
> Daniel, do you know where the documentation is which says what controllers
> are currently supported by libvirt?

We use cpu, cpuacct, cpuset, memory, devices & freezer currently.

Daniel
--
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
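
The directory layout described in the message above can be sketched as a small path computation. This is only an illustration of the naming scheme ($ROOT/libvirt/<driver>/<guest>), not libvirt's actual code. The function names and the `/cgroup` default mount point are assumptions made for the example.

```python
import posixpath


def guest_cgroup_path(libvirtd_cgroup, driver, guest, mount_point="/cgroup"):
    """Build <mount>/<libvirtd's own cgroup>/libvirt/<driver>/<guest>.

    `libvirtd_cgroup` is where the init system placed libvirtd, relative to
    the mount point ("/" if it sits in the root cgroup).
    """
    parts = [mount_point]
    relative = libvirtd_cgroup.strip("/")
    if relative:
        parts.append(relative)
    parts.extend(["libvirt", driver, guest])
    return posixpath.join(*parts)


def depth_below_mount(path, mount_point="/cgroup"):
    """Count how many directory levels `path` sits below the mount point."""
    relative = posixpath.relpath(path, mount_point)
    return 0 if relative == "." else len(relative.split("/"))
```

With libvirtd in the root cgroup, `guest_cgroup_path("/", "qemu", "guest1")` gives `/cgroup/libvirt/qemu/guest1`, three levels below the mount point. A controller that only accepts directories at the first level cannot host that path, which is the failure discussed in this thread.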
From: Vivek Goyal on 16 Jul 2010 10:40

On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:

[..]
> > libvirt will require modification to support blkio controller. I also
> > noticed that libvirt by default puts every virtual machine into its
> > own cgroup. I think it might not be a very good strategy for blkio
> > controller because putting every virtual machine in its own cgroup
> > will kill overall throughput if each virtual machine is not driving
> > enough IO.
>
> A requirement to do everything in the top level and not use a hierarchy
> for blkio makes this a pretty unfriendly controller to use. It seriously
> limits flexibility of what libvirt and host administrators can do and
> means we can't effectively split policy between them. It also means
> that if the blkio controller were ever mounted at the same point as another
> controller, you'd lose the hierarchy support for that other controller.
>
> IMHO use of the cgroups hierarchy is key to making cgroups manageable for
> applications. We can't have many different applications on a system
> all having to create many directories at the top level.
>

I understand that not having hierarchical support is a huge limitation
and in future I would like to be there. Just that at the moment providing
that support is hard as I am struggling with more basic issues which are
more important.

Secondly, just because some controller allows creation of hierarchy does
not mean that hierarchy is being enforced. For example, memory controller.
IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy
otherwise effectively it is flat. So if libvirt is creating groups and
putting machines in child groups thinking that we are not interfering
with admin's policy, that is not entirely correct.

So how do we make progress here? I really want to see blkio controller
integrated with libvirt.

About the issue of hierarchy, I can probably travel down the path of allowing
creation of hierarchy but CFQ will treat it as flat. Though I don't like it
because it will force me to introduce variables like "use_hierarchy" once
real hierarchical support comes in but I guess I can live with that.
(Anyway memory controller is already doing it.).

There is another issue though and that is by default every virtual
machine going into a group of its own. As of today, it can have
severe performance penalties (depending on workload) if the group is not
driving enough IO. (Especially with group_isolation=1).

I was thinking of a model where an admin moves out the bad virtual
machines into a separate group and limits their IO.

Vivek
From: Daniel P. Berrange on 16 Jul 2010 11:00

On Fri, Jul 16, 2010 at 10:35:36AM -0400, Vivek Goyal wrote:
> On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:
> Secondly, just because some controller allows creation of hierarchy does
> not mean that hierarchy is being enforced. For example, memory controller.
> IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy
> otherwise effectively it is flat. So if libvirt is creating groups and
> putting machines in child groups thinking that we are not interfering
> with admin's policy, that is not entirely correct.

That is true, but that 'use_hierarchy' at least provides admins
the mechanism required to implement the necessary policy.

> So how do we make progress here? I really want to see blkio controller
> integrated with libvirt.
>
> About the issue of hierarchy, I can probably travel down the path of allowing
> creation of hierarchy but CFQ will treat it as flat. Though I don't like it
> because it will force me to introduce variables like "use_hierarchy" once
> real hierarchical support comes in but I guess I can live with that.
> (Anyway memory controller is already doing it.).
>
> There is another issue though and that is by default every virtual
> machine going into a group of its own. As of today, it can have
> severe performance penalties (depending on workload) if the group is not
> driving enough IO. (Especially with group_isolation=1).
>
> I was thinking of a model where an admin moves out the bad virtual
> machines into a separate group and limits their IO.

In the simple / normal case I imagine all guest VMs will be running
unrestricted I/O initially. Thus instead of creating the cgroup at time
of VM startup, we could create the cgroup only when the admin actually
sets an I/O limit. IIUC, this should maintain the one cgroup per guest
model, while avoiding the performance penalty in normal use. The caveat
of course is that this would require blkio controller to have a dedicated
mount point, not shared with other controllers. I think we might also
want this kind of model for net I/O, since we probably don't want to
be creating TC classes + net_cls groups for every VM the moment it starts
unless the admin has actually set a net I/O limit.

Daniel
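
The "create the cgroup only when the admin sets a limit" idea from the message above can be sketched as follows. This is a rough illustration under stated assumptions, not libvirt code: the function names are hypothetical, `blkio_root` stands for a dedicated blkio mount point, and groups are kept flat (one level below that mount). `blkio.weight` and `tasks` are the cgroup files a real mount would expose.

```python
import os


def set_guest_io_weight(blkio_root, guest, weight, pid=None):
    """Create the guest's blkio group on first use and record its weight.

    Nothing is created at VM startup; a guest without an explicit limit
    simply stays in the root group.
    """
    group = os.path.join(blkio_root, guest)
    os.makedirs(group, exist_ok=True)  # lazy creation, flat under the mount
    with open(os.path.join(group, "blkio.weight"), "w") as weight_file:
        weight_file.write(str(weight))
    if pid is not None:
        # Moving the guest's process into the group is done by writing its
        # PID to the group's "tasks" file.
        with open(os.path.join(group, "tasks"), "w") as tasks_file:
            tasks_file.write(str(pid))
    return group


def guest_has_group(blkio_root, guest):
    """Report whether a dedicated group already exists for this guest."""
    return os.path.isdir(os.path.join(blkio_root, guest))
```

The design choice here follows the thread: a guest pays the per-group isolation cost only after an administrator has asked for a limit on it.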
From: Vivek Goyal on 16 Jul 2010 11:20

On Fri, Jul 16, 2010 at 03:53:09PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 16, 2010 at 10:35:36AM -0400, Vivek Goyal wrote:
> > On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:
> > Secondly, just because some controller allows creation of hierarchy does
> > not mean that hierarchy is being enforced. For example, memory controller.
> > IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy
> > otherwise effectively it is flat. So if libvirt is creating groups and
> > putting machines in child groups thinking that we are not interfering
> > with admin's policy, that is not entirely correct.
>
> That is true, but that 'use_hierarchy' at least provides admins
> the mechanism required to implement the necessary policy.
>
> > So how do we make progress here? I really want to see blkio controller
> > integrated with libvirt.
> >
> > About the issue of hierarchy, I can probably travel down the path of allowing
> > creation of hierarchy but CFQ will treat it as flat. Though I don't like it
> > because it will force me to introduce variables like "use_hierarchy" once
> > real hierarchical support comes in but I guess I can live with that.
> > (Anyway memory controller is already doing it.).
> >
> > There is another issue though and that is by default every virtual
> > machine going into a group of its own. As of today, it can have
> > severe performance penalties (depending on workload) if the group is not
> > driving enough IO. (Especially with group_isolation=1).
> >
> > I was thinking of a model where an admin moves out the bad virtual
> > machines into a separate group and limits their IO.
>
> In the simple / normal case I imagine all guest VMs will be running
> unrestricted I/O initially. Thus instead of creating the cgroup at time
> of VM startup, we could create the cgroup only when the admin actually
> sets an I/O limit.

That makes sense. Run all the virtual machines by default in the root group
and move a virtual machine out to a separate group of either low weight
(if the virtual machine is a bad one and driving a lot of IO) or of higher
weight (if we want to give more IO bandwidth to this machine).

> IIUC, this should maintain the one cgroup per guest
> model, while avoiding the performance penalty in normal use. The caveat
> of course is that this would require blkio controller to have a dedicated
> mount point, not shared with other controllers.

Yes. Because for other controllers we seem to be putting virtual machines
in separate cgroups by default at startup time. So it seems we will require
a separate mount point here for blkio controller.

> I think we might also
> want this kind of model for net I/O, since we probably don't want to
> be creating TC classes + net_cls groups for every VM the moment it starts
> unless the admin has actually set a net I/O limit.

Looks like it. So good, then the network controller and blkio controller can
share this new mount point.

Thanks
Vivek