[PATCH v2 0/6] CFS Bandwidth Control [Kernel]


From: Paul Turner on 28 Apr 2010 07:20

Hi all,

Please find attached v2 of our proposed approach for bandwidth provisioning
under CFS. Bharata's original RFC motivating discussion on this topic can be
found at: http://lkml.org/lkml/2009/6/4/24

This is an evolution of our previous posting: http://lkml.org/lkml/2010/2/12/393
The improvements herein are incremental: hierarchical task tracking for better
load-balance under throttle conditions, statistics export for decision
guidance in user-space control systems, minor bugs fixed, and some code
clean-up.

The skeleton of our approach is as follows:
- As in the previous posting, we maintain a global, per-tg pool of unassigned
quota. On it we track the bandwidth period, quota per period, and runtime
remaining in the current period. As bandwidth is used within a period it is
decremented from runtime. Runtime is currently synchronized using a spinlock.
In the current implementation there's no reason this couldn't be done using
atomic ops instead; however, the spinlock allows for a little more flexibility
in experimentation with other schemes.
- When a cfs_rq participating in a bandwidth-constrained task_group executes, it
acquires time from the global pool in chunks of
sysctl_sched_cfs_bandwidth_slice (default currently 10ms). This synchronizes
under rq->lock and is part of the update_curr path.
- Throttled entities are dequeued immediately. Throttled entities are gated
from participating in the tree at the {enqueue, dequeue}_entity level.
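
To make the skeleton above concrete, here is a minimal user-space sketch of
the accounting. It is not the code from the patches: struct bw_pool,
struct local_rq, account_runtime() and period_refresh() are hypothetical names
chosen for illustration, locking is omitted, and the throttle decision is
simplified to "the pool had nothing left to hand out".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-tg ("global") pool; not the patch's struct. */
struct bw_pool {
    uint64_t period_ns;   /* length of one bandwidth period */
    uint64_t quota_ns;    /* runtime granted per period */
    uint64_t runtime_ns;  /* runtime remaining in the current period */
};

/* Hypothetical stand-in for a per-CPU cfs_rq in the constrained group. */
struct local_rq {
    uint64_t local_runtime_ns;  /* runtime already pulled from the pool */
    bool throttled;
};

/* 10ms, matching the default slice mentioned in the text. */
#define SLICE_NS (10ULL * 1000 * 1000)

/* Move at most one slice from the pool to the local queue.
 * The real code would do this under a lock; omitted in this sketch. */
static inline uint64_t pool_acquire_slice(struct bw_pool *p, struct local_rq *rq)
{
    uint64_t grant = p->runtime_ns < SLICE_NS ? p->runtime_ns : SLICE_NS;

    p->runtime_ns -= grant;
    rq->local_runtime_ns += grant;
    return grant;
}

/* Charge delta_ns of execution, pulling slices as needed.
 * Returns true if the queue had to be throttled. */
static inline bool account_runtime(struct bw_pool *p, struct local_rq *rq,
                                   uint64_t delta_ns)
{
    assert(!rq->throttled);  /* throttled queues are dequeued, so never run */

    while (rq->local_runtime_ns < delta_ns) {
        if (pool_acquire_slice(p, rq) == 0) {
            /* Pool exhausted: drop what is left and throttle. */
            rq->local_runtime_ns = 0;
            rq->throttled = true;
            return true;
        }
    }
    rq->local_runtime_ns -= delta_ns;
    return false;
}

/* At period refresh: refill the pool and unthrottle every queue. */
static inline void period_refresh(struct bw_pool *p, struct local_rq *rqs, int n)
{
    p->runtime_ns = p->quota_ns;
    for (int i = 0; i < n; i++)
        rqs[i].throttled = false;
}
```

With a 25ms quota, charging 4ms pulls one 10ms slice (leaving 15ms in the
pool), and a queue that asks for time after the pool reaches zero is marked
throttled until period_refresh() runs.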

More details on the motivation and approach, as well as performance benchmark
results, can be found in the original posting.

One caveat that bears discussion is that this leads to an alternate
specification of bandwidth versus the sched_rt case. The defined bandwidth
becomes an absolute quantifier relative to the period and is agnostic of allowed
cpus.
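
As a small illustration of that caveat, the sketch below expresses a group's
bandwidth purely as quota over period. The helper name and the milli-CPU unit
are assumptions made for this example only; the point is that the number of
allowed cpus is passed in and deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bandwidth in thousandths of a CPU.
 * nr_allowed_cpus is accepted only to show that it does not
 * enter the computation. */
static inline uint64_t group_bandwidth_millicpus(uint64_t quota_us,
                                                 uint64_t period_us,
                                                 unsigned int nr_allowed_cpus)
{
    (void)nr_allowed_cpus;   /* agnostic of allowed cpus */
    assert(period_us > 0);
    return quota_us * 1000 / period_us;
}
```

Under this reading, a quota of 250ms per 500ms period is half a CPU's worth of
time whether the group may run on 1 cpu or 16, and a quota of 1000ms per 500ms
period is two CPUs' worth.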

Open questions:
- Is there any value in having the slice be tunable at the task-group level?
- I suspect 5ms may be a better default slice value; however, I have not had the
opportunity to verify this yet. There's also room for some dynamic range
here.

Acknowledgements:
We would like to thank Bharata B Rao and Dhaval Giani for discussion and their
original proposal; many elements in this patchset are directly inspired by
their original posting. Bharata has also been integral in the preparation of
this second version, providing valuable feedback and review.

Ken Chen also provided early review and comments.

Thanks,

- Paul and Nikhil
---

Nikhil Rao (1):
      sched: add exports tracking cfs bandwidth control statistics

Paul Turner (5):
      sched: introduce primitives to account for CFS bandwidth tracking
      sched: accumulate per-cfs_rq cpu usage
      sched: throttle cfs_rq entities which exceed their local quota
      sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
      sched: hierarchical task accounting for FAIR_GROUP_SCHED

 include/linux/sched.h |    4 +
 init/Kconfig          |    9 +
 kernel/sched.c        |  347 +++++++++++++++++++++++++++++++++++++++++++++----
 kernel/sched_fair.c   |  240 +++++++++++++++++++++++++++++++++-
 kernel/sched_rt.c     |   24 +--
 kernel/sysctl.c       |   10 +
 6 files changed, 585 insertions(+), 49 deletions(-)

