From: Vivek Goyal on
- Currently in CFQ there are many situations where don't know how
much time slice has been consumed by a queue. For example, all
the random reader/writer queues where we don't idle on
individual queues and we expire the queue either immediately
after the request dispatch.

- In this case time consumed by a queue is just a memory copy
operation. Actually time measurement is possible only if we
idle on a queue and allow dispatch from a queue for significant
amount of time.

- As of today, in such cases we calculate the time since the
dispatch from the queue started and charge all that time.
Generally this rounds to 1 jiffy but in some cases it can
be more. For example, if we are driving high request queue
depth and driver is too busy and does not ask for new
requests for 8-10 jiffies. In such cases, the active queue
gets charged very unfairly.

- So fundamentally, whole notion of charging for time slice
is valid only if we have been idling on the queue. Otherwise
in an NCQ queue, there might be other requests on the queue
and we can not do the time slice calculation.

- This patch tweaks the slice charging logic a bit so that
in the cases where we can't know the amount of time, we
start charging in terms of number of requests dispatched
(IOPS). This practically switching CFQ fairness model to
fairness in terms of IOPS with slice_idle=0.

- As of today this will primarily be useful only with
group_idle patches so that we get fairness in terms of
IOPS across groups. The idea is that on fast storage
one can run CFQ with slice_idle=0 and still get IO
controller working without losing too much of
throughput.

Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
---
block/cfq-iosched.c | 24 +++++++++++++++++++++---
1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7982b83..f44064c 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -896,16 +896,34 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
* if there are mutiple queues in the group, each can dispatch
* a single request on seeky media and cause lots of seek time
* and group will never know it.
+ *
+ * If drive is NCQ and we are driving deep queue depths, then
+ * it is not reasonable to charge the slice since dispatch
+ * started because this time will include time taken by all
+ * the other requests in the queue.
+ *
+ * Actually there is no reasonable way to know the disk time
+ * here and we need to come up with some approximation. If
+ * disk is non NCQ, we should be driving request queue depth
+ * 1, then charge for time since dispatch start and this will
+ * account for seek time properly on seeky media. If request
+ * queue depth is high, then charge for number of requests
+ * dispatched from the queue. This will sort of becoming
+ * charging in terms of IOPS.
*/
- slice_used = max_t(unsigned, (jiffies - cfqq->dispatch_start),
- 1);
+ if (cfqq->cfqd->hw_tag == 0)
+ slice_used = max_t(unsigned,
+ (jiffies - cfqq->dispatch_start), 1);
+ else
+ slice_used = cfqq->slice_dispatch;
} else {
slice_used = jiffies - cfqq->slice_start;
if (slice_used > cfqq->allocated_slice)
slice_used = cfqq->allocated_slice;
}

- cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
+ cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u, sl_disp=%u", slice_used,
+ cfqq->slice_dispatch);
return slice_used;
}

--
1.7.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/