From: Ryo Tsuruta on 28 Sep 2009 03:40

Hi Vivek,

Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > Because dm-ioband provides fairness in terms of how many IO requests
> > are issued or how many bytes are transferred, so this behaviour is to
> > be expected. Do you think fairness in terms of IO requests and size is
> > not fair?
>
> Hi Ryo,
>
> Fairness in terms of size of IO or number of requests is probably not the
> best thing to do on rotational media where seek latencies are significant.
>
> It probably should work just well on media with very low seek latencies
> like SSD.
>
> So on rotational media, either you will not provide fairness to random
> readers because they are too slow or you will choke the sequential readers
> in other group and also bring down the overall disk throughput.
>
> If you don't decide to choke/throttle sequential reader group for the sake
> of random reader in other group then you will not have a good control
> on random reader latencies. Because now IO scheduler sees the IO from both
> sequential reader as well as random reader and sequential readers have not
> been throttled. So the dispatch pattern/time slices will again look like..
>
> SR1 SR2 SR3 SR4 SR5 RR.....
>
> instead of
>
> SR1 RR SR2 RR SR3 RR SR4 RR ....
>
> SR --> sequential reader, RR --> random reader

Thank you for elaborating. However, I think that fairness in terms of
disk time has a similar problem. Below is the benchmark result of
randread vs seqread that I posted before; the random readers and the
sequential readers ran in individual groups and their weights were
equally assigned.

                Throughput [KiB/s]
             io-controller   dm-ioband
  randread         161          314
  seqread         9556          631

I know that dm-ioband needs improvement in seqread throughput, but I
don't think the io-controller result looks quite fair either: even
though the disk time given to each group is equal, randread cannot get
more bandwidth. I think this is how users think about fairness, so it
would be a good thing to provide multiple policies of bandwidth control
for users to choose from.

> > The write-starve-reads on dm-ioband, that you pointed out before, was
> > not caused by FIFO release, it was caused by IO flow control in
> > dm-ioband. When I turned off the flow control, then the read
> > throughput was quite improved.
>
> What was flow control doing?

dm-ioband gives a limit to each IO group. When the number of IO requests
backlogged in a group exceeds the limit, processes which are going to
issue IO requests to the group are put to sleep until all the backlogged
requests are flushed out.

> > Now I'm considering separating dm-ioband's internal queue into sync
> > and async and giving a certain priority of dispatch to async IOs.
>
> Even if you maintain separate queues for sync and async, in what ratio will
> you dispatch reads and writes to underlying layer once fresh tokens become
> available to the group and you decide to unthrottle the group.

Now I'm thinking that it should follow the requested order, but when the
number of in-flight sync IOs exceeds io_limit (io_limit is calculated
based on nr_requests of the underlying block device), dm-ioband
dispatches only async IOs until the number of in-flight sync IOs falls
below the io_limit, and vice versa. At least it could solve the
write-starve-read issue which you pointed out.

> Whatever policy you adopt for read and write dispatch, it might not match
> with policy of underlying IO scheduler because every IO scheduler seems to
> have its own way of determining how reads and writes should be dispatched.

I think that this is a matter of the user's choice: whether a user would
like to give priority to bandwidth or to the IO scheduler's policy.

> Now somebody might start complaining that my job inside the group is not
> getting same reader/writer ratio as it was getting outside the group.
>
> Thanks
> Vivek

Thanks,
Ryo Tsuruta
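A minimal sketch, in plain C, of the dispatch rule Ryo describes above:
requests leave the group in arrival order, but once the number of
in-flight sync IOs reaches io_limit only async IOs are dispatched, and
vice versa. The structure and helper names here are illustrative
assumptions, not actual dm-ioband code.

#include <stdbool.h>

/* Hypothetical per-group state; io_limit would be derived from the
 * nr_requests setting of the underlying block device. */
struct ioband_group {
	int  in_flight_sync;
	int  in_flight_async;
	int  io_limit;
	bool sync_queue_empty;
	bool async_queue_empty;
};

/* Decide whether the next dispatched request should come from the sync
 * queue (true) or the async queue (false). */
static bool dispatch_sync_next(const struct ioband_group *gp,
                               bool next_in_order_is_sync)
{
	if (gp->in_flight_sync >= gp->io_limit && !gp->async_queue_empty)
		return false;   /* too many sync IOs in flight: drain async */
	if (gp->in_flight_async >= gp->io_limit && !gp->sync_queue_empty)
		return true;    /* too many async IOs in flight: drain sync */
	return next_in_order_is_sync;   /* otherwise keep arrival order */
}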
From: Ryo Tsuruta on 28 Sep 2009 03:40

Hi Rik,

Rik van Riel <riel(a)redhat.com> wrote:
> Ryo Tsuruta wrote:
>
> > Because dm-ioband provides fairness in terms of how many IO requests
> > are issued or how many bytes are transferred, so this behaviour is to
> > be expected. Do you think fairness in terms of IO requests and size is
> > not fair?
>
> When there are two workloads competing for the same
> resources, I would expect each of the workloads to
> run at about 50% of the speed at which it would run
> on an uncontended system.
>
> Having one of the workloads run at 95% of the
> uncontended speed and the other workload at 5%
> is "not fair" (to put it diplomatically).

As I wrote in the mail to Vivek, I think that providing multiple
policies (per disk time, per IO size, maximum rate limiting, and so on)
would be good for users.

Thanks,
Ryo Tsuruta
From: Vivek Goyal on 28 Sep 2009 11:00

On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
> Hi Vivek,
> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
> >> Vivek Goyal wrote:
> >> > Notes:
> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
> >> >   Bring down its throughput and bump up latencies significantly.
> >>
> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
> >> too.
> >>
> >> I'm basing this assumption on the observations I made on both OpenSuse
> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
> >> titled: "Poor desktop responsiveness with background I/O-operations" of
> >> 2009-09-20.
> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
> >>
> >> Thus, I'm posting this to show that your work is greatly appreciated,
> >> given the rather disappointing status quo of Linux's fairness when it
> >> comes to disk IO time.
> >>
> >> I hope that your efforts lead to a change in performance of current
> >> userland applications, the sooner, the better.
> >>
> > [Please don't remove people from original CC list. I am putting them back.]
> >
> > Hi Ulrich,
> >
> > I quickly went through that mail thread and I tried the following on my
> > desktop.
> >
> > ##########################################
> > dd if=/home/vgoyal/4G-file of=/dev/null &
> > sleep 5
> > time firefox
> > # close firefox once gui pops up.
> > ##########################################
> >
> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
> > the following.
> >
> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
> >
> > (Results do vary across runs, especially if the system is booted fresh.
> > Don't know why...)
> >
> > Then I tried putting both the applications in separate groups and
> > assigned them weights of 200 each.
> >
> > ##########################################
> > dd if=/home/vgoyal/4G-file of=/dev/null &
> > echo $! > /cgroup/io/test1/tasks
> > sleep 5
> > echo $$ > /cgroup/io/test2/tasks
> > time firefox
> > # close firefox once gui pops up.
> > ##########################################
> >
> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
> >
> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
> >
> > Notice that the throughput of dd also improved.
> >
> > I ran the block trace and noticed that in many cases firefox threads
> > immediately preempted the "dd". Probably because it was a file system
> > request. So in this case latency will arise from seek time.
> >
> > In some other cases, threads had to wait for up to 100ms because dd was
> > not preempted. In this case latency will arise both from waiting on the
> > queue as well as seek time.
>
> I think cfq should already be doing something similar, i.e. giving
> 100ms slices to firefox, that alternate with dd, unless:
> * firefox is too seeky (in this case, the idle window will be too small)
> * firefox has too much think time.
>

Hi Corrado,

"firefox" is the shell script to set up the environment and launch the
browser. It seems to be a group of threads. Some of them run in parallel
and some of them seem to run one after the other (once the previous
process or thread has finished).

> To rule out the first case, what happens if you run the test with your
> "fairness for seeky processes" patch?

I applied that patch and it helps a lot.

http://lwn.net/Articles/341032/

With above patchset applied, and fairness=1, firefox pops up in 27-28
seconds.

So it looks like if we don't disable idle window for seeky processes on
hardware supporting command queuing, it helps in this particular case.

Thanks
Vivek

> To rule out the second case, what happens if you increase the slice_idle?
>
> Thanks,
> Corrado
>
> > With the cgroup thing, we will run a 100ms slice for the group in which
> > firefox is being launched and then give a 100ms uninterrupted time slice
> > to dd. So it should cut down on the number of seeks happening and that's
> > why we probably see this improvement.
> >
> > So grouping can help in such cases. Maybe you can move your X session
> > into one group and launch the big IO in the other group. Most likely you
> > should have a better desktop experience without compromising on dd
> > thread output.
> >
> > Thanks
> > Vivek
>
> --
> __________________________________________________________________________
>
> dott. Corrado Zoccolo                 mailto:czoccolo(a)gmail.com
> PhD - Department of Computer Science - University of Pisa, Italy
> --------------------------------------------------------------------------
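For readers following along, here is a small, purely illustrative piece
of C showing the decision being discussed: whether the idle window stays
enabled for a seeky queue. The struct and field names are assumptions
for the sake of the sketch; the real CFQ code and the patch differ in
detail.

#include <stdbool.h>

struct queue_hint {
	bool seeky;     /* request pattern classified as random/seeky  */
	bool hw_tag;    /* device supports command queuing (NCQ/TCQ)   */
	bool fairness;  /* the fairness=1 tunable from the patch above */
};

/* Should the scheduler keep idling (waiting briefly for the next request
 * from the same process) when this queue runs out of requests? */
static bool idle_window_enabled(const struct queue_hint *q)
{
	if (!q->seeky)
		return true;    /* sequential readers always get idling */
	if (q->hw_tag && !q->fairness)
		return false;   /* stock behaviour: no idling for seeky
				 * queues on command-queuing hardware   */
	return true;            /* fairness=1: keep the idle window so a
				 * seeky task (e.g. firefox startup) is
				 * not starved by a streaming dd        */
}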
From: Corrado Zoccolo on 28 Sep 2009 11:40

On Mon, Sep 28, 2009 at 4:56 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
>> Hi Vivek,
>> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
>> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
>> >> Vivek Goyal wrote:
>> >> > Notes:
>> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
>> >> >   Bring down its throughput and bump up latencies significantly.
>> >>
>> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
>> >> too.
>> >>
>> >> I'm basing this assumption on the observations I made on both OpenSuse
>> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
>> >> titled: "Poor desktop responsiveness with background I/O-operations" of
>> >> 2009-09-20.
>> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
>> >>
>> >> Thus, I'm posting this to show that your work is greatly appreciated,
>> >> given the rather disappointing status quo of Linux's fairness when it
>> >> comes to disk IO time.
>> >>
>> >> I hope that your efforts lead to a change in performance of current
>> >> userland applications, the sooner, the better.
>> >>
>> > [Please don't remove people from original CC list. I am putting them back.]
>> >
>> > Hi Ulrich,
>> >
>> > I quickly went through that mail thread and I tried the following on my
>> > desktop.
>> >
>> > ##########################################
>> > dd if=/home/vgoyal/4G-file of=/dev/null &
>> > sleep 5
>> > time firefox
>> > # close firefox once gui pops up.
>> > ##########################################
>> >
>> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
>> > the following.
>> >
>> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
>> >
>> > (Results do vary across runs, especially if the system is booted fresh.
>> > Don't know why...)
>> >
>> > Then I tried putting both the applications in separate groups and
>> > assigned them weights of 200 each.
>> >
>> > ##########################################
>> > dd if=/home/vgoyal/4G-file of=/dev/null &
>> > echo $! > /cgroup/io/test1/tasks
>> > sleep 5
>> > echo $$ > /cgroup/io/test2/tasks
>> > time firefox
>> > # close firefox once gui pops up.
>> > ##########################################
>> >
>> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
>> >
>> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
>> >
>> > Notice that the throughput of dd also improved.
>> >
>> > I ran the block trace and noticed that in many cases firefox threads
>> > immediately preempted the "dd". Probably because it was a file system
>> > request. So in this case latency will arise from seek time.
>> >
>> > In some other cases, threads had to wait for up to 100ms because dd was
>> > not preempted. In this case latency will arise both from waiting on the
>> > queue as well as seek time.
>>
>> I think cfq should already be doing something similar, i.e. giving
>> 100ms slices to firefox, that alternate with dd, unless:
>> * firefox is too seeky (in this case, the idle window will be too small)
>> * firefox has too much think time.
>>

Hi Vivek,

> Hi Corrado,
>
> "firefox" is the shell script to set up the environment and launch the
> browser. It seems to be a group of threads. Some of them run in parallel
> and some of them seem to run one after the other (once the previous
> process or thread has finished).

Ok.

>
>> To rule out the first case, what happens if you run the test with your
>> "fairness for seeky processes" patch?
>
> I applied that patch and it helps a lot.
>
> http://lwn.net/Articles/341032/
>
> With above patchset applied, and fairness=1, firefox pops up in 27-28
> seconds.

Great.
Can you try the attached patch (on top of 2.6.31)?
It implements the alternative approach we discussed privately in July,
and it addresses the possible latency increase that could happen with
your patch.

To summarize for everyone, we separate sync sequential queues, sync
seeky queues and async queues into three separate RR structures, and
alternate servicing requests between them.

When servicing seeky queues (the ones that are usually penalized by
cfq, for which no fairness is usually provided), we do not idle
between them, but we do idle for the last queue (the idle can be
exited when any seeky queue has requests). This allows us to allocate
disk time globally for all seeky processes, and to reduce seeky
process latencies.

I tested with 'konsole -e exit', while doing a sequential write with
dd, and the start-up time was reduced from 37s to 7s, on an old laptop
disk.

Thanks,
Corrado

>
>> To rule out the first case, what happens if you run the test with your
>> "fairness for seeky processes" patch?
>
> I applied that patch and it helps a lot.
>
> http://lwn.net/Articles/341032/
>
> With above patchset applied, and fairness=1, firefox pops up in 27-28
> seconds.
>
> So it looks like if we don't disable idle window for seeky processes on
> hardware supporting command queuing, it helps in this particular case.
>
> Thanks
> Vivek
>
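To make the scheme above concrete, here is a rough sketch in plain C of
the three service classes and the idling rule Corrado describes. It is
only an illustration of the idea; the names and data structures are
assumptions, not the actual patch.

/* Three round-robin service classes, serviced in alternation. */
enum service_class { SYNC_SEQ, SYNC_SEEKY, ASYNC, NR_CLASSES };

struct class_rr {
	int nr_busy;    /* queues in this class with pending requests */
};

struct sched_state {
	struct class_rr    rr[NR_CLASSES];
	enum service_class current;
};

/* Rotate to the next class that has busy queues. */
static enum service_class pick_next_class(struct sched_state *sd)
{
	int i;

	for (i = 1; i <= NR_CLASSES; i++) {
		int c = ((int)sd->current + i) % NR_CLASSES;
		if (sd->rr[c].nr_busy) {
			sd->current = (enum service_class)c;
			break;
		}
	}
	return sd->current;
}

/* Seeky queues are not idled individually; only the last busy seeky queue
 * gets an idle period, so the class as a whole keeps its disk time.  The
 * idle is broken as soon as any seeky queue becomes busy again. */
static int idle_after_queue(enum service_class class, int other_busy_in_class)
{
	if (class != SYNC_SEEKY)
		return 1;       /* other classes keep their existing idling
				 * rules (simplified here)                 */
	return other_busy_in_class == 0;        /* idle only on the last one */
}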
From: Vivek Goyal on 28 Sep 2009 13:20
On Mon, Sep 28, 2009 at 05:35:02PM +0200, Corrado Zoccolo wrote:
> On Mon, Sep 28, 2009 at 4:56 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
> >> Hi Vivek,
> >> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> >> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
> >> >> Vivek Goyal wrote:
> >> >> > Notes:
> >> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
> >> >> >   Bring down its throughput and bump up latencies significantly.
> >> >>
> >> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
> >> >> too.
> >> >>
> >> >> I'm basing this assumption on the observations I made on both OpenSuse
> >> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
> >> >> titled: "Poor desktop responsiveness with background I/O-operations" of
> >> >> 2009-09-20.
> >> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
> >> >>
> >> >> Thus, I'm posting this to show that your work is greatly appreciated,
> >> >> given the rather disappointing status quo of Linux's fairness when it
> >> >> comes to disk IO time.
> >> >>
> >> >> I hope that your efforts lead to a change in performance of current
> >> >> userland applications, the sooner, the better.
> >> >>
> >> > [Please don't remove people from original CC list. I am putting them back.]
> >> >
> >> > Hi Ulrich,
> >> >
> >> > I quickly went through that mail thread and I tried the following on my
> >> > desktop.
> >> >
> >> > ##########################################
> >> > dd if=/home/vgoyal/4G-file of=/dev/null &
> >> > sleep 5
> >> > time firefox
> >> > # close firefox once gui pops up.
> >> > ##########################################
> >> >
> >> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
> >> > the following.
> >> >
> >> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
> >> >
> >> > (Results do vary across runs, especially if the system is booted fresh.
> >> > Don't know why...)
> >> >
> >> > Then I tried putting both the applications in separate groups and
> >> > assigned them weights of 200 each.
> >> >
> >> > ##########################################
> >> > dd if=/home/vgoyal/4G-file of=/dev/null &
> >> > echo $! > /cgroup/io/test1/tasks
> >> > sleep 5
> >> > echo $$ > /cgroup/io/test2/tasks
> >> > time firefox
> >> > # close firefox once gui pops up.
> >> > ##########################################
> >> >
> >> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
> >> >
> >> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
> >> >
> >> > Notice that the throughput of dd also improved.
> >> >
> >> > I ran the block trace and noticed that in many cases firefox threads
> >> > immediately preempted the "dd". Probably because it was a file system
> >> > request. So in this case latency will arise from seek time.
> >> >
> >> > In some other cases, threads had to wait for up to 100ms because dd was
> >> > not preempted. In this case latency will arise both from waiting on the
> >> > queue as well as seek time.
> >>
> >> I think cfq should already be doing something similar, i.e. giving
> >> 100ms slices to firefox, that alternate with dd, unless:
> >> * firefox is too seeky (in this case, the idle window will be too small)
> >> * firefox has too much think time.
> >>
>
> Hi Vivek,
>
> > Hi Corrado,
> >
> > "firefox" is the shell script to set up the environment and launch the
> > browser. It seems to be a group of threads. Some of them run in parallel
> > and some of them seem to run one after the other (once the previous
> > process or thread has finished).
>
> Ok.
>
> >
> >> To rule out the first case, what happens if you run the test with your
> >> "fairness for seeky processes" patch?
> >
> > I applied that patch and it helps a lot.
> >
> > http://lwn.net/Articles/341032/
> >
> > With above patchset applied, and fairness=1, firefox pops up in 27-28
> > seconds.
>
> Great.
> Can you try the attached patch (on top of 2.6.31)?
> It implements the alternative approach we discussed privately in July,
> and it addresses the possible latency increase that could happen with
> your patch.
>
> To summarize for everyone, we separate sync sequential queues, sync
> seeky queues and async queues into three separate RR structures, and
> alternate servicing requests between them.
>
> When servicing seeky queues (the ones that are usually penalized by
> cfq, for which no fairness is usually provided), we do not idle
> between them, but we do idle for the last queue (the idle can be
> exited when any seeky queue has requests). This allows us to allocate
> disk time globally for all seeky processes, and to reduce seeky
> process latencies.
>

Ok, I seem to be doing the same thing at the group level (in the group
scheduling patches). I do not idle on individual sync seeky queues, but
if it is the last queue in the group, then I do idle to make sure the
group does not lose its fair share, and I exit from idle the moment
there is any busy queue in the group.

So you seem to be grouping all the sync seeky queues system-wide in a
single group. So all the sync seeky queues collectively get 100ms in a
single round of dispatch?

I am wondering what happens if there are a lot of such sync seeky
queues and this 100ms time slice is consumed before all of them get a
chance to dispatch. Does that mean that some of the queues can
completely skip a dispatch round?

Thanks
Vivek

> I tested with 'konsole -e exit', while doing a sequential write with
> dd, and the start-up time was reduced from 37s to 7s, on an old laptop
> disk.
>
> Thanks,
> Corrado
>
> >
> >> To rule out the first case, what happens if you run the test with your
> >> "fairness for seeky processes" patch?
> >
> > I applied that patch and it helps a lot.
> >
> > http://lwn.net/Articles/341032/
> >
> > With above patchset applied, and fairness=1, firefox pops up in 27-28
> > seconds.
> >
> > So it looks like if we don't disable idle window for seeky processes on
> > hardware supporting command queuing, it helps in this particular case.
> >
> > Thanks
> > Vivek
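A back-of-the-envelope illustration of the concern Vivek raises; the
8 ms per-request seek cost and the queue count below are assumed numbers
for illustration, not figures from this thread.

#include <stdio.h>

int main(void)
{
	const double slice_ms = 100.0;  /* shared slice for all seeky queues   */
	const double seek_ms  = 8.0;    /* assumed cost of one random request  */
	const int nr_seeky    = 30;     /* assumed number of busy seeky queues */

	int served  = (int)(slice_ms / seek_ms);        /* ~12 queues per round */
	int skipped = nr_seeky > served ? nr_seeky - served : 0;

	printf("seeky queues served this round: %d, left for the next round: %d\n",
	       served, skipped);
	return 0;
}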