From: Ryo Tsuruta on 28 Sep 2009 03:40

Hi Vivek,

Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > Because dm-ioband provides fairness in terms of how many IO requests
> > are issued or how many bytes are transferred, so this behaviour is to
> > be expected. Do you think fairness in terms of IO requests and size is
> > not fair?
>
> Hi Ryo,
>
> Fairness in terms of size of IO or number of requests is probably not the
> best thing to do on rotational media where seek latencies are significant.
>
> It probably should work just well on media with very low seek latencies
> like SSD.
>
> So on rotational media, either you will not provide fairness to random
> readers because they are too slow or you will choke the sequential readers
> in other group and also bring down the overall disk throughput.
>
> If you don't decide to choke/throttle sequential reader group for the sake
> of random reader in other group then you will not have a good control
> on random reader latencies. Because now IO scheduler sees the IO from both
> sequential reader as well as random reader and sequential readers have not
> been throttled. So the dispatch pattern/time slices will again look like..
>
> SR1 SR2 SR3 SR4 SR5 RR.....
>
> instead of
>
> SR1 RR SR2 RR SR3 RR SR4 RR ....
>
> SR --> sequential reader, RR --> random reader

Thank you for elaborating. However, I think that fairness in terms of
disk time has a similar problem. Below is the benchmark result of
randread vs seqread that I posted before; the random readers and the
sequential readers ran in individual groups and their weights were
equally assigned.

                Throughput [KiB/s]
             io-controller   dm-ioband
  randread         161          314
  seqread         9556          631

I know that dm-ioband needs improvement in seqread throughput, but I
don't think the io-controller result looks quite fair either: even
though the disk time given to each group is equal, randread cannot get
more bandwidth. I think this is how users think about fairness, so it
would be a good thing to provide multiple policies of bandwidth control
for users to choose from.

> > The write-starve-reads on dm-ioband, that you pointed out before, was
> > not caused by FIFO release, it was caused by IO flow control in
> > dm-ioband. When I turned off the flow control, then the read
> > throughput was quite improved.
>
> What was flow control doing?

dm-ioband gives a limit to each IO group. When the number of IO requests
backlogged in a group exceeds the limit, processes which are going to
issue IO requests to the group are put to sleep until all the backlogged
requests are flushed out.

> > Now I'm considering separating dm-ioband's internal queue into sync
> > and async and giving a certain priority of dispatch to async IOs.
>
> Even if you maintain separate queues for sync and async, in what ratio will
> you dispatch reads and writes to underlying layer once fresh tokens become
> available to the group and you decide to unthrottle the group.

Now I'm thinking that it should follow the requested order, but when the
number of in-flight sync IOs exceeds io_limit (io_limit is calculated
based on nr_requests of the underlying block device), dm-ioband
dispatches only async IOs until the number of in-flight sync IOs falls
below the io_limit, and vice versa. At least it could solve the
write-starve-read issue which you pointed out.

> Whatever policy you adopt for read and write dispatch, it might not match
> with policy of underlying IO scheduler because every IO scheduler seems to
> have its own way of determining how reads and writes should be dispatched.

I think that this is a matter of the user's choice: whether a user would
like to give priority to bandwidth or to the IO scheduler's policy.

> Now somebody might start complaining that my job inside the group is not
> getting same reader/writer ratio as it was getting outside the group.
>
> Thanks
> Vivek

Thanks,
Ryo Tsuruta
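A minimal sketch, in plain C, of the dispatch rule Ryo describes above:
requests leave the group in arrival order, but once the number of
in-flight sync IOs reaches io_limit only async IOs are dispatched, and
vice versa. The structure and helper names here are illustrative
assumptions, not actual dm-ioband code.

#include <stdbool.h>

/* Hypothetical per-group state; io_limit would be derived from the
 * nr_requests setting of the underlying block device. */
struct ioband_group {
	int  in_flight_sync;
	int  in_flight_async;
	int  io_limit;
	bool sync_queue_empty;
	bool async_queue_empty;
};

/* Decide whether the next dispatched request should come from the sync
 * queue (true) or the async queue (false). */
static bool dispatch_sync_next(const struct ioband_group *gp,
                               bool next_in_order_is_sync)
{
	if (gp->in_flight_sync >= gp->io_limit && !gp->async_queue_empty)
		return false;   /* too many sync IOs in flight: drain async */
	if (gp->in_flight_async >= gp->io_limit && !gp->sync_queue_empty)
		return true;    /* too many async IOs in flight: drain sync */
	return next_in_order_is_sync;   /* otherwise keep arrival order */
}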
From: Ryo Tsuruta on 28 Sep 2009 03:40

Hi Rik,

Rik van Riel <riel(a)redhat.com> wrote:
> Ryo Tsuruta wrote:
>
> > Because dm-ioband provides fairness in terms of how many IO requests
> > are issued or how many bytes are transferred, so this behaviour is to
> > be expected. Do you think fairness in terms of IO requests and size is
> > not fair?
>
> When there are two workloads competing for the same
> resources, I would expect each of the workloads to
> run at about 50% of the speed at which it would run
> on an uncontended system.
>
> Having one of the workloads run at 95% of the
> uncontended speed and the other workload at 5%
> is "not fair" (to put it diplomatically).

As I wrote in the mail to Vivek, I think that providing multiple
policies (per disk time, per IO size, maximum rate limiting, and so on)
would be good for users.

Thanks,
Ryo Tsuruta
From: Vivek Goyal on 28 Sep 2009 11:00

On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
> Hi Vivek,
> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
> >> Vivek Goyal wrote:
> >> > Notes:
> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
> >> >   Bring down its throughput and bump up latencies significantly.
> >>
> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
> >> too.
> >>
> >> I'm basing this assumption on the observations I made on both OpenSuse
> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
> >> titled: "Poor desktop responsiveness with background I/O-operations" of
> >> 2009-09-20.
> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
> >>
> >> Thus, I'm posting this to show that your work is greatly appreciated,
> >> given the rather disappointing status quo of Linux's fairness when it
> >> comes to disk IO time.
> >>
> >> I hope that your efforts lead to a change in performance of current
> >> userland applications, the sooner, the better.
> >>
> > [Please don't remove people from original CC list. I am putting them back.]
> >
> > Hi Ulrich,
> >
> > I quickly went through that mail thread and I tried the following on my
> > desktop.
> >
> > ##########################################
> > dd if=/home/vgoyal/4G-file of=/dev/null &
> > sleep 5
> > time firefox
> > # close firefox once gui pops up.
> > ##########################################
> >
> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
> > the following.
> >
> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
> >
> > (Results do vary across runs, especially if the system is booted fresh.
> > Don't know why...)
> >
> > Then I tried putting both the applications in separate groups and
> > assigned them weights of 200 each.
> >
> > ##########################################
> > dd if=/home/vgoyal/4G-file of=/dev/null &
> > echo $! > /cgroup/io/test1/tasks
> > sleep 5
> > echo $$ > /cgroup/io/test2/tasks
> > time firefox
> > # close firefox once gui pops up.
> > ##########################################
> >
> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
> >
> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
> >
> > Notice that the throughput of dd also improved.
> >
> > I ran the block trace and noticed that in many cases firefox threads
> > immediately preempted the "dd". Probably because it was a file system
> > request. So in this case latency will arise from seek time.
> >
> > In some other cases, threads had to wait for up to 100ms because dd was
> > not preempted. In this case latency will arise both from waiting on the
> > queue as well as seek time.
>
> I think cfq should already be doing something similar, i.e. giving
> 100ms slices to firefox, that alternate with dd, unless:
> * firefox is too seeky (in this case, the idle window will be too small)
> * firefox has too much think time.
>

Hi Corrado,

"firefox" is the shell script to set up the environment and launch the
browser. It seems to be a group of threads. Some of them run in parallel
and some of them seem to run one after the other (once the previous
process or thread has finished).

> To rule out the first case, what happens if you run the test with your
> "fairness for seeky processes" patch?

I applied that patch and it helps a lot.

http://lwn.net/Articles/341032/

With above patchset applied, and fairness=1, firefox pops up in 27-28
seconds.

So it looks like if we don't disable idle window for seeky processes on
hardware supporting command queuing, it helps in this particular case.

Thanks
Vivek

> To rule out the second case, what happens if you increase the slice_idle?
>
> Thanks,
> Corrado
>
> > With the cgroup thing, we will run a 100ms slice for the group in which
> > firefox is being launched and then give a 100ms uninterrupted time slice
> > to dd. So it should cut down on the number of seeks happening and that's
> > why we probably see this improvement.
> >
> > So grouping can help in such cases. Maybe you can move your X session
> > into one group and launch the big IO in the other group. Most likely you
> > should have a better desktop experience without compromising on dd
> > thread output.
> >
> > Thanks
> > Vivek
>
> --
> __________________________________________________________________________
>
> dott. Corrado Zoccolo                 mailto:czoccolo(a)gmail.com
> PhD - Department of Computer Science - University of Pisa, Italy
> --------------------------------------------------------------------------
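For readers following along, here is a small, purely illustrative piece
of C showing the decision being discussed: whether the idle window stays
enabled for a seeky queue. The struct and field names are assumptions
for the sake of the sketch; the real CFQ code and the patch differ in
detail.

#include <stdbool.h>

struct queue_hint {
	bool seeky;     /* request pattern classified as random/seeky  */
	bool hw_tag;    /* device supports command queuing (NCQ/TCQ)   */
	bool fairness;  /* the fairness=1 tunable from the patch above */
};

/* Should the scheduler keep idling (waiting briefly for the next request
 * from the same process) when this queue runs out of requests? */
static bool idle_window_enabled(const struct queue_hint *q)
{
	if (!q->seeky)
		return true;    /* sequential readers always get idling */
	if (q->hw_tag && !q->fairness)
		return false;   /* stock behaviour: no idling for seeky
				 * queues on command-queuing hardware   */
	return true;            /* fairness=1: keep the idle window so a
				 * seeky task (e.g. firefox startup) is
				 * not starved by a streaming dd        */
}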
From: Corrado Zoccolo on 28 Sep 2009 11:40

On Mon, Sep 28, 2009 at 4:56 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
>> Hi Vivek,
>> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
>> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
>> >> Vivek Goyal wrote:
>> >> > Notes:
>> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
>> >> >   Bring down its throughput and bump up latencies significantly.
>> >>
>> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
>> >> too.
>> >>
>> >> I'm basing this assumption on the observations I made on both OpenSuse
>> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
>> >> titled: "Poor desktop responsiveness with background I/O-operations" of
>> >> 2009-09-20.
>> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
>> >>
>> >> Thus, I'm posting this to show that your work is greatly appreciated,
>> >> given the rather disappointing status quo of Linux's fairness when it
>> >> comes to disk IO time.
>> >>
>> >> I hope that your efforts lead to a change in performance of current
>> >> userland applications, the sooner, the better.
>> >>
>> > [Please don't remove people from original CC list. I am putting them back.]
>> >
>> > Hi Ulrich,
>> >
>> > I quickly went through that mail thread and I tried the following on my
>> > desktop.
>> >
>> > ##########################################
>> > dd if=/home/vgoyal/4G-file of=/dev/null &
>> > sleep 5
>> > time firefox
>> > # close firefox once gui pops up.
>> > ##########################################
>> >
>> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
>> > the following.
>> >
>> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
>> >
>> > (Results do vary across runs, especially if the system is booted fresh.
>> > Don't know why...)
>> >
>> > Then I tried putting both the applications in separate groups and
>> > assigned them weights of 200 each.
>> >
>> > ##########################################
>> > dd if=/home/vgoyal/4G-file of=/dev/null &
>> > echo $! > /cgroup/io/test1/tasks
>> > sleep 5
>> > echo $$ > /cgroup/io/test2/tasks
>> > time firefox
>> > # close firefox once gui pops up.
>> > ##########################################
>> >
>> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
>> >
>> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
>> >
>> > Notice that the throughput of dd also improved.
>> >
>> > I ran the block trace and noticed that in many cases firefox threads
>> > immediately preempted the "dd". Probably because it was a file system
>> > request. So in this case latency will arise from seek time.
>> >
>> > In some other cases, threads had to wait for up to 100ms because dd was
>> > not preempted. In this case latency will arise both from waiting on the
>> > queue as well as seek time.
>>
>> I think cfq should already be doing something similar, i.e. giving
>> 100ms slices to firefox, that alternate with dd, unless:
>> * firefox is too seeky (in this case, the idle window will be too small)
>> * firefox has too much think time.
>>

Hi Vivek,

> Hi Corrado,
>
> "firefox" is the shell script to set up the environment and launch the
> browser. It seems to be a group of threads. Some of them run in parallel
> and some of them seem to run one after the other (once the previous
> process or thread has finished).

Ok.

>
>> To rule out the first case, what happens if you run the test with your
>> "fairness for seeky processes" patch?
>
> I applied that patch and it helps a lot.
>
> http://lwn.net/Articles/341032/
>
> With above patchset applied, and fairness=1, firefox pops up in 27-28
> seconds.

Great.
Can you try the attached patch (on top of 2.6.31)?
It implements the alternative approach we discussed privately in July,
and it addresses the possible latency increase that could happen with
your patch.

To summarize for everyone, we separate sync sequential queues, sync
seeky queues and async queues into three separate RR structures, and
alternate servicing requests between them.

When servicing seeky queues (the ones that are usually penalized by
cfq, for which no fairness is usually provided), we do not idle
between them, but we do idle for the last queue (the idle can be
exited when any seeky queue has requests). This allows us to allocate
disk time globally for all seeky processes, and to reduce seeky
process latencies.

I tested with 'konsole -e exit', while doing a sequential write with
dd, and the start-up time was reduced from 37s to 7s, on an old laptop
disk.

Thanks,
Corrado

>
>> To rule out the first case, what happens if you run the test with your
>> "fairness for seeky processes" patch?
>
> I applied that patch and it helps a lot.
>
> http://lwn.net/Articles/341032/
>
> With above patchset applied, and fairness=1, firefox pops up in 27-28
> seconds.
>
> So it looks like if we don't disable idle window for seeky processes on
> hardware supporting command queuing, it helps in this particular case.
>
> Thanks
> Vivek
>
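To make the scheme above concrete, here is a rough sketch in plain C of
the three service classes and the idling rule Corrado describes. It is
only an illustration of the idea; the names and data structures are
assumptions, not the actual patch.

/* Three round-robin service classes, serviced in alternation. */
enum service_class { SYNC_SEQ, SYNC_SEEKY, ASYNC, NR_CLASSES };

struct class_rr {
	int nr_busy;    /* queues in this class with pending requests */
};

struct sched_state {
	struct class_rr    rr[NR_CLASSES];
	enum service_class current;
};

/* Rotate to the next class that has busy queues. */
static enum service_class pick_next_class(struct sched_state *sd)
{
	int i;

	for (i = 1; i <= NR_CLASSES; i++) {
		int c = ((int)sd->current + i) % NR_CLASSES;
		if (sd->rr[c].nr_busy) {
			sd->current = (enum service_class)c;
			break;
		}
	}
	return sd->current;
}

/* Seeky queues are not idled individually; only the last busy seeky queue
 * gets an idle period, so the class as a whole keeps its disk time.  The
 * idle is broken as soon as any seeky queue becomes busy again. */
static int idle_after_queue(enum service_class class, int other_busy_in_class)
{
	if (class != SYNC_SEEKY)
		return 1;       /* other classes keep their existing idling
				 * rules (simplified here)                 */
	return other_busy_in_class == 0;        /* idle only on the last one */
}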
From: Vivek Goyal on 28 Sep 2009 13:20
On Mon, Sep 28, 2009 at 05:35:02PM +0200, Corrado Zoccolo wrote:
> On Mon, Sep 28, 2009 at 4:56 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > On Sun, Sep 27, 2009 at 07:00:08PM +0200, Corrado Zoccolo wrote:
> >> Hi Vivek,
> >> On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> >> > On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
> >> >> Vivek Goyal wrote:
> >> >> > Notes:
> >> >> > - With vanilla CFQ, random writers can overwhelm a random reader.
> >> >> >   Bring down its throughput and bump up latencies significantly.
> >> >>
> >> >> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
> >> >> too.
> >> >>
> >> >> I'm basing this assumption on the observations I made on both OpenSuse
> >> >> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
> >> >> titled: "Poor desktop responsiveness with background I/O-operations" of
> >> >> 2009-09-20.
> >> >> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
> >> >>
> >> >> Thus, I'm posting this to show that your work is greatly appreciated,
> >> >> given the rather disappointing status quo of Linux's fairness when it
> >> >> comes to disk IO time.
> >> >>
> >> >> I hope that your efforts lead to a change in performance of current
> >> >> userland applications, the sooner, the better.
> >> >>
> >> > [Please don't remove people from original CC list. I am putting them back.]
> >> >
> >> > Hi Ulrich,
> >> >
> >> > I quickly went through that mail thread and I tried the following on my
> >> > desktop.
> >> >
> >> > ##########################################
> >> > dd if=/home/vgoyal/4G-file of=/dev/null &
> >> > sleep 5
> >> > time firefox
> >> > # close firefox once gui pops up.
> >> > ##########################################
> >> >
> >> > It was taking close to 1 minute 30 seconds to launch firefox and dd got
> >> > the following.
> >> >
> >> > 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
> >> >
> >> > (Results do vary across runs, especially if the system is booted fresh.
> >> > Don't know why...)
> >> >
> >> > Then I tried putting both the applications in separate groups and
> >> > assigned them weights of 200 each.
> >> >
> >> > ##########################################
> >> > dd if=/home/vgoyal/4G-file of=/dev/null &
> >> > echo $! > /cgroup/io/test1/tasks
> >> > sleep 5
> >> > echo $$ > /cgroup/io/test2/tasks
> >> > time firefox
> >> > # close firefox once gui pops up.
> >> > ##########################################
> >> >
> >> > Now firefox pops up in 27 seconds. So it cut down the time by 2/3.
> >> >
> >> > 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
> >> >
> >> > Notice that the throughput of dd also improved.
> >> >
> >> > I ran the block trace and noticed that in many cases firefox threads
> >> > immediately preempted the "dd". Probably because it was a file system
> >> > request. So in this case latency will arise from seek time.
> >> >
> >> > In some other cases, threads had to wait for up to 100ms because dd was
> >> > not preempted. In this case latency will arise both from waiting on the
> >> > queue as well as seek time.
> >>
> >> I think cfq should already be doing something similar, i.e. giving
> >> 100ms slices to firefox, that alternate with dd, unless:
> >> * firefox is too seeky (in this case, the idle window will be too small)
> >> * firefox has too much think time.
> >>
>
> Hi Vivek,
>
> > Hi Corrado,
> >
> > "firefox" is the shell script to set up the environment and launch the
> > browser. It seems to be a group of threads. Some of them run in parallel
> > and some of them seem to run one after the other (once the previous
> > process or thread has finished).
>
> Ok.
>
> >
> >> To rule out the first case, what happens if you run the test with your
> >> "fairness for seeky processes" patch?
> >
> > I applied that patch and it helps a lot.
> >
> > http://lwn.net/Articles/341032/
> >
> > With above patchset applied, and fairness=1, firefox pops up in 27-28
> > seconds.
>
> Great.
> Can you try the attached patch (on top of 2.6.31)?
> It implements the alternative approach we discussed privately in July,
> and it addresses the possible latency increase that could happen with
> your patch.
>
> To summarize for everyone, we separate sync sequential queues, sync
> seeky queues and async queues into three separate RR structures, and
> alternate servicing requests between them.
>
> When servicing seeky queues (the ones that are usually penalized by
> cfq, for which no fairness is usually provided), we do not idle
> between them, but we do idle for the last queue (the idle can be
> exited when any seeky queue has requests). This allows us to allocate
> disk time globally for all seeky processes, and to reduce seeky
> process latencies.
>

Ok, I seem to be doing the same thing at the group level (in the group
scheduling patches). I do not idle on individual sync seeky queues, but
if it is the last queue in the group, then I do idle to make sure the
group does not lose its fair share, and I exit from idle the moment
there is any busy queue in the group.

So you seem to be grouping all the sync seeky queues system-wide in a
single group. So all the sync seeky queues collectively get 100ms in a
single round of dispatch?

I am wondering what happens if there are a lot of such sync seeky
queues and this 100ms time slice is consumed before all of them get a
chance to dispatch. Does that mean that some of the queues can
completely skip a dispatch round?

Thanks
Vivek

> I tested with 'konsole -e exit', while doing a sequential write with
> dd, and the start-up time was reduced from 37s to 7s, on an old laptop
> disk.
>
> Thanks,
> Corrado
>
> >
> >> To rule out the first case, what happens if you run the test with your
> >> "fairness for seeky processes" patch?
> >
> > I applied that patch and it helps a lot.
> >
> > http://lwn.net/Articles/341032/
> >
> > With above patchset applied, and fairness=1, firefox pops up in 27-28
> > seconds.
> >
> > So it looks like if we don't disable idle window for seeky processes on
> > hardware supporting command queuing, it helps in this particular case.
> >
> > Thanks
> > Vivek
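A back-of-the-envelope illustration of the concern Vivek raises; the
8 ms per-request seek cost and the queue count below are assumed numbers
for illustration, not figures from this thread.

#include <stdio.h>

int main(void)
{
	const double slice_ms = 100.0;  /* shared slice for all seeky queues   */
	const double seek_ms  = 8.0;    /* assumed cost of one random request  */
	const int nr_seeky    = 30;     /* assumed number of busy seeky queues */

	int served  = (int)(slice_ms / seek_ms);        /* ~12 queues per round */
	int skipped = nr_seeky > served ? nr_seeky - served : 0;

	printf("seeky queues served this round: %d, left for the next round: %d\n",
	       served, skipped);
	return 0;
}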