From: Mel Gorman on 26 Nov 2009 07:30

(cc'ing the people from the page allocator failure thread as this might be relevant to some of their problems)

I know this is very last minute but I believe we should consider disabling the "low_latency" tunable for block devices by default for 2.6.32. There was evidence that low_latency was a problem last week for page allocation failure reports but the reproduction case was unusual and involved high-order atomic allocations in low-memory conditions. It took another few days to accurately show the problem for more normal workloads and it's a bit more widespread than just allocation failures.

Basically, low_latency looks great as long as you have plenty of memory but in low-memory situations, it appears to cause problems that manifest as reduced performance, desktop stalls and in some cases, page allocation failures. I think most kernel developers are not seeing the problem as they tend to test on beefier machines and without hitting swap or low-memory situations for the most part. When they are hitting low-memory situations, it tends to be for stress tests where stalls and low performance are expected.

To show the problem, I used an x86-64 machine booted with 512MB of memory. This is a small amount of RAM but the bug reports related to page allocation failures were on smallish machines and the disks in the system are not very high-performance.

I used three tests. The first was sysbench on postgres running an IO-heavy test against a large database with 10,000,000 rows. The second was IOZone running most of the automatic tests with a record length of 4KB and the last was a simulated launching of gitk with a music player running in the background to act as a desktop-like scenario. The final test was similar to the test described here http://lwn.net/Articles/362184/ except that dm-crypt was not used as it has its own problems.

The sysbench results look as follows

        sysbench-with       sysbench-without
          low-latency            low-latency
 1    1266.02 ( 0.00%)     1278.55 ( 0.98%)
 2    1182.58 ( 0.00%)     1379.25 (14.26%)
 3    1257.08 ( 0.00%)     1580.08 (20.44%)
 4    1212.11 ( 0.00%)     1534.17 (20.99%)
 5    1046.77 ( 0.00%)     1552.48 (32.57%)
 6    1187.14 ( 0.00%)     1661.19 (28.54%)
 7    1179.37 ( 0.00%)      790.26 (-49.24%)
 8    1164.62 ( 0.00%)      854.10 (-36.36%)
 9    1125.04 ( 0.00%)     1655.04 (32.02%)
10    1147.52 ( 0.00%)     1653.89 (30.62%)
11     823.38 ( 0.00%)     1627.45 (49.41%)
12     813.73 ( 0.00%)     1494.63 (45.56%)
13     898.22 ( 0.00%)     1521.64 (40.97%)
14     873.50 ( 0.00%)     1311.09 (33.38%)
15     808.32 ( 0.00%)     1009.70 (19.94%)
16     758.17 ( 0.00%)      725.17 (-4.55%)

The first column is threads. Disabling low_latency performs much better for the most part. I should point out that with plenty of memory, sysbench tends to perform better *with* low_latency but as we're seeing page allocation failure reports in low memory situations and desktop stalls, the lower memory situation is also important.

The IOZone results are long I'm afraid.

                       iozone-with      iozone-without
                       low-latency         low-latency
write-64            151212 ( 0.00%)    159856 ( 5.41%)
write-128           189357 ( 0.00%)    206233 ( 8.18%)
write-256           219883 ( 0.00%)    223174 ( 1.47%)
write-512           224932 ( 0.00%)    220227 (-2.14%)
write-1024          227738 ( 0.00%)    226155 (-0.70%)
write-2048          227564 ( 0.00%)    224848 (-1.21%)
write-4096          208556 ( 0.00%)    223430 ( 6.66%)
write-8192          219484 ( 0.00%)    219389 (-0.04%)
write-16384         206670 ( 0.00%)    206295 (-0.18%)
write-32768         203023 ( 0.00%)    201852 (-0.58%)
write-65536         162134 ( 0.00%)    189173 (14.29%)
write-131072         68534 ( 0.00%)     67417 (-1.66%)
write-262144         32936 ( 0.00%)     27750 (-18.69%)
write-524288         24044 ( 0.00%)     23759 (-1.20%)
rewrite-64          755681 ( 0.00%)    755681 ( 0.00%)
rewrite-128         581518 ( 0.00%)    799840 (27.30%)
rewrite-256         639427 ( 0.00%)    659861 ( 3.10%)
rewrite-512         669577 ( 0.00%)    684954 ( 2.24%)
rewrite-1024        680960 ( 0.00%)    686182 ( 0.76%)
rewrite-2048        685263 ( 0.00%)    692780 ( 1.09%)
rewrite-4096        631352 ( 0.00%)    643266 ( 1.85%)
rewrite-8192        442146 ( 0.00%)    442624 ( 0.11%)
rewrite-16384       428641 ( 0.00%)    432613 ( 0.92%)
rewrite-32768       425361 ( 0.00%)    430568 ( 1.21%)
rewrite-65536       405183 ( 0.00%)    389242 (-4.10%)
rewrite-131072       66110 ( 0.00%)     58472 (-13.06%)
rewrite-262144       29254 ( 0.00%)     29306 ( 0.18%)
rewrite-524288       23812 ( 0.00%)     24543 ( 2.98%)
read-64             934589 ( 0.00%)    840903 (-11.14%)
read-128           1601534 ( 0.00%)   1280633 (-25.06%)
read-256           1255511 ( 0.00%)   1310683 ( 4.21%)
read-512           1291158 ( 0.00%)   1319723 ( 2.16%)
read-1024          1319408 ( 0.00%)   1347557 ( 2.09%)
read-2048          1316016 ( 0.00%)   1347393 ( 2.33%)
read-4096          1253710 ( 0.00%)   1251882 (-0.15%)
read-8192           995149 ( 0.00%)   1011794 ( 1.65%)
read-16384          883156 ( 0.00%)    897458 ( 1.59%)
read-32768          844368 ( 0.00%)    856364 ( 1.40%)
read-65536          816099 ( 0.00%)    826473 ( 1.26%)
read-131072         818055 ( 0.00%)    824351 ( 0.76%)
read-262144         827225 ( 0.00%)    835693 ( 1.01%)
read-524288          24653 ( 0.00%)     22519 (-9.48%)
reread-64          2329708 ( 0.00%)   1985134 (-17.36%)
reread-128         1446222 ( 0.00%)   2137031 (32.33%)
reread-256         1828508 ( 0.00%)   1879725 ( 2.72%)
reread-512         1521718 ( 0.00%)   1579934 ( 3.68%)
reread-1024        1347557 ( 0.00%)   1375171 ( 2.01%)
reread-2048        1340664 ( 0.00%)   1350783 ( 0.75%)
reread-4096        1259592 ( 0.00%)   1284839 ( 1.96%)
reread-8192        1007285 ( 0.00%)   1011317 ( 0.40%)
reread-16384        891404 ( 0.00%)    905022 ( 1.50%)
reread-32768        850492 ( 0.00%)    862772 ( 1.42%)
reread-65536        836565 ( 0.00%)    847020 ( 1.23%)
reread-131072       844516 ( 0.00%)    853155 ( 1.01%)
reread-262144       851524 ( 0.00%)    860653 ( 1.06%)
reread-524288        24927 ( 0.00%)     22487 (-10.85%)
randread-64        1605256 ( 0.00%)   1775099 ( 9.57%)
randread-128       1179358 ( 0.00%)   1528576 (22.85%)
randread-256       1421755 ( 0.00%)   1310683 (-8.47%)
randread-512       1306873 ( 0.00%)   1281909 (-1.95%)
randread-1024      1201314 ( 0.00%)   1231629 ( 2.46%)
randread-2048      1179413 ( 0.00%)   1190529 ( 0.93%)
randread-4096      1107005 ( 0.00%)   1116792 ( 0.88%)
randread-8192       894337 ( 0.00%)    899487 ( 0.57%)
randread-16384      783760 ( 0.00%)    791341 ( 0.96%)
randread-32768      740498 ( 0.00%)    743511 ( 0.41%)
randread-65536      721640 ( 0.00%)    728139 ( 0.89%)
randread-131072     715284 ( 0.00%)    720825 ( 0.77%)
randread-262144     709855 ( 0.00%)    714943 ( 0.71%)
randread-524288        394 ( 0.00%)       431 ( 8.58%)
randwrite-64        730988 ( 0.00%)    730988 ( 0.00%)
randwrite-128       746459 ( 0.00%)    742331 (-0.56%)
randwrite-256       695778 ( 0.00%)    727850 ( 4.41%)
randwrite-512       666253 ( 0.00%)    691126 ( 3.60%)
randwrite-1024      651223 ( 0.00%)    659625 ( 1.27%)
randwrite-2048      655558 ( 0.00%)    664073 ( 1.28%)
randwrite-4096      635556 ( 0.00%)    642400 ( 1.07%)
randwrite-8192      467357 ( 0.00%)    469734 ( 0.51%)
randwrite-16384     413188 ( 0.00%)    417282 ( 0.98%)
randwrite-32768     404161 ( 0.00%)    407580 ( 0.84%)
randwrite-65536     379372 ( 0.00%)    381273 ( 0.50%)
randwrite-131072     21780 ( 0.00%)     19758 (-10.23%)
randwrite-262144      6249 ( 0.00%)      6316 ( 1.06%)
randwrite-524288      2915 ( 0.00%)      2859 (-1.96%)
bkwdread-64        1141196 ( 0.00%)   1141196 ( 0.00%)
bkwdread-128       1066865 ( 0.00%)   1101900 ( 3.18%)
bkwdread-256        877797 ( 0.00%)   1105556 (20.60%)
bkwdread-512       1133103 ( 0.00%)   1162547 ( 2.53%)
bkwdread-1024      1163562 ( 0.00%)   1195962 ( 2.71%)
bkwdread-2048      1163439 ( 0.00%)   1204552 ( 3.41%)
bkwdread-4096      1116792 ( 0.00%)   1150600 ( 2.94%)
bkwdread-8192       912288 ( 0.00%)    934724 ( 2.40%)
bkwdread-16384      817707 ( 0.00%)    829152 ( 1.38%)
bkwdread-32768      775898 ( 0.00%)    787691 ( 1.50%)
bkwdread-65536      759643 ( 0.00%)    772174 ( 1.62%)
bkwdread-131072     763215 ( 0.00%)    773816 ( 1.37%)
bkwdread-262144     765491 ( 0.00%)    780021 ( 1.86%)
bkwdread-524288       3688 ( 0.00%)      3724 ( 0.97%)

The first column is "operation-sizeInKB". The other figures are measured in operations (-O in iozone). It's a little less clear-cut but disabling low_latency wins more often than not, although many of the gains are small and in the 1-3% range (or is that considered lots in iozone land?). There were big gains and losses for some tests but the really big differences were around the 128KB size so it might be a CPU caching effect.

Running a simulation of multiple instances of gitk and a music player results in the following

             gitk-with        gitk-without
           low-latency         low-latency
min      954.46 ( 0.00%)    640.65 (32.88%)
mean     964.79 ( 0.00%)    655.57 (32.05%)
stddev    10.01 ( 0.00%)     13.33 (-33.18%)
max      981.23 ( 0.00%)    675.65 (31.14%)

The measure is the time taken for the fake-gitk program to complete its job. Disabling low_latency completes the test far faster.

On previous tests, I had busted networking to do high-order atomic allocations to simulate wireless cards which are high-order happy. In those tests, disabling low_latency performed better, produced more stable results, stalled less (which I think would look like a desktop stall in a normal environment) and critically, it didn't fail high-order page allocations, i.e. enabling low_latency hurts reclaim in some unspecified fashion.

On my laptop (2GB RAM), I find the desktop stalls less when I disable low_latency in the situation where something kicks off a lot of IO.
For example, if I do a large git operation and switch to a browser while that is doing its thing, I notice that the desktop sometimes stalls for almost a second. I do not see this with low_latency disabled but I cannot quantify this better and it's tricky to reproduce. I also might be fooling myself because I expect to see problems with low_latency enabled.

I regret that I do not have an explanation as to why low_latency causes problems other than a hunch that low_latency is preventing page writeback happening fast enough and that causes stalls later. Theories and patches welcome but if it cannot be resolved, should the following be applied?

Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
---
 block/cfq-iosched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index aa1e953..dc33045 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2543,7 +2543,7 @@ static void *cfq_init_queue(struct request_queue *q)
 	cfqd->cfq_slice[1] = cfq_slice_sync;
 	cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
 	cfqd->cfq_slice_idle = cfq_slice_idle;
-	cfqd->cfq_latency = 1;
+	cfqd->cfq_latency = 0;
 	cfqd->hw_tag = 1;
 	cfqd->last_end_sync_rq = jiffies;
 	return cfqd;
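
For anyone who wants to compare both settings without rebuilding the kernel: CFQ exposes this tunable per device through sysfs as queue/iosched/low_latency, so it can be flipped at runtime. The following is only a minimal sketch. The helper names write_tunable() and read_tunable() are made up for illustration, and the device name in the usage note below is an assumption that has to be adapted to the machine under test.

```c
#include <stdio.h>

/*
 * Write an integer to a sysfs-style tunable file.
 * Hypothetical helper for illustration, not part of any kernel or libc API.
 * Returns 0 on success, -1 on failure.
 */
int write_tunable(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%d\n", value) > 0;

    /* fclose() flushes the buffer; a failed flush means the write was lost. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/*
 * Read the tunable back so the setting can be confirmed before a test run.
 * Returns 0 on success and stores the value in *value, -1 on failure.
 */
int read_tunable(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int n = fscanf(f, "%d", value);
    fclose(f);

    return n == 1 ? 0 : -1;
}
```

Calling write_tunable("/sys/block/sda/queue/iosched/low_latency", 0) as root would disable the tunable for sda only, assuming sda is the disk under test and CFQ is its active scheduler. The file does not exist when another scheduler is selected, in which case the helper returns -1.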
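
A note on reading the percentage columns in the tables above. The post does not state how they were computed, so the formulas below are inferred from the numbers rather than taken from the reporting script. For sysbench and IOZone (higher is better) the figures are consistent with the difference being taken relative to the without-low_latency value, and for the gitk timings (lower is better) relative to the with-low_latency value. The function names are made up for illustration.

```c
/*
 * Gain for "higher is better" metrics (sysbench, IOZone).
 * Inferred from the tables: difference relative to the
 * without-low_latency value, as a percentage.
 */
double gain_higher_is_better(double with_ll, double without_ll)
{
    return (without_ll - with_ll) / without_ll * 100.0;
}

/*
 * Gain for "lower is better" metrics (gitk completion time).
 * Inferred from the tables: difference relative to the
 * with-low_latency value, as a percentage.
 */
double gain_lower_is_better(double with_ll, double without_ll)
{
    return (with_ll - without_ll) / with_ll * 100.0;
}
```

One consequence of the first formula is that a regression is expressed relative to the smaller number, so the -49.24% for 7 sysbench threads is a drop from 1179.37 to 790.26, which is about 33% when measured against the with-low_latency figure.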