From: Vivek Goyal on 26 Jul 2010 17:30

On Mon, Jul 26, 2010 at 10:30:23AM -0400, Vivek Goyal wrote:
> On Sat, Jul 24, 2010 at 11:07:07AM +0200, Corrado Zoccolo wrote:
> > On Sat, Jul 24, 2010 at 10:51 AM, Christoph Hellwig <hch(a)infradead.org> wrote:
> > > To me this sounds like slice_idle=0 is the right default then, as it
> > > gives useful behaviour for all systems linux runs on.
> >
> > No, it will give bad performance on single disks, possibly worse than
> > deadline (deadline at least sorts the requests between different
> > queues, while CFQ with slice_idle=0 doesn't even do this for readers).
> >
> > Setting slice_idle to 0 should be considered only when a single
> > sequential reader cannot saturate the disk bandwidth, and this happens
> > only on smart enough hardware with large number of spindles.
>
> I was thinking of writing a user space utility which can launch an
> increasing number of parallel direct/buffered reads from the device, and
> if the device can sustain more than one parallel read with increasing
> throughput, then that is probably a good indicator that one might be
> better off with slice_idle=0.
>
> Will try that today...

Ok, here is a small, hackish bash script which takes a block device as
input. It runs multiple parallel sequential readers in raw mode (dd on the
block device) and measures the total throughput. I run the readers on
different areas of the disk so that they don't overlap and don't end up
reading the same blocks. (A minimal sketch of such a test is appended
below the signature.)

The idea is to write a simple script which can run a bunch of tests and
suggest to the user which IO scheduler to use, or which IO scheduler
tunables to set. At this point I am only looking to identify whether we
should use slice_idle or not in CFQ on a given block device.

Here are the results of various runs. The first column represents the
number of processes run in parallel, the second column is the total
bandwidth, and the third column is the bandwidth of the individual dd
processes. Throughputs are in MB/s.

SATA disk
=========
Noop
----
1  63.3   63.3
2  18.7   9.4 9.3
4  21.6   5.5 5.4 5.4 5.3
8  29.6   5.9 4.5 3.6 3.5 3.3 3.0 3.0 2.8

CFQ
---
1  63.2   63.2
2  54.8   29.2 25.6
4  50.3   13.9 12.8 12.1 11.5
8  42.9   6.0 5.8 5.5 5.4 5.2 5.1 5.0 4.9

Storage Array (12 disks in RAID 5 configuration)
================================================
Noop
----
1  62.5   62.5
2  86.5   46.1 40.4
4  98.7   32.4 24.3 21.9 20.1
8  112.5  15.8 15.5 15.3 13.6 13.6 13.3 13.2 12.2

CFQ
---
1  56.9   56.9
2  34.8   18.0 16.8
4  38.8   10.4 10.3 9.4 8.7
8  44.4   6.1 6.1 5.9 5.9 5.7 5.0 4.9 4.8

SSD
===
Noop
----
1  243    243
2  231    122 109
4  270.6  73.8 73.5 65.1 58.2
8  262.9  33.3 33.2 33.2 33.2 33.2 33.2 33.2 30.4

CFQ
---
1  244    244
2  228    120 108
4  260.6  67.1 67.0 67.0 59.5
8  266.0  35.0 33.4 33.4 33.4 33.4 33.4 33.4 30.6

Summary:

- On a SATA disk with a single spindle, as soon as the number of processes
  increases (2), the disk starts experiencing seeks and throughput drops
  dramatically. Here CFQ idling helps.

- On the storage array, with noop, total throughput increases as the number
  of dd processes increases. That means the underlying storage can support
  multiple parallel readers without becoming seek bound. In this case one
  should probably set slice_idle=0.

- With the SSD, throughput does not deteriorate as the number of readers is
  increased. CFQ also performs well because idling is disabled internally,
  as the SSD is marked as a non-rotational device.

So, bottom line: if a device can support multiple parallel read streams
without a significant drop in throughput, one can set slice_idle=0 in CFQ
to achieve better overall throughput.
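For reference, slice_idle is a per-device sysfs tunable and can be changed
at runtime when CFQ is the active scheduler for that device; "sdb" below is
just a placeholder device name:

  cat /sys/block/sdb/queue/scheduler                # confirm CFQ is active
  cat /sys/block/sdb/queue/iosched/slice_idle       # current value, in ms
  echo 0 > /sys/block/sdb/queue/iosched/slice_idle  # disable idling for this device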
Setting slice_idle=0 will primarily make sense for data disks and not the
root disk, as it does not guarantee better latencies in the presence of
buffered WRITEs.

Thanks
Vivek
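For illustration only, a minimal sketch of this kind of test might look
like the following. This is not the script used for the numbers above; it
assumes GNU dd (for iflag=direct), hard-codes the per-reader region
spacing, needs a device large enough for the offsets used, reports only the
aggregate throughput, and the file name parallel-read-test.sh is made up.

#!/bin/bash
# Sketch: run 1, 2, 4 and 8 parallel sequential direct readers against a
# block device, each in its own region of the disk, and report aggregate
# throughput.
# Usage: ./parallel-read-test.sh /dev/sdX

DEV=$1
[ -b "$DEV" ] || { echo "usage: $0 <block device>" >&2; exit 1; }

BS=1M           # read block size
COUNT=1024      # blocks read per process (1 GB per reader)
GAP_MB=8192     # spacing between reader start offsets (8 GB), so that
                # the readers never overlap

for NPROCS in 1 2 4 8; do
        START=$(date +%s)
        for ((i = 0; i < NPROCS; i++)); do
                # Each reader starts in its own region and reads
                # sequentially, bypassing the page cache.
                dd if="$DEV" of=/dev/null bs=$BS count=$COUNT \
                   skip=$((i * GAP_MB)) iflag=direct 2>/dev/null &
        done
        wait
        ELAPSED=$(( $(date +%s) - START ))
        [ "$ELAPSED" -eq 0 ] && ELAPSED=1
        TOTAL_MB=$((NPROCS * COUNT))
        echo "$NPROCS readers: $((TOTAL_MB / ELAPSED)) MB/s aggregate"
done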