Prev: [PATCH 8/8] Staging: comedi: fix brace and printk() coding style issues in adl_pci9111.c
Next: rmap: Fix Bugzilla Bug #5493
From: Albert Strasheim on 5 Mar 2010 08:30 Hello all I've been doing some experiments with the 2.6.31 and 2.6.33 kernels with multipathing to Sun JBODs over SAS. Theoretically, our setup should allow throughput in the 3600 MB/sec range, but we get about 2200-2300 MB/sec. Even taking into account the various potential inefficiencies in the setup, it still seems like one might be able to squeeze more out of this system. We have the following hardware: Sun Blade X6270 2* LSI SAS 1068E controllers 2* Sun J4400 JBODs with 1 TB disks (24 disks per JBOD) Fedora Core 12 2.6.33 kernel from FC13 (also tried with latest 2.6.31 kernel from FC12, same results) The build of 2.6.33 is from here: http://kojipkgs.fedoraproject.org/packages/kernel/2.6.33/4.fc13/ Here's the datasheet for the SAS hardware: http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf It's using PCI Express 1.0a, 8x lanes. With a bandwidth of 250 MB/sec per lane, we should be able to do 2000 MB/sec per SAS controller. Each controller can do 3 Gb/sec per port and has two 4 port PHYs. We connect both PHYs from a controller to a JBOD. So between the JBOD and the controller we have 2 PHYs * 4 SAS ports * 3 Gb/sec = 24 Gb/sec of bandwidth, which is more than the PCI Express bandwidth. With write caching enabled and when doing big writes, each disk can sustain about 80 MB/sec (near the start of the disk). With 24 disks, that means we should be able to do 1920 MB/sec per JBOD. I write to the disks using dd, since we're mostly interested in sequential performance. The multipath configuration for each disk looks as follows: multipath { rr_min_io 100 uid 0 path_grouping_policy multibus failback manual path_selector "round-robin 0" rr_weight priorities alias somealias no_path_retry queue mode 0644 gid 0 wwid somewwid } I tried values of 50, 100, 1000 for rr_min_io, but it doesn't seem to make much difference. Along with varying rr_min_io I tried adding some delay between starting the dd's to prevent all of them writing over the same PHY at the same time, but this didn't make any difference, so I think the I/O's are getting properly spread out. According to /proc/interrupts, the SAS controllers are using a "IR-IO-APIC-fasteoi" interrupt scheme. For some reason only core #0 in the machine is handling these interrupts. I can improve performance slightly by assigning a separate core to handle the interrupts for each SAS controller: echo 2 > /proc/irq/24/smp_affinity echo 4 > /proc/irq/26/smp_affinity Using dd to write to the disk generates "Function call interrupts" (no idea what these are), which are handled by core #4, so I keep other processes off this core too. I run 48 dd's (one for each disk), assigning them to cores not dealing with interrupts like so: taskset -c somecore dd if=/dev/zero of=/dev/mapper/mpathx oflag=direct bs=128M oflag=direct prevents any kind of buffer cache from getting involved. None of my cores seem maxed out. The cores dealing with interrupts are mostly idle and all the other cores are waiting on I/O as one would expect. Cpu0 : 0.0%us, 1.0%sy, 0.0%ni, 91.2%id, 7.5%wa, 0.0%hi, 0.2%si, 0.0%st Cpu1 : 0.0%us, 0.8%sy, 0.0%ni, 93.0%id, 0.2%wa, 0.0%hi, 6.0%si, 0.0%st Cpu2 : 0.0%us, 0.6%sy, 0.0%ni, 94.4%id, 0.1%wa, 0.0%hi, 4.8%si, 0.0%st Cpu3 : 0.0%us, 7.5%sy, 0.0%ni, 36.3%id, 56.1%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.3%sy, 0.0%ni, 85.7%id, 4.9%wa, 0.0%hi, 8.1%si, 0.0%st Cpu5 : 0.1%us, 5.5%sy, 0.0%ni, 36.2%id, 58.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 5.0%sy, 0.0%ni, 36.3%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st .... Cpu15 : 0.1%us, 5.4%sy, 0.0%ni, 36.5%id, 58.1%wa, 0.0%hi, 0.0%si, 0.0%st Does anybody have any idea where my missing throughput went or how to continue this investigation to find the reason for the bottleneck? Thanks! Regards Albert Strasheim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |