From: Anton Starikov on 23 Mar 2010 11:50

Hi,

We have benchmarked some multithreaded code here on a 16-core (4-way Opteron 8356) host on a number of kernels (see below) and found strange results. Up to 8 threads we didn't see any noticeable differences in performance, but starting from 9 threads performance diverges substantially. I provide here results for 14 threads:

2.6.18-164.11.1.el5 (CentOS)
  user time: ~60 sec
  sys time:  ~12 sec

2.6.32.9-70.fc12.x86_64 (Fedora 12)
  user time: ~60 sec
  sys time:  ~75 sec

2.6.33-0.46.rc8.git1.fc13.x86_64 (Fedora 12 + rawhide kernel)
  user time: ~60 sec
  sys time:  ~300 sec

In all three cases the real time corresponds to the given numbers.

The binary used in all three cases is exactly the same (compiled on CentOS). The setups are as identical as possible (the last two are the same Fedora 12 installation booted with different kernels). I submit this to LKML because I feel this is a general kernel issue rather than an RH-flavored kernel issue.

What can be the reason for this performance regression? Is it possible to tune something to recover the performance of the 2.6.18 kernel?

I perf'ed on the 2.6.32.9-70.fc12.x86_64 kernel.

report (top part only):

    43.64%  dve22lts-mc  [kernel]               [k] _spin_lock_irqsave
    32.93%  dve22lts-mc  ./dve22lts-mc          [.] DBSLLlookup_ret
     5.37%  dve22lts-mc  ./dve22lts-mc          [.] SuperFastHash
     3.76%  dve22lts-mc  /lib64/libc-2.11.1.so  [.] __GI_memcpy
     2.60%  dve22lts-mc  [kernel]               [k] clear_page_c
     1.60%  dve22lts-mc  ./dve22lts-mc          [.] index_next_dfs

stat:

    129875.554435  task-clock-msecs   #   10.210 CPUs
             1883  context-switches   #    0.000 M/sec
               17  CPU-migrations     #    0.000 M/sec
          2695310  page-faults        #    0.021 M/sec
     298370338040  cycles             # 2297.356 M/sec
     130581778178  instructions       #    0.438 IPC
      42517143751  cache-references   #  327.368 M/sec
        101906904  cache-misses       #    0.785 M/sec

callgraph:

    53.09%  dve22lts-mc  [kernel]  [k] _spin_lock_irqsave
            |
            |--49.90%-- __down_read_trylock
            |           down_read_trylock
            |           do_page_fault
            |           page_fault
            |           |
            |           |--99.99%-- __GI_memcpy
            |           |           |
            |           |           |--84.28%-- (nil)
            |           |           |
            |           |           |--9.78%-- 0x100000000
            |           |           |
            |           |           --5.94%-- 0x1
            |           --0.01%-- [...]
            |
            |--49.39%-- __up_read
            |           up_read
            |           |
            |           |--100.00%-- do_page_fault
            |           |            page_fault
            |           |            |
            |           |            |--99.99%-- __GI_memcpy
            |           |            |           |
            |           |            |           |--84.18%-- (nil)
            |           |            |           |
            |           |            |           |--10.13%-- 0x100000000
            |           |            |           |
            |           |            |           --5.69%-- 0x1
            |           |            --0.01%-- [...]
            |           --0.00%-- [...]
            --0.72%-- [...]

Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Anton Starikov on 23 Mar 2010 12:40
Created bug report with binary test-case. https://bugzilla.kernel.org/show_bug.cgi?id=15618

On Mar 23, 2010, at 4:43 PM, Anton Starikov wrote:
> [full benchmark report quoted above]

Anton.