From: Anton Starikov
Hi,

We benchmarked some multithreaded code on a 16-core, 4-way Opteron 8356 host under several kernels (see below) and found strange results.
With up to 8 threads we saw no noticeable difference in performance, but from 9 threads on, performance diverges substantially. Here are the results for 14 threads:

2.6.18-164.11.1.el5 (centos)

user time: ~60 sec
sys time: ~12 sec

2.6.32.9-70.fc12.x86_64 (fedora-12)

user time: ~60 sec
sys time: ~75 sec

2.6.33-0.46.rc8.git1.fc13.x86_64 (fedora-12 + rawhide kernel)

user time: ~60 sec
sys time: ~300 sec

In all three cases the real (wall-clock) time is consistent with the numbers given.

The binary is exactly the same in all three cases (compiled on CentOS).
The setups are as identical as possible (the last two are the same Fedora 12 install booted with different kernels).
I am posting to LKML because I believe this is a general kernel issue rather than an RH-flavored kernel issue.

What could be the reason for this performance regression? Is it possible to tune something to get back the performance we see with the 2.6.18 kernel?

I ran perf on the 2.6.32.9-70.fc12.x86_64 kernel.

report (top part only):

43.64% dve22lts-mc [kernel] [k] _spin_lock_irqsave
32.93% dve22lts-mc ./dve22lts-mc [.] DBSLLlookup_ret
5.37% dve22lts-mc ./dve22lts-mc [.] SuperFastHash
3.76% dve22lts-mc /lib64/libc-2.11.1.so [.] __GI_memcpy
2.60% dve22lts-mc [kernel] [k] clear_page_c
1.60% dve22lts-mc ./dve22lts-mc [.] index_next_dfs

stat:
129875.554435 task-clock-msecs # 10.210 CPUs
1883 context-switches # 0.000 M/sec
17 CPU-migrations # 0.000 M/sec
2695310 page-faults # 0.021 M/sec
298370338040 cycles # 2297.356 M/sec
130581778178 instructions # 0.438 IPC
42517143751 cache-references # 327.368 M/sec
101906904 cache-misses # 0.785 M/sec
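
The derived columns in the stat output can be cross-checked from the raw counters. Below is a minimal sketch in Python that uses only the numbers quoted above; the variable names are illustrative and are not perf's own, and the implied wall-clock time assumes that "CPUs" means task-clock time divided by elapsed time.

```python
# Raw counters copied from the perf stat output above.
task_clock_ms = 129875.554435
cpus_utilized = 10.210
page_faults = 2695310
cycles = 298370338040
instructions = 130581778178
cache_references = 42517143751
cache_misses = 101906904

cpu_seconds = task_clock_ms / 1000.0

# Instructions per cycle (perf prints this as "IPC").
ipc = instructions / cycles

# Event rates per second of task-clock time, in millions (M/sec).
page_faults_m_per_sec = page_faults / cpu_seconds / 1e6
cycles_m_per_sec = cycles / cpu_seconds / 1e6

# Fraction of cache references that missed.
cache_miss_ratio = cache_misses / cache_references

# Average cycles per page fault, counting all cycles of the run,
# not only those spent inside the fault handler.
cycles_per_fault = cycles / page_faults

# Wall-clock time implied by the "CPUs" column (assumption, see lead-in).
implied_wall_seconds = cpu_seconds / cpus_utilized

print(f"IPC:                 {ipc:.3f}")
print(f"page faults (M/sec): {page_faults_m_per_sec:.3f}")
print(f"cycles (M/sec):      {cycles_m_per_sec:.3f}")
print(f"cache miss ratio:    {cache_miss_ratio:.4%}")
print(f"cycles per fault:    {cycles_per_fault:.0f}")
print(f"implied wall time:   {implied_wall_seconds:.2f} s")
```

The computed IPC and rates agree with the values perf printed. The run takes roughly 20,000 page faults per second of CPU time.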

callgraph:

53.09% dve22lts-mc [kernel] [k] _spin_lock_irqsave
        |
        |--49.90%-- __down_read_trylock
        |          down_read_trylock
        |          do_page_fault
        |          page_fault
        |          |
        |          |--99.99%-- __GI_memcpy
        |          |          |
        |          |          |--84.28%-- (nil)
        |          |          |
        |          |          |--9.78%-- 0x100000000
        |          |          |
        |          |           --5.94%-- 0x1
        |           --0.01%-- [...]
        |
        |--49.39%-- __up_read
        |          up_read
        |          |
        |          |--100.00%-- do_page_fault
        |          |          page_fault
        |          |          |
        |          |          |--99.99%-- __GI_memcpy
        |          |          |          |
        |          |          |          |--84.18%-- (nil)
        |          |          |          |
        |          |          |          |--10.13%-- 0x100000000
        |          |          |          |
        |          |          |           --5.69%-- 0x1
        |          |           --0.01%-- [...]
        |           --0.00%-- [...]
         --0.72%-- [...]
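
To read the callgraph above as shares of the whole profile, the child percentages can be multiplied by the parent's share. This is a minimal sketch in Python. It assumes the child percentages are relative to their parent entry, which the fact that they sum to about 100% suggests; the dictionary and variable names are illustrative only.

```python
# Share of all samples attributed to _spin_lock_irqsave in the callgraph.
spin_lock_share = 53.09

# Callers of _spin_lock_irqsave, as percentages of the parent entry
# (assumed to be relative, not absolute, percentages).
callers = {
    "__down_read_trylock": 49.90,
    "__up_read": 49.39,
}

# Convert each relative percentage into a share of all samples.
absolute = {name: spin_lock_share * pct / 100.0 for name, pct in callers.items()}

# Combined share of the two rwsem read-side paths, both reached
# from do_page_fault in the callgraph.
rwsem_total = sum(absolute.values())

for name, share in absolute.items():
    print(f"{name}: {share:.2f}% of all samples")
print(f"rwsem read lock + unlock: {rwsem_total:.2f}% of all samples")
```

Under that assumption, taking and releasing the read lock in the page-fault path each account for roughly a quarter of all samples, and together they make up almost all of the time spent in _spin_lock_irqsave.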


Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Anton Starikov
I have created a bug report with a binary test case:

https://bugzilla.kernel.org/show_bug.cgi?id=15618


On Mar 23, 2010, at 4:43 PM, Anton Starikov wrote:

> [...]
