From: Martin v. Loewis on 21 Feb 2010 16:22

John Nagle wrote:
> I know there's a performance penalty for running Python on a
> multicore CPU, but how bad is it? I've read the key paper
> ("www.dabeaz.com/python/GIL.pdf"), of course. It would be adequate
> if the GIL just limited Python to running on one CPU at a time,
> but it's worse than that; there's excessive overhead due to
> a lame locking implementation. Running CPU-bound multithreaded
> code on a dual-core CPU runs HALF AS FAST as on a single-core
> CPU, according to Beazley.

I couldn't reproduce these results on Linux. I'm not sure what "HALF AS
FAST" means; I suppose it means "it runs TWICE AS LONG" - and that is
what I couldn't reproduce.

If I run Beazley's program on Linux 2.6.26, on a 4-processor Xeon (3GHz)
machine, I get 30s for the sequential execution, 40s for the
multi-threaded case, and 32s for the multi-threaded case when pinning
the Python process to a single CPU (using taskset(1)).

So it's 6% overhead for threading, and 25% penalty for multicore CPUs -
far from the 100% you seem to expect.

Regards,
Martin
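For context, the benchmark behind those numbers is essentially a pure-Python countdown loop, timed once sequentially and once with the same total work split across two threads. The sketch below is a minimal reconstruction of that kind of test, not Beazley's exact script; the work size N and the timing harness are illustrative.

import threading
import time

def count(n):
    # Pure-Python, CPU-bound loop: it never blocks, so the threads spend
    # the whole run contending for the GIL.
    while n > 0:
        n -= 1

N = 50000000  # illustrative work size; tune so one run takes a few seconds

# Sequential baseline: the same total work, one call after the other.
start = time.time()
count(N)
count(N)
print("sequential: %.2fs" % (time.time() - start))

# Threaded version: two threads, each doing the same countdown.
start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("threaded:   %.2fs" % (time.time() - start))

The single-CPU figure quoted above comes from pinning the whole process to one core, e.g. by launching the script under something like taskset -c 0 python script.py.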
From: Ryan Kelly on 21 Feb 2010 16:45

On Sun, 2010-02-21 at 22:22 +0100, Martin v. Loewis wrote:
> John Nagle wrote:
> > I know there's a performance penalty for running Python on a
> > multicore CPU, but how bad is it? I've read the key paper
> > ("www.dabeaz.com/python/GIL.pdf"), of course. It would be adequate
> > if the GIL just limited Python to running on one CPU at a time,
> > but it's worse than that; there's excessive overhead due to
> > a lame locking implementation. Running CPU-bound multithreaded
> > code on a dual-core CPU runs HALF AS FAST as on a single-core
> > CPU, according to Beazley.
>
> I couldn't reproduce these results on Linux. Not sure what "HALF AS
> FAST" is; I suppose it means "it runs TWICE AS LONG" - this is what I
> couldn't reproduce.
>
> If I run Beazley's program on Linux 2.6.26, on a 4 processor Xeon (3GHz)
> machine, I get 30s for the sequential execution, 40s for the
> multi-threaded case, and 32s for the multi-threaded case when pinning
> the Python process to a single CPU (using taskset(1)).
>
> So it's 6% overhead for threading, and 25% penalty for multicore CPUs -
> far from the 100% you seem to expect.

It's far from scientific, but I've seen behaviour that's close to a 100%
performance penalty on a dual-core linux system:

http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2

Short story: a particular test suite of mine used to run in around 25
seconds, but a bit of ctypes magic to set thread affinity dropped the
running time to under 13 seconds.

Cheers,

   Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan(a)rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details
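The "ctypes magic" referred to here amounts to calling the Linux sched_setaffinity(2) syscall through libc to pin the process to one CPU. A minimal sketch, assuming Linux/glibc and fewer than 64 CPUs; the helper name and error handling are illustrative, not the exact code from the blog post:

import ctypes

def pin_to_cpu(cpu):
    # Represent cpu_set_t as a plain bitmask; one unsigned long covers
    # CPUs 0-63, which is enough for this sketch.
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    mask = ctypes.c_ulong(1 << cpu)
    # pid 0 means "the calling process".
    if libc.sched_setaffinity(0, ctypes.sizeof(mask), ctypes.byref(mask)) != 0:
        raise OSError(ctypes.get_errno(), "sched_setaffinity failed")

pin_to_cpu(0)  # e.g. restrict the whole process to CPU 0

With the process confined to a single core, GIL hand-offs between threads stop ping-ponging across CPUs, which is the effect being measured here.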
From: Martin v. Loewis on 21 Feb 2010 17:05

> It's far from scientific, but I've seen behaviour that's close to a 100%
> performance penalty on a dual-core linux system:
>
> http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2
>
> Short story: a particular test suite of mine used to run in around 25
> seconds, but a bit of ctypes magic to set thread affinity dropped the
> running time to under 13 seconds.

Indeed, it's not scientific - but with a few more details, you could
improve it quite a lot: what specific Linux distribution (the posting
doesn't even say it's Linux), what specific Python version had you been
using? (less important) what CPUs? If you can: what specific test suite?

A lot of science is about repeatability. Making a systematic study is
(IMO) over-valued - anecdotal reports are useful, too, as long as they
allow for repeatable experiments.

Regards,
Martin
From: Ryan Kelly on 21 Feb 2010 17:39

On Sun, 2010-02-21 at 23:05 +0100, Martin v. Loewis wrote:
> > It's far from scientific, but I've seen behaviour that's close to a 100%
> > performance penalty on a dual-core linux system:
> >
> > http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2
> >
> > Short story: a particular test suite of mine used to run in around 25
> > seconds, but a bit of ctypes magic to set thread affinity dropped the
> > running time to under 13 seconds.
>
> Indeed, it's not scientific - but with a few more details, you could
> improve it quite a lot: what specific Linux distribution (the posting
> doesn't even say it's Linux), what specific Python version had you been
> using? (less important) what CPUs? If you can: what specific test suite?

I'm on Ubuntu Karmic, Python 2.6.4, an AMD Athlon 7750 dual core.

Unfortunately the test suite is for a proprietary application. However,
I've been able to reproduce similar behaviour with an open-source test
suite, using the current trunk of the "pyfilesystem" project:

http://code.google.com/p/pyfilesystem/

In this project "OSFS" is an object-oriented interface to the local
filesystem. The test case "TestOSFS.test_cases_in_separate_dirs" runs
three threads, each doing a bunch of IO in a different directory.

Running the tests normally:

rfk(a)durian:/storage/software/fs$ nosetests fs/tests/test_fs.py:TestOSFS..test_cases_in_separate_dirs
.
----------------------------------------------------------------------
Ran 1 test in 9.787s

That's the best result from five runs - I saw it go as high as 12
seconds. Watching it in top, I see CPU usage at around 150%.

Now using threading2 to set the process cpu affinity at the start of the
test run:

rfk(a)durian:/storage/software/fs$ nosetests fs/tests/test_fs.py:TestOSFS..test_cases_in_separate_dirs
.
----------------------------------------------------------------------
Ran 1 test in 3.792s

Again, best of five. The variability in times here is much lower - I
never saw it go above 4 seconds. CPU usage is consistently 100%.

Cheers,

   Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan(a)rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details
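For readers wanting to reproduce this without the threading2 dependency, the same "pin the whole process before the tests start" trick can be done with a sched_setaffinity helper like the ctypes sketch earlier in the thread, hooked into nose's package-level fixture. A hypothetical example (the helper module name is made up and is not part of pyfilesystem):

# fs/tests/__init__.py  (hypothetical placement)
from affinity_helper import pin_to_cpu  # the ctypes sketch from above

def setup_package():
    # nose runs this once before any test in the package; pinning here
    # confines the whole multi-threaded test run to a single core.
    pin_to_cpu(0)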