From: Felix on 8 Jul 2010 23:39

On Jul 4, 11:25 am, David Cournapeau <courn...(a)gmail.com> wrote:
> On Mon, Jul 5, 2010 at 12:00 AM, D'Arcy J.M. Cain <da...(a)druid.net> wrote:
> > I wish it were orders of magnitude faster for web development. I'm
> > just saying that where we need compiled-language speed, Python
> > already has that in C.
>
> Well, I wish I did not have to use C, then :) For example, as a
> contributor to numpy, it bothers me at a fundamental level that so
> much of numpy is in C.

This is something that I have been thinking about recently. Python has
won quite a following in the scientific computing area, probably in
large part because of great libraries such as numpy, scipy and
pytables. But it also seems Python itself is falling further and
further behind in terms of performance and parallel processing
abilities.

Of course all that can be fixed by writing C modules (e.g. with the
help of Cython), but that weakens the case for using Python in the
first place.

For an outsider it does not look like a solution to the GIL mess or a
true breakthrough for performance is around the corner (even though
there seem to be many different attempts at working around these
problems or helping with parts). Am I wrong? If not, what is the
outlook? Do we need to move on to the next language and lose all the
great libraries that have been built around Python?

Felix
From: Stefan Behnel on 9 Jul 2010 00:44

Felix, 09.07.2010 05:39:
> On Jul 4, 11:25 am, David Cournapeau wrote:
>> Well, I wish I did not have to use C, then :) For example, as a
>> contributor to numpy, it bothers me at a fundamental level that so
>> much of numpy is in C.
>
> This is something that I have been thinking about recently. Python has
> won quite a following in the scientific computing area, probably in
> large part because of great libraries such as numpy, scipy and
> pytables. But it also seems Python itself is falling further and
> further behind in terms of performance and parallel processing
> abilities.

Well, at least its "parallel processing abilities" are quite good
actually. If you have really large computations, they usually run on
more than one computer (not just more than one processor). So you
can't really get around using something like MPI, in which case an
additional threading layer is basically worthless, regardless of the
language you use. For computations, threading continues to be highly
overrated.

WRT a single machine, you should note that GPGPUs are a lot faster
these days than even multi-core CPUs. And Python has pretty good
support for GPUs, too.

> Of course all that can be fixed by writing C modules (e.g. with the
> help of Cython), but that weakens the case for using Python in the
> first place.

Not at all. Look at Sage, for example. It's attractive because it
provides tons of functionality, all nicely glued together through a
simple language that even non-programmers can use efficiently and
effectively. And its use of Cython makes all of this easily extensible
without crossing a language border.

Stefan
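PS: To make the MPI point concrete, here is a minimal sketch using the
third-party mpi4py package (the script name and numbers are invented
for illustration). Each MPI rank is an ordinary OS process with its
own interpreter, so the GIL never enters the picture, and the same
script runs unchanged on one box or across a cluster:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Every process computes a disjoint slice of the work ...
    local = sum(i * i for i in range(rank, 1000000, size))

    # ... and MPI combines the partial results on rank 0.
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("total: %d" % total)

Run it with something like "mpiexec -n 8 python sumsq.py".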
From: sturlamolden on 9 Jul 2010 01:16

On 9 Jul, 05:39, Felix <schle...(a)cshl.edu> wrote:
> This is something that I have been thinking about recently. Python has
> won quite a following in the scientific computing area, probably in
> large part because of great libraries such as numpy, scipy and
> pytables.

Python is much friendlier to memory than Matlab, and a much nicer
language to work in. It can also be used to program more than just
linear algebra. If you have to read data from a socket, Matlab is not
so much fun anymore.

> But it also seems Python itself is falling further and further behind
> in terms of performance and parallel processing abilities.

First, fine-grained parallelism really belongs in libraries like MKL,
GotoBLAS and FFTW. Python can manage the high-level routines just like
Matlab. You can call a NumPy routine like np.dot, and the BLAS library
(e.g. Intel MKL) will do the multi-threading for you. We almost always
use Python to orchestrate C and Fortran. We can use OpenMP in C or
Fortran, or we can just release the GIL and use Python threads.

Second, the GIL does not matter for MPI, as it works with processes.
Nor does it matter for os.fork or multiprocessing. On clusters, which
are as common in high-performance computing as SMP systems, one has to
use processes (usually MPI) rather than threads, as there is no shared
memory between nodes. On SMP systems, MPI can use shared memory and be
just as efficient as threads (OpenMP); MPI is usually faster, due to
cache problems with threads.

Consider that Matlab does not even have threads (or did not last time
I checked). Yet it takes advantage of multi-core CPUs for numerical
computing. It's not the high-level interface that matters, it's the
low-level libraries. And Python is just that: a high-level "glue"
language.

> For an outsider it does not look like a solution to the GIL mess or a
> true breakthrough for performance is around the corner (even though
> there seem to be many different attempts at working around these
> problems or helping with parts). Am I wrong?

Yes, you are.

We don't do CPU-intensive work in "pure Python". We use Python to
control C and Fortran libraries. That gives us the opportunity to
multi-thread in C, release the GIL and multi-thread in Python, or
both.
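In case it helps, here is about the smallest possible illustration of
that division of labour. It assumes NumPy is linked against a threaded
BLAS such as Intel MKL or GotoBLAS (with a plain reference BLAS the
call still works, just on one core):

    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    # One high-level Python call. The actual matrix multiply (dgemm)
    # runs in compiled BLAS code, which does not hold the GIL and, if
    # the BLAS is threaded, spreads the work across all cores itself.
    c = np.dot(a, b)

No Python-level threading anywhere, and still all cores are busy.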
From: sturlamolden on 9 Jul 2010 01:25

On 9 Jul, 06:44, Stefan Behnel <stefan...(a)behnel.de> wrote:
> WRT a single machine, you should note that GPGPUs are a lot faster
> these days than even multi-core CPUs. And Python has pretty good
> support for GPUs, too.

With OpenCL, Python is better than C for heavy computing. The Python
or C/C++ program has to supply OpenCL code (structured text) to the
OpenCL driver, which does the real work on the GPU or CPU, and Python
is much better than C or C++ at processing text.

There will soon be OpenCL drivers for most processors on the market,
but OpenCL drivers will not be pre-installed on Windows, as Microsoft
has a competing COM-based technology (DirectX Compute, with an
atrocious API and syntax).
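A minimal sketch of what that looks like with the third-party PyOpenCL
package (kernel and variable names invented for illustration). Note
that the kernel is nothing but a Python string handed to the driver,
which compiles it for whatever device it finds:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = np.random.rand(100000).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # OpenCL C source as ordinary structured text, built in Python.
    prg = cl.Program(ctx, """
        __kernel void square(__global const float *a,
                             __global float *out)
        {
            int gid = get_global_id(0);
            out[gid] = a[gid] * a[gid];
        }
    """).build()

    prg.square(queue, a.shape, None, a_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)

Since the kernel source is assembled at run time, Python's string
handling is a real advantage here: you can template, generate and
specialize kernels in a few lines.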
From: Felix on 9 Jul 2010 09:25
On Jul 9, 1:16 am, sturlamolden <sturlamol...(a)yahoo.no> wrote:
> On 9 Jul, 05:39, Felix <schle...(a)cshl.edu> wrote:
> > For an outsider it does not look like a solution to the GIL mess or a
> > true breakthrough for performance is around the corner (even though
> > there seem to be many different attempts at working around these
> > problems or helping with parts). Am I wrong?
>
> Yes, you are.
>
> We don't do CPU-intensive work in "pure Python". We use Python to
> control C and Fortran libraries. That gives us the opportunity to
> multi-thread in C, release the GIL and multi-thread in Python, or
> both.

Yes, this setup works very well and is (as I said) probably the reason
Python is so widely used in scientific computing these days. However,
I find that I can almost never do everything with vector operations;
at some point I have to iterate over data structures. And there the
combination of CPython's slowness and the GIL means either bad
performance or having to write the loop in C (with which Cython
fortunately helps).

If it were possible to write simple, parallel, reasonably fast loops
in (some subset of) Python directly, that would certainly be a great
advantage. Given the performance of other JITs it sounds like it
should be possible, but maybe Python is too complex to make this
realistic.

Felix

PS: No need to convince me that MATLAB is not the solution.
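PPS: To be fair, the process-based workaround does cover some of these
loops. A minimal sketch with the standard-library multiprocessing
module (the per-chunk work function is a made-up stand-in):

    import multiprocessing as mp

    def work(chunk):
        # Stand-in for a real per-item computation.
        return sum(x * x for x in chunk)

    if __name__ == '__main__':
        data = [range(100000)] * 64
        pool = mp.Pool()                 # one worker process per core
        results = pool.map(work, data)   # each worker has its own GIL
        pool.close()
        pool.join()

But it does not help much when the loop iterations share one big
mutable data structure, which is exactly the case I keep running into.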