From: Inquisitive Scientist on 16 Jul 2010 08:45

I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data:

1. copy.deepcopy can be very slow
2. copy.deepcopy can cause memory errors even when I have plenty of memory

I think the problem is that the current implementation keeps a memo for
everything it copies, even immutable types. In addition to being slow, this
makes the memo dict grow very large when there is lots of simple numeric
data to be copied. For long-running programs, large memo dicts seem to
cause memory fragmentation and result in memory errors.

It seems like this could be easily fixed by adding the following lines at
the very start of the deepcopy function:

    if isinstance(x, (type(None), int, long, float, bool, str)):
        return x

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point me to
the proper procedure for requesting this change in copy.py?

Thanks,
-I.S.
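For anyone who wants to experiment with the suggestion before touching
copy.py, here is a minimal monkey-patch sketch (Python 2; the name
fast_deepcopy and the patching approach are illustrative, not part of the
proposal). It assumes that copy.py resolves its recursive deepcopy calls
through the module's globals, so rebinding copy.deepcopy should apply the
short-circuit to nested elements as well:

    import copy

    _original_deepcopy = copy.deepcopy
    _ATOMIC = (type(None), int, long, float, bool, str)

    def fast_deepcopy(x, memo=None):
        # Immutable scalars are returned as-is, so they never enter the
        # memo dict -- the behaviour the proposed check would give
        # deepcopy itself.
        if isinstance(x, _ATOMIC):
            return x
        # Containers, instances, shared references and cycles are still
        # handled by the stock implementation.
        return _original_deepcopy(x, memo)

    # copy.py looks the name "deepcopy" up in its module globals when it
    # recurses, so rebinding it here also affects nested elements.
    copy.deepcopy = fast_deepcopy

This is only suitable for experiments; the actual proposal is the two-line
check at the top of copy.deepcopy itself.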
From: Stefan Behnel on 16 Jul 2010 09:02

Inquisitive Scientist, 16.07.2010 14:45:
> I am having problems with running copy.deepcopy on very large data
> structures containing lots of numeric data:
>
> 1. copy.deepcopy can be very slow
> 2. copy.deepcopy can cause memory errors even when I have plenty of
> memory
>
> I think the problem is that the current implementation keeps a memo
> for everything it copies even immutable types. In addition to being
> slow, this makes the memo dict grow very large when there is lots of
> simple numeric data to be copied. For long running programs, large
> memo dicts seem to cause memory fragmentation and result in memory
> errors.
>
> It seems like this could be easily fixed by adding the following lines
> at the very start of the deepcopy function:
>
> if isinstance(x, (type(None), int, long, float, bool, str)):
>     return x
>
> This seems perfectly safe, should speed things up, keep the memo dict
> smaller, and be easy to add.

and - have you tried it?

Stefan
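One quick way to answer that question is to time deepcopy on a large,
purely numeric structure before and after such a change (Python 2 here to
match the thread; the structure below is only an illustration and the
timings will vary by machine):

    import copy
    import time

    # Roughly one million floats in nested lists -- the kind of data the
    # original post describes.
    data = [[float(i) for i in range(1000)] for _ in range(1000)]

    start = time.time()
    copy.deepcopy(data)
    print "deepcopy of ~1e6 floats took %.2f seconds" % (time.time() - start)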
From: Steven D'Aprano on 16 Jul 2010 09:59

On Fri, 16 Jul 2010 05:45:50 -0700, Inquisitive Scientist wrote:

> I am having problems with running copy.deepcopy on very large data
> structures containing lots of numeric data:
[...]
> This seems perfectly safe, should speed things up, keep the memo dict
> smaller, and be easy to add. Can someone add this to copy.py or point me
> to the proper procedure for requesting this change in copy.py?

These are the minimum steps you can take:

(1) Go to the Python bug tracker: http://bugs.python.org/

(2) If you don't already have one, create an account.

(3) Create a new bug report, explaining why you think deepcopy is buggy,
the nature of the bug, and your suggested fix.

If you do so, it might be a good idea to post a link to the bug here, for
interested people to follow up.

However doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious, high-profile,
or affects a developer personally, it is likely to be ignored. Sometimes
for years. Sad but true.

You can improve the odds of having the bug fixed (assuming you are right
that it is a bug) by doing more than the minimum. The more of these you can
do, the better the chances:

(4) Create a test that fails with the current code, following the examples
in the standard library tests. Confirm that it fails with the existing
module.

(5) Patch the copy module to fix the bug. Confirm that the new test passes
with your patch, and that you don't cause any regressions (failed tests).

(6) Create a patch file that adds the new test and the patch. Upload it to
the bug tracker.

There's no point in writing the patch for Python 2.5 or 3.0, so don't waste
your time there. A patch for 2.6 *might* be accepted; 2.7 and/or 3.1 should
be, provided people agree that it is a bug.

If you do all these things -- demonstrate successfully that this is a
genuine bug, create a test for it, and fix the bug without breaking
anything else -- then you have a good chance of having the fix accepted.

Good luck! Your first patch is always the hardest.

--
Steven
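To make step (4) concrete, a test along the following lines is one
possibility (a sketch only; the class name and placement are hypothetical,
not taken from the stdlib test suite). If atomic values really are recorded
in the memo, as the original post describes, this fails against the
unpatched module and passes once the early return is in place:

    import copy
    import unittest

    class DeepCopyAtomicMemoTest(unittest.TestCase):

        def test_atomic_values_are_not_memoized(self):
            memo = {}
            result = copy.deepcopy(12345678, memo)
            self.assertEqual(result, 12345678)
            # With the proposed early return, copying an int should leave
            # the caller-supplied memo untouched.
            self.assertEqual(memo, {})

    if __name__ == "__main__":
        unittest.main()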
From: Mark Lawrence on 16 Jul 2010 12:23
On 16/07/2010 14:59, Steven D'Aprano wrote:
[snip]
> However doing the minimum isn't likely to be very useful. Python is
> maintained by volunteers, and there are more bugs than person-hours
> available to fix them. Consequently, unless a bug is serious,
> high-profile, or affects a developer personally, it is likely to be
> ignored. Sometimes for years. Sad but true.

To give people an idea, here's the weekly Summary of Python tracker Issues
posted to python-dev and timed at 17:07 today:

"
2807 open (+44) / 18285 closed (+18) / 21092 total (+62)

Open issues with patches: 1144

Average duration of open issues: 703 days.
Median duration of open issues: 497 days.

Open Issues Breakdown
   open          2765 (+42)
   languishing     14 ( +0)
   pending         27 ( +2)

Issues Created Or Reopened (64)
"

I've spent a lot of time helping out on the issue tracker in the last few
weeks. The oldest open issue I've come across was dated 2001, and there
could be older ones. Unless more volunteers come forward, particularly to
do patch reviews or similar, the situation as I see it can only get worse.

Kindest regards.

Mark Lawrence.