From: dmtr on 3 Jun 2010 19:41

On Jun 3, 3:43 pm, "Emin.shopper Martinian.shopper" <emin.shop...(a)gmail.com> wrote:
> Dear Experts,
>
> I am getting a MemoryError when creating a dict in a long running
> process and suspect this is due to memory fragmentation. Any
> suggestions would be welcome. Full details of the problem are below.
>
> I have a long running process which eventually dies with a
> MemoryError exception. When it dies, it is using roughly 900 MB on a 4
> GB Windows XP machine running Python 2.5.4. If I do "import pdb;

Are you sure you have enough memory available? Dict memory usage can jump x2 during resizing.

-- Dmitry

P.S. Wish there was a google-sparsehash port for python....
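As a rough way to see that resize behavior, the following sketch (not from the original thread; it assumes CPython 2.7, where sys.getsizeof reports the dict's hash table, unlike the 2.5.4 the original poster was running) watches the reported size of a dict jump as items are inserted. Exact thresholds and byte counts vary by version and platform.

    import sys

    d = {}
    last = sys.getsizeof(d)
    for i in xrange(100000):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last:
            # Each jump marks a reallocation of the hash table.  The old and
            # new tables coexist briefly while entries are copied over, so
            # peak table memory during a resize is roughly old + new.
            print "%7d items: dict now reports %9d bytes (was %d)" % (len(d), size, last)
            last = size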
From: dmtr on 3 Jun 2010 19:46

> I have a long running process which eventually dies with a
> MemoryError exception. When it dies, it is using roughly 900 MB on a 4
> GB Windows XP machine running Python 2.5.4. If I do "import pdb;

BTW, have you tried the same code with Python 2.6.5?

-- Dmitry
From: Bryan on 4 Jun 2010 12:06

Emin.shopper wrote:
> dmtr wrote:
> > I'm still unconvinced that it is a memory fragmentation problem. It's
> > very rare.
>
> You could be right. I'm not an expert on python memory management. But
> if it isn't memory fragmentation, then why is it that I can create
> lists which use up 600 more MB but if I try to create a dict that uses
> a couple more MB it dies? My guess is that python dicts want a
> contiguous chunk of memory for their hash table. Is there a reason
> that you think memory fragmentation isn't the problem?

Your logic makes some sense. You wrote that you can create a dict with 1300 items, but not 1400 items. If my reading of the Python source is correct, the dict type decides it's overloaded when 2/3 full and enlarges by powers of two, so the 1366th item will trigger allocation of an array of 4096 PyDictEntry's.

http://svn.python.org/view/python/branches/release25-maint/Objects/dictnotes.txt?view=markup
http://svn.python.org/view/python/branches/release25-maint/Objects/dictobject.c?view=markup

On the other hand, a PyDictEntry is 12 bytes (on 32-bit Python), so the memory chunk needed is just 48 K. It doesn't seem plausible that you have hundreds of megabytes available but can't allocate 48 K in one chunk. Plus, unless I'm misreading the code, a Python list also uses one contiguous chunk of memory to store all the item references. I'm looking at PyList_New() and list_resize() in:

http://svn.python.org/view/python/branches/release25-maint/Objects/listobject.c?view=markup

and the memory allocators in:

http://svn.python.org/view/python/branches/release25-maint/Include/pymem.h?view=markup

> What else could it be?

Unfortunately several things, most of them hard to diagnose. I'd suggest checking the easy stuff first. Make sure 'dict' is still <type 'dict'>. If you can get back into the debugger in the error case, see how large a set you can make, since the set implementation is similar to dict except that its hash table entries are one pointer shorter, at 8 bytes.

--
--Bryan Olson
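A sketch of the sort of check suggested above, to be pasted at the pdb prompt once the MemoryError has fired. The function name probe and the size range are illustrative, not from the thread, and the byte figures assume 32-bit CPython 2.x.

    def probe(max_exp=24):
        print dict, set                       # both should still be the builtin types
        for exp in range(10, max_exp):
            n = 2 ** exp
            try:
                s = set(xrange(n))            # set entry: ~8 bytes on 32-bit
                del s
                set_status = "ok"
            except MemoryError:
                set_status = "MemoryError"
            try:
                d = dict.fromkeys(xrange(n))  # dict entry: ~12 bytes on 32-bit
                del d
                dict_status = "ok"
            except MemoryError:
                dict_status = "MemoryError"
            print "%8d items: set %-11s dict %s" % (n, set_status, dict_status)

If small dicts fail while comparably sized sets and lists succeed, that points at something dict-specific rather than at general memory exhaustion.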
From: Philip Semanchuk on 4 Jun 2010 12:21

On Jun 4, 2010, at 12:06 PM, Bryan wrote:

> Emin.shopper wrote:
>> dmtr wrote:
>>> I'm still unconvinced that it is a memory fragmentation problem.
>>> It's very rare.
>>
>> You could be right. I'm not an expert on python memory management.
>> But if it isn't memory fragmentation, then why is it that I can
>> create lists which use up 600 more MB but if I try to create a dict
>> that uses a couple more MB it dies? My guess is that python dicts
>> want a contiguous chunk of memory for their hash table. Is there a
>> reason that you think memory fragmentation isn't the problem?
>
> Your logic makes some sense. You wrote that you can create a dict with
> 1300 items, but not 1400 items. If my reading of the Python source is
> correct, the dict type decides it's overloaded when 2/3 full and
> enlarges by powers of two, so the 1366th item will trigger allocation
> of an array of 4096 PyDictEntry's.

At PyCon 2010, Brandon Craig Rhodes presented on how dictionaries work under the hood:
http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona

I found that very informative. There are also some slides if you don't like the video; I haven't looked at 'em myself.
http://us.pycon.org/2010/conference/schedule/event/12/

Cheers
Philip
From: Bryan on 4 Jun 2010 16:40

Philip Semanchuk wrote:
> At PyCon 2010, Brandon Craig Rhodes presented on how dictionaries
> work under the hood:
> http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dict...
>
> I found that very informative.

That's a fine presentation of hash tables in general and Python's choices in particular. Also highly informative, while easily readable, is the Objects/dictnotes.txt file in the Python source.

Fine as those resources may be, the issue here stands. Most of my own Python issues turn out to be stupid mistakes, and the problem here might be on that level, but Emin seems to have worked his problem and gotten a bunch of stuff right. There is no good reason why constructing a 50-kilobyte dict should fail with a MemoryError while constructing 50-megabyte lists succeeds.

--
--Bryan Olson
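For reference, the arithmetic behind that 50 KB versus 50 MB contrast, using the figures from Bryan's earlier post (32-bit CPython 2.x); a back-of-the-envelope sketch, not a measurement from the original machine:

    table_slots = 4096                   # hash table size after the 1366th insert, per the earlier post
    dict_table_bytes = table_slots * 12  # PyDictEntry is 12 bytes on 32-bit
    print dict_table_bytes               # 49152, i.e. about 48 KB

    list_items = 50 * 2 ** 20 / 4        # a list whose pointer array is ~50 MB
    list_array_bytes = list_items * 4    # one 4-byte reference per item on 32-bit
    print list_array_bytes               # 52428800, i.e. about 50 MB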