From: GZ on 7 Aug 2010 12:26
Hi All,

I need to store a large number of large objects in a file and then access them sequentially. I am talking about a few thousand objects, each a few hundred kilobytes in size, for a total file size of a few gigabytes.

I tried shelve, but it is not good at accessing the data sequentially. In essence, shelve.keys() takes forever.

I am wondering if there is a module that can persist a stream of objects without having to load everything into memory. (For this reason, I think Pickle is out, too, because it needs everything to be in memory.)

Thanks,
GZ
From: Alex Willmer on 7 Aug 2010 19:54
On Aug 7, 5:26 pm, GZ <zyzhu2...(a)gmail.com> wrote:
> I am wondering if there is a module that can persist a stream of
> objects without having to load everything into memory. (For this
> reason, I think Pickle is out, too, because it needs everything to be
> in memory.)

From the pickle docs it looks like you could do something like:

    try:
        import cPickle as pickle
    except ImportError:
        import pickle

    file_obj = open('whatever', 'wb')
    p = pickle.Pickler(file_obj)

    for x in stream_of_objects:
        p.dump(x)
        p.memo.clear()

    del p
    file_obj.close()

then later

    file_obj = open('whatever', 'rb')
    p = pickle.Unpickler(file_obj)

    while True:
        try:
            x = p.load()
            do_something_with(x)
        except EOFError:
            break

Your loading loop could be wrapped in a generator function, so only one object should be held in memory at once.
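The generator wrapping mentioned above could look roughly like the sketch below. It assumes Python 3, where cPickle no longer exists, and it swaps the reused Pickler for the module-level pickle.dump and pickle.load, so every record is written and read with a fresh memo. The names dump_stream and load_stream are hypothetical and are not part of the pickle module.

```python
import pickle


def dump_stream(objects, path):
    """Write each object as its own pickle record, one after another."""
    with open(path, "wb") as file_obj:
        for obj in objects:
            # Each call writes one self-contained record, so nothing
            # from earlier objects needs to stay in memory.
            pickle.dump(obj, file_obj, protocol=pickle.HIGHEST_PROTOCOL)


def load_stream(path):
    """Yield the stored objects one at a time, in the order written."""
    with open(path, "rb") as file_obj:
        while True:
            try:
                yield pickle.load(file_obj)
            except EOFError:
                # pickle.load raises EOFError once no records remain.
                return
```

With this shape, `for x in load_stream('whatever'): do_something_with(x)` should hold roughly one object at a time, provided the caller does not keep references to earlier ones.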
From: GZ on 9 Aug 2010 15:39
Hi Alex,

On Aug 7, 6:54 pm, Alex Willmer <a...(a)moreati.org.uk> wrote:
> From the pickle docs it looks like you could do something like:
> [code snipped]
>
> Your loading loop could be wrapped in a generator function, so only
> one object should be held in memory at once.

This totally works! Thanks!