Prev: Best practice way to open files in Python 2.6+?
Next: Windows: How to detect whether a Python app/script is runningin console/GUI mode?
From: harijay on 27 Jul 2010 11:22 I want to quickly bzip2 compress several hundred gigabytes of data using my 8 core , 16 GB ram workstation. Currently I am using a simple python script to compress a whole directory tree using bzip2 and a system call coupled to an os.walk call. I see that the bzip2 only uses a single cpu while the other cpus remain relatively idle. I am a newbie in queue and threaded processes . But I am wondering how I can implement this such that I can have four bzip2 running threads (actually I guess os.system threads ), each using probably their own cpu , that deplete files from a queue as they bzip them. Thanks for your suggestions in advance hari My single thread script is pasted here . import os import sys for roots, dirlist , filelist in os.walk(os.curdir): for file in [os.path.join(roots,filegot) for filegot in filelist]: if "bz2" not in file: print "Compressing %s" % (file) os.system("bzip2 %s" % file) print ":DONE"
From: MRAB on 27 Jul 2010 13:26 harijay wrote: > I want to quickly bzip2 compress several hundred gigabytes of data > using my 8 core , 16 GB ram workstation. > Currently I am using a simple python script to compress a whole > directory tree using bzip2 and a system call coupled to an os.walk > call. > > I see that the bzip2 only uses a single cpu while the other cpus > remain relatively idle. > > I am a newbie in queue and threaded processes . But I am wondering how > I can implement this such that I can have four bzip2 running threads > (actually I guess os.system threads ), each using probably their own > cpu , that deplete files from a queue as they bzip them. > > > Thanks for your suggestions in advance > [snip] Try this: import os import sys from threading import Thread, Lock from Queue import Queue def report(message): mutex.acquire() print message sys.stdout.flush() mutex.release() class Compressor(Thread): def __init__(self, in_queue, out_queue): Thread.__init__(self) self.in_queue = in_queue self.out_queue = out_queue def run(self): while True: path = self.in_queue.get() sys.stdout.flush() if path is None: break report("Compressing %s" % path) os.system("bzip2 %s" % path) report("Done %s" % path) self.out_queue.put(path) in_queue = Queue() out_queue = Queue() mutex = Lock() THREAD_COUNT = 4 worker_list = [] for i in range(THREAD_COUNT): worker = Compressor(in_queue, out_queue) worker.start() worker_list.append(worker) for roots, dirlist, filelist in os.walk(os.curdir): for file in [os.path.join(roots, filegot) for filegot in filelist]: if "bz2" not in file: in_queue.put(file) for i in range(THREAD_COUNT): in_queue.put(None) for worker in worker_list: worker.join()
From: harijay on 27 Jul 2010 14:10
Thanks a tonne..That code works perfectly and also shows me how to think of using queue and threads in my python programs Hari On Jul 27, 1:26 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote: > harijay wrote: > > I want to quickly bzip2 compress several hundred gigabytes of data > > using my 8 core , 16 GB ram workstation. > > Currently I am using a simple python script to compress a whole > > directory tree using bzip2 and a system call coupled to an os.walk > > call. > > > I see that the bzip2 only uses a single cpu while the other cpus > > remain relatively idle. > > > I am a newbie in queue and threaded processes . But I am wondering how > > I can implement this such that I can have four bzip2 running threads > > (actually I guess os.system threads ), each using probably their own > > cpu , that deplete files from a queue as they bzip them. > > > Thanks for your suggestions in advance > > [snip] > Try this: > > import os > import sys > from threading import Thread, Lock > from Queue import Queue > > def report(message): > mutex.acquire() > print message > sys.stdout.flush() > mutex.release() > > class Compressor(Thread): > def __init__(self, in_queue, out_queue): > Thread.__init__(self) > self.in_queue = in_queue > self.out_queue = out_queue > def run(self): > while True: > path = self.in_queue.get() > sys.stdout.flush() > if path is None: > break > report("Compressing %s" % path) > os.system("bzip2 %s" % path) > report("Done %s" % path) > self.out_queue.put(path) > > in_queue = Queue() > out_queue = Queue() > mutex = Lock() > > THREAD_COUNT = 4 > > worker_list = [] > for i in range(THREAD_COUNT): > worker = Compressor(in_queue, out_queue) > worker.start() > worker_list.append(worker) > > for roots, dirlist, filelist in os.walk(os.curdir): > for file in [os.path.join(roots, filegot) for filegot in filelist]: > if "bz2" not in file: > in_queue.put(file) > > for i in range(THREAD_COUNT): > in_queue.put(None) > > for worker in worker_list: > worker.join() |