From: dirknbr on 21 Jun 2010 08:35

Hi

I have 2 files (done and outf), and I want to choose unique elements
from the 2nd column in outf which are not in done. This code works but
is not efficient, can you think of a quicker way? The a=1 is just a
redundant task obviously, I put it this way around because I think
'in' is quicker than 'not in' - is that true?

done_={}
for line in done:
    done_[line.strip()]=0

print len(done_)

universe={}
for line in outf:
    if line.split(',')[1].strip() in universe.keys():
        a=1
    else:
        if line.split(',')[1].strip() in done_.keys():
            a=1
        else:
            universe[line.split(',')[1].strip()]=0

Dirk
From: Thomas Lehmann on 21 Jun 2010 09:12

> universe={}
> for line in outf:
>     if line.split(',')[1].strip() in universe.keys():
>         a=1
>     else:
>         if line.split(',')[1].strip() in done_.keys():
>             a=1
>         else:
>             universe[line.split(',')[1].strip()]=0
>

I cannot say too much because I don't see what is processed, but
what I can say is: "line.split(',')[1].strip()" might be called three
times, so I would do it once only. And I would write it like this:

for line in outf:
    key = line.split(',')[1].strip()
    if not (key in universe.keys()):
        if not (key in done_.keys()):
            universe[key] = 0
From: python on 21 Jun 2010 09:27

Use a set instead of a dictionary for done keys?

Malcolm
From: Peter Otten on 21 Jun 2010 09:27

dirknbr wrote:

> Hi
>
> I have 2 files (done and outf), and I want to choose unique elements
> from the 2nd column in outf which are not in done. This code works but
> is not efficient, can you think of a quicker way? The a=1 is just a
> redundant task obviously, I put it this way around because I think
> 'in' is quicker than 'not in' - is that true?
>
> done_={}
> for line in done:
>     done_[line.strip()]=0
>
> print len(done_)
>
> universe={}
> for line in outf:
>     if line.split(',')[1].strip() in universe.keys():
>         a=1
>     else:
>         if line.split(',')[1].strip() in done_.keys():
>             a=1
>         else:
>             universe[line.split(',')[1].strip()]=0

Instead of

if key in some_dict.keys():
    #...

which converts the keys in the dictionary to a list and then performs an
O(N) lookup on that list, you should use

if key in some_dict:
    #...

which doesn't build a list and looks up the key in constant time.

Peter
From: Dave Angel on 21 Jun 2010 09:28

dirknbr wrote:

> Hi
>
> I have 2 files (done and outf), and I want to choose unique elements
> from the 2nd column in outf which are not in done. This code works but
> is not efficient, can you think of a quicker way? The a=1 is just a
> redundant task obviously, I put it this way around because I think
> 'in' is quicker than 'not in' - is that true?
>
> done_={}
> for line in done:
>     done_[line.strip()]=0
>
> print len(done_)
>
> universe={}
> for line in outf:
>     if line.split(',')[1].strip() in universe.keys():
>         a=1
>     else:
>         if line.split(',')[1].strip() in done_.keys():
>             a=1
>         else:
>             universe[line.split(',')[1].strip()]=0
>
> Dirk
>

Where you have a=1, one would normally use the "pass" statement. But
you're wrong that 'not in' is less efficient than 'in'. If there's a
difference, it's probably negligible, and almost certainly less than the
extra else clause you're forcing here.

When doing an 'in', do *not* use the keys() method, as you're replacing
a fast lookup with a slow one, not to mention the time it takes to build
the keys() list each time.

In both these cases, you can use a set, rather than a dict. And there's
no need to test whether the item is already in the set, just put it in
again.

Changing all that, you'll wind up with something like (untested)

done_set = set()
universe = set()
for line in done:
    done_set.add(line.strip())
for line in outf:
    item = line.split(',')[1].strip()
    if item not in done_set:
        universe.add(item)

DaveA
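For anyone who wants to try the set-based approach without the original files, here is a small self-contained sketch written for Python 3. The function name unseen_items, the sample data, and the choice to skip lines with no second column are assumptions made for illustration; they are not from the posts above.

```python
import io


def unseen_items(done, outf):
    """Return the 2nd-column values of outf that do not appear in done."""
    # One stripped entry per line of the "done" file.
    done_set = {line.strip() for line in done}
    universe = set()
    for line in outf:
        fields = line.split(',')
        if len(fields) < 2:
            continue  # assumption: silently skip lines without a 2nd column
        item = fields[1].strip()
        if item not in done_set:
            universe.add(item)  # adding a duplicate to a set is a no-op
    return universe


# Made-up sample data standing in for the two open files.
done = io.StringIO("a\nb\n")
outf = io.StringIO("1,a,x\n2,c,y\n3,c,z\n4,d,w\n")
result = unseen_items(done, outf)
print(sorted(result))  # ['c', 'd']
```

Real file objects opened with open() can be passed in the same way, since the function only iterates over lines.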