From: vsoler on 7 Mar 2010 11:23

Hello,

My code snippet reads data from Excel ranges. The first row and first column are column headers and row headers respectively. After reading the range I build a dict.

.................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see, there are two rows that start with 'cd', and dicts, AFAIK, do not accept duplicate keys.

What is the best workaround for this? Should I discard dicts? Should I somehow have a list of values under 'cd'?

One of the difficulties I find here is that I want to be able to easily sum all the values for each row key: 'ab', 'cd' and 'ac'. However, using lists inside dicts makes that difficult for me.

What is the best approach for this problem? Can anybody help?
From: News123 on 7 Mar 2010 11:46

vsoler wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reading
> the range I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

Normally dicts are used if you want to access your data at a later point in time by the key name.

Do you want to be able to do this?

Then what would you expect to receive for d[('cd','A')]? The first value? The second value? Both values?

Could you perhaps rename further occurrences of 'cd' to 'cd1', 'cd2', 'cd3', ...?

Not knowing your exact context makes it difficult to suggest solutions.

Perhaps you could switch to a list containing tuples of (rowname, rowdict):

    l = [ ('ab', { 'A': 3 , 'B': 5 } ),
          ('cd', { 'A': 7 , 'B': 2 } ),
          ('cd', { 'A': 9 , 'B': 1 } ),
          ('ac', { ... } )
        ]

bye

N

> What is the best workaround for this? Should I discard dicts? Should I
> somehow have under 'cd'... a list of values?
>
> One of the difficulties I find here is that I want to be able to
> easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
> However, using lists inside dicts makes it a difficult issue for me.
>
> What is the best approach for this problem? Can anybody help?
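The list of (rowname, rowdict) tuples suggested above keeps duplicate row names and still allows summing. A minimal Python 3 sketch, assuming the sample values from the original post; the helper names column_sum and row_total are hypothetical, not from the thread:

```python
# Sample rows from the original post, in sheet order.
# A list of tuples tolerates repeated row names, unlike dict keys.
rows = [
    ("ab", {"A": 3, "B": 5}),
    ("cd", {"A": 7, "B": 2}),
    ("cd", {"A": 9, "B": 1}),
    ("ac", {"A": 7, "B": 2}),
]


def column_sum(rows, rowname, column):
    """Sum one column over every row whose name equals rowname."""
    return sum(rowdict[column] for name, rowdict in rows if name == rowname)


def row_total(rows, rowname):
    """Sum all columns over every row whose name equals rowname."""
    return sum(sum(rowdict.values()) for name, rowdict in rows if name == rowname)


print(column_sum(rows, "cd", "A"))
print(row_total(rows, "cd"))
```

The trade-off is that every lookup scans the whole list, which is fine for a small spreadsheet range but slower than a dict for large ones.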
From: Steven D'Aprano on 7 Mar 2010 11:53

On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:

> Hello,
>
> My code snippet reads data from excel ranges. First row and first column
> are column headers and row headers respectively. After reading the range
> I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

> One of the difficulties I find here is that I want to be able to easily
> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd' row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

--
Steven
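The four candidate answers listed above can be made concrete. A minimal Python 3 sketch, assuming the two 'cd' rows are held as dicts in sheet order; cd_rows and strict_total are hypothetical names, not from the thread:

```python
# The two 'cd' rows from the sample, in sheet order (this layout is an assumption).
cd_rows = [{"A": 7, "B": 2}, {"A": 9, "B": 1}]

# Keep only the first 'cd' row: 7 + 2
first_row_total = sum(cd_rows[0].values())

# Keep only the last 'cd' row: 9 + 1
last_row_total = sum(cd_rows[-1].values())

# Merge every 'cd' row: 7 + 2 + 9 + 1
combined_total = sum(sum(row.values()) for row in cd_rows)


def strict_total(rows):
    """Treat a repeated row name as an error instead of guessing."""
    if len(rows) != 1:
        raise ValueError("expected exactly one row, got %d" % len(rows))
    return sum(rows[0].values())
```

Which of these is right depends entirely on what the duplicated row name means in the spreadsheet.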
From: vsoler on 7 Mar 2010 12:13

On 7 mar, 17:53, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
> > Hello,
> >
> > My code snippet reads data from excel ranges. First row and first column
> > are column headers and row headers respectively. After reading the range
> > I build a dict.
> >
> > ................'A'..............'B'
> > 'ab'............3................5
> > 'cd'............7................2
> > 'cd'............9................1
> > 'ac'............7................2
> >
> > d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
> >
> > However, as you can see there are two rows that start with 'cd', and
> > dicts, AFAIK do not accept duplicates.
> > One of the difficulties I find here is that I want to be able to easily
> > sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> > using lists inside dicts makes it a difficult issue for me.
>
> Given the sample above, what answer do you expect for summing the 'cd'
> row? There are four reasonable answers:
>
> 7 + 2 = 9
> 9 + 1 = 10
> 7 + 2 + 9 + 1 = 19
> Error
>
> You need to decide what you want to do before asking how to do it.
>
> --
> Steven

Steven,

What I need is that sum(('cd','A')) gives me 16 and sum(('cd','B')) gives me 3.

I apologize for not having made it clear.
From: Tim Chase on 7 Mar 2010 14:11

vsoler wrote:
> On 7 mar, 17:53, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
>> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
>>> Hello,
>>> My code snippet reads data from excel ranges. First row and first column
>>> are column headers and row headers respectively. After reading the range
>>> I build a dict.
>>> ................'A'..............'B'
>>> 'ab'............3................5
>>> 'cd'............7................2
>>> 'cd'............9................1
>>> 'ac'............7................2
>>> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>>> However, as you can see there are two rows that start with 'cd', and
>>> dicts, AFAIK do not accept duplicates.
>>> One of the difficulties I find here is that I want to be able to easily
>>> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
>>> using lists inside dicts makes it a difficult issue for me.
>
> What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
> me 3.

But you really *do* want lists inside the dict if you want to be able to call sum() on them. You want to map the tuple ('cd','A') to the list [7,9] so you can sum the results. And if you plan to sum the results, it's far easier to have one-element lists and just sum them, instead of having to special-case "if it's a list, sum it; otherwise, return the value". So I'd use something like

    import csv
    from collections import defaultdict

    f = file(INFILE, 'rb')
    r = csv.reader(f, ...)
    headers = r.next()  # discard the headers
    d = defaultdict(list)  # each new key starts as an empty list
    for (label, a, b) in r:
        d[(label, 'A')].append(int(a))
        d[(label, 'B')].append(int(b))
    # ...
    for (label, col), value in d.iteritems():
        print label, col, 'sum =', sum(value)

Alternatively, if you don't need to store the intermediate values and just want to store the sums, you can accrue them as you go along:

    d = defaultdict(int)  # each new key starts at 0
    for (label, a, b) in r:
        d[(label, 'A')] += int(a)
        d[(label, 'B')] += int(b)
    # ...
    for (label, col), value in d.iteritems():
        print label, col, 'sum =', value

Both are untested, but I'm pretty sure they're both viable, modulo my sleep-deprived eyes.

-tkc
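
For readers on Python 3, the defaultdict(list) approach above might look roughly like the following. This is a sketch, not the poster's code: the in-memory SAMPLE text stands in for the spreadsheet export, its CSV layout is an assumption, and the names SAMPLE and collect are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the exported range; the CSV layout is an assumption.
SAMPLE = """label,A,B
ab,3,5
cd,7,2
cd,9,1
ac,7,2
"""


def collect(lines):
    """Map each (row label, column header) pair to the list of its values."""
    reader = csv.reader(lines)
    headers = next(reader)      # first row holds the column headers
    columns = headers[1:]       # skip the header of the label column
    values = defaultdict(list)  # each new key starts as an empty list
    for row in reader:
        label, cells = row[0], row[1:]
        for col, cell in zip(columns, cells):
            values[(label, col)].append(int(cell))
    return values


values = collect(io.StringIO(SAMPLE))
sums = {key: sum(vals) for key, vals in values.items()}

for (label, col), total in sorted(sums.items()):
    print(label, col, "sum =", total)
```

Reading the column names from the header row, rather than hard-coding them, is a design choice of this sketch so that extra columns need no code change.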