Duplicate keys in dict? [Python]

Prev: async notification handling w/o threads/polling (similiar tokill -hup)?
Next: importerror: module Gnuplot missing

From: vsoler on 7 Mar 2010 11:23

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

.................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.

What is the best approach for this problem? Can anybody help?

From: News123 on 7 Mar 2010 11:46

vsoler wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reding
> the range I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

Normall dicts are used if you want to access your data at a later point
in time by the key name.

Do you want to be able to do this?

Then what would you expect to receive for d[('cd','A')] ?

The first value? the second value? both values?

Could you perhaps change further occurences of 'cd' with 'cd1' , 'cd2' ,
'cd3', ... ?

Not knowing your exact context makes it difficult to suggest solutions?

perhaps you could switch to a list containing a tuple of (rowname,rowdict)

l = [ ('ab', { 'A': 3 , 'B': 5 } ),
'cd', { 'A': 7 , 'B': 2 } ),
'cd', { 'A': 9 , 'B': 1 } ),
'ac', { ... }
]

bye

N

>
> What is the best workaround for this? Should I discard dicts? Should I
> somehow have under 'cd'... a list of values?
>
> One of the difficulties I find here is that I want to be able to
> easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
> However, using lists inside dicts makes it a difficult issue for me.
>
> What is the best approach for this problem? Can anybody help?

From: Steven D'Aprano on 7 Mar 2010 11:53

On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:

> Hello,
>
> My code snippet reads data from excel ranges. First row and first column
> are column headers and row headers respectively. After reding the range
> I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

> One of the difficulties I find here is that I want to be able to easily
> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

--
Steven

From: vsoler on 7 Mar 2010 12:13

On 7 mar, 17:53, Steven D'Aprano <st...(a)REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
> > Hello,
>
> > My code snippet reads data from excel ranges. First row and first column
> > are column headers and row headers respectively. After reding the range
> > I build a dict.
>
> > ................'A'..............'B'
> > 'ab'............3................5
> > 'cd'............7................2
> > 'cd'............9................1
> > 'ac'............7................2
>
> > d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> > However, as you can see there are two rows that start with 'cd', and
> > dicts, AFAIK do not accept duplicates.
> > One of the difficulties I find here is that I want to be able to easily
> > sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> > using lists inside dicts makes it a difficult issue for me.
>
> Given the sample above, what answer do you expect for summing the 'cd'
> row? There are four reasonable answers:
>
> 7 + 2 = 9
> 9 + 1 = 10
> 7 + 2 + 9 + 1 = 19
> Error
>
> You need to decide what you want to do before asking how to do it.
>
> --
> Steven

Steven,

What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

I apologize for not having made it clear.

From: Tim Chase on 7 Mar 2010 14:11

vsoler wrote:
> On 7 mar, 17:53, Steven D'Aprano <st...(a)REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
>>> Hello,
>>> My code snippet reads data from excel ranges. First row and first column
>>> are column headers and row headers respectively. After reding the range
>>> I build a dict.
>>> ................'A'..............'B'
>>> 'ab'............3................5
>>> 'cd'............7................2
>>> 'cd'............9................1
>>> 'ac'............7................2
>>> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>>> However, as you can see there are two rows that start with 'cd', and
>>> dicts, AFAIK do not accept duplicates.
>>> One of the difficulties I find here is that I want to be able to easily
>>> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
>>> using lists inside dicts makes it a difficult issue for me.
>
> What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
> me 3.

But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like

import csv
f = file(INFILE, 'rb')
r = csv.reader(f, ...)
headers = r.next() # discard the headers
d = defaultdict(list)
for (label, a, b) in r:
d[(label, 'a')].append(int(a))
d[(label, 'b')].append(int(b))
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', sum(value)

Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:

d = defaultdict(int)
for (label, a, b) in r:
d[(label, 'a')] += int(a)
d[(label, 'b')] += int(b)
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', value

Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.

-tkc

| Next | Last
Pages: 1 2
Prev: async notification handling w/o threads/polling (similiar tokill -hup)?
Next: importerror: module Gnuplot missing