negative "counts" in collections.Counter? [Python]

From: Steven D'Aprano on 8 Mar 2010 07:21

On Sun, 07 Mar 2010 22:31:00 -0800, Raymond Hettinger wrote:

> On Mar 7, 5:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-
> cybersource.com.au> wrote:
>> Given that Counter supports negative counts, it looks to me that the
>> behaviour of __add__ and __sub__ is fundamentally flawed. You should
>> raise a bug report (feature enhancement) on the bug tracker.
>
> It isn't a bug. I designed it that way. There were several possible
> design choices, each benefitting different use cases.

Thanks for the explanation Raymond. A few comments follow:

> FWIW, here is the reasoning behind the design.
>
> The basic approach to Counter() is to be a dict subclass that supplies
> zero for missing values. This approach places almost no restrictions
> on what can be stored in it. You can store floats, decimals, fractions,
> etc. Numbers can be positive, negative, or zero.

Another way of using default values in a dict. That's five that I know
of: dict.get, dict.setdefault, dict.pop, collections.defaultdict, and
collections.Counter. And the Perl people criticise Python for having
"only one way to do it" *wink*

(That's not meant as a criticism, merely an observation.)
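
For anyone comparing the five, here is a minimal sketch of how each one behaves on a missing key. The variable names are made up for illustration; the point is which approaches store the default and which only return it.

```python
from collections import Counter, defaultdict

d = {"a": 1}

# 1. dict.get: returns the default without storing it
got = d.get("b", 0)

# 2. dict.setdefault: stores the default, then returns it
stored = d.setdefault("c", 0)

# 3. dict.pop: removes the key if present, otherwise returns the default
popped = d.pop("zzz", 0)

# 4. defaultdict: calls the factory on a missing key and stores the result
dd = defaultdict(int)
dd_value = dd["x"]

# 5. Counter: returns 0 for a missing key without storing it
c = Counter()
c_value = c["x"]

print(got, stored, popped, dd_value, c_value)  # 0 0 0 0 0
print("x" in dd, "x" in c)                     # True False
```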

[...]
> One possible choice (the one preferred by the OP) was to have addition
> and subtraction be straight adds and subtracts without respect to sign
> and to not support __and__ and __or__. Straight addition was already
> supported via the update() method. But no direct support was provided
> for straight subtractions that leave negative values. Sorry about that.

Would you consider a feature enhancement adding an additional method,
analogous to update(), to perform subtractions? I recognise that it's
easy to subclass and do it yourself, but there does seem to be some
demand for it, and it is an obvious feature given that Counter does
support negative counts.
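
A rough sketch of the "subclass and do it yourself" route, for illustration only. The class name SignedCounter and the method name signed_subtract are hypothetical, not part of the collections module.

```python
from collections import Counter


class SignedCounter(Counter):
    """Counter with a hypothetical in-place subtraction that keeps negatives."""

    def signed_subtract(self, other):
        # Mirror update(): adjust counts in place, with no truncation at zero.
        for key, count in other.items():
            self[key] -= count


stock = SignedCounter(apples=2, pears=1)
stock.signed_subtract(Counter(apples=5, plums=1))
print(dict(stock))  # {'apples': -3, 'pears': 1, 'plums': -1}
```

Later Python releases (2.7, and 3.2 onward) ship a Counter.subtract() method that behaves this way, so the subclass is only needed on older versions.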

> Instead the choice was to implement the four methods as multiset
> operations. As such, they need to correspond to regular set operations.

Personally, I think the behaviour of + and - would be far less surprising
if the class was called Multiset. Intuitively, one would expect counters
to be limited to ints, and to support negative counts when adding and
subtracting. In hindsight, do you think that Multiset would have been a
better name?

--
Steven

From: Raymond Hettinger on 8 Mar 2010 14:24

[Steven D'Aprano]
> Thanks for the explanation Raymond. A few comments follow:

You're welcome :-)

> Would you consider a feature enhancement adding an additional method,
> analogous to update(), to perform subtractions? I recognise that it's
> easy to subclass and do it yourself, but there does seem to be some
> demand for it, and it is an obvious feature given that Counter does
> support negative counts.

Will continue to mull it over.

Instinct says that conflating two models can be worse for usability
than just picking one of the models and excluding the other.

If I had it to do over, there is a reasonable case that elementwise
vector methods (__add__, __sub__, and __mul__) may have been a more
useful choice than multiset methods (__add__, __sub__, __and__,
__or__).

That being said, the multiset approach was the one that was chosen.
It was indicated for people who have experience with bags or
multisets in other languages. It was also consistent with the naming
of the class as a tool for counting things (i.e. it handles counting
numbers right out of the box). No explicit support is provided
for negative values, but it isn't actively hindered either.
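
A small sketch of what "not supported, but not hindered" looks like in practice. The counters a and b are made-up data.

```python
from collections import Counter

a = Counter(x=1, y=3)
b = Counter(x=4, y=1)

# Multiset subtraction: results that are zero or negative are dropped.
diff = a - b
print(diff)  # Counter({'y': 2})

# Negative values can still be stored directly; nothing prevents it.
signed = Counter(a)
for key, count in b.items():
    signed[key] -= count
print(dict(signed))  # {'x': -3, 'y': 2}

# Multiset addition also drops non-positive results, so adding an
# empty Counter strips the negative entry again.
total = signed + Counter()
print(total)  # Counter({'y': 2})
```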

For applications needing elementwise vector operations and signed
arithmetic, arguably they should be using a more powerful toolset,
perhaps supporting a full range of elementwise binary and unary
operations and a dotproduct() method. Someone should write that
class and post it to the ASPN Cookbook to see if there is any uptake.

> Personally, I think the behaviour of + and - would be far less surprising
> if the class was called Multiset. Intuitively, one would expect counters
> to be limited to ints, and to support negative counts when adding and
> subtracting. In hindsight, do you think that Multiset would have been a
> better name?

The primary use case for Counter() is to count things (using the
counting numbers). The term Multiset is more obscure and only applies
to the four operations that eliminate non-positive results.
So, I'm somewhat happy with the current name.

FWIW, the notion of "what is surprising" often depends on the
observer's background and on the problem they are currently trying
to solve.
If you need negative counts, then Counter.__sub__() is surprising.
If your app has no notion of a negative count, then it isn't.
The docs, examples, and docstrings are very clear about the behavior,
so the "surprise" is really about wanting it to do something other
than what it currently does ;-)

Raymond

From: Raymond Hettinger on 8 Mar 2010 16:44

[Vlastimil Brom]
> Thank you very much for the exhaustive explanation Raymond!

You're welcome.

> I am by far not able to follow all of the mathematical background, but
> even for zero-truncating multiset, I would expect the truncation on
> input rather than on output of some operations.

I debated about this and opted for
be-loose-in-receiving-and-strict-on-output.
One thought is that use cases for multisets would have real multisets
as inputs (no negative counts) and as outputs. The user controls
the inputs, and the method only has a say in what its outputs are.

Also, truncating input would complicate the mathematical definition
of what is happening. Compare:

    r = a[x] - b[x]
    if r > 0:
        emit(r)

vs.

    r = max(0, a[x]) - max(0, b[x])
    if r > 0:
        emit(r)
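
The two definitions can be turned into runnable functions to see where they differ. This is a sketch, not the module's code: the function names are invented, and emit() is replaced by storing into a result Counter.

```python
from collections import Counter


def subtract_truncating_output(a, b):
    # Take the difference first, then keep only positive results.
    result = Counter()
    for x in set(a) | set(b):
        r = a[x] - b[x]
        if r > 0:
            result[x] = r
    return result


def subtract_truncating_input(a, b):
    # Hypothetical alternative: clamp each input at zero before subtracting.
    result = Counter()
    for x in set(a) | set(b):
        r = max(0, a[x]) - max(0, b[x])
        if r > 0:
            result[x] = r
    return result


a = Counter(x=1)
b = Counter(x=-2)  # a negative count on the right-hand side

print(subtract_truncating_output(a, b))  # Counter({'x': 3})
print(subtract_truncating_input(a, b))   # Counter({'x': 1})
print(a - b)                             # Counter({'x': 3})
```

With these inputs the built-in operator agrees with the first definition: only the output is truncated.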

Also, the design parallels what is done in the decimal module
where rounding is applied only to the results of operations,
not to the inputs.
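
To illustrate the decimal parallel, here is a short sketch using a 3-digit context. The precision and the value 1.005 are chosen only to make the difference visible.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, localcontext

ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)

with localcontext(ctx):
    x = Decimal("1.005")   # the constructor keeps all four digits
    total = x + x          # exact sum is 2.010, then rounded to 3 digits
    rounded_first = +x     # unary plus applies the context: 1.00
    total_if_inputs_rounded = rounded_first + rounded_first

print(total)                    # 2.01
print(total_if_inputs_rounded)  # 2.00
```

If the inputs had been rounded before the addition, the sum would have been 2.00 rather than 2.01.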

> Probably a kind of negative_update() or some better named method will
> be handy, like the one you supplied or simply the current module code
> without the newcount > 0: ... condition.

See my other post on this subject. There is no doubt that
such a method would be handy for signed arithmetic.
The question is whether conflating two different models hurts
the API more than it helps. Right now, the Counter() class
has no explicit support for negative values. It is
designed around natural numbers and counting numbers.

> Or would it be an option to
> have a keyword argument like zero_truncate=False which would influence
> this behaviour?

Guido's view on behavior flags is that they are usually a signal
that you need two different classes. That is why itertools has
ifilter() and ifilterfalse() or izip() and izip_longest() instead
of having behavior flags.

In this case, we have an indication that what you really want is
a separate class supporting elementwise binary and unary operations
on vectors (where the vector fields are accessed by a dictionary
key instead of a positional value).
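
One way such a class might look, as a rough sketch. The name KeyedVector and all of its methods are hypothetical; nothing like this exists in the standard library.

```python
import operator


class KeyedVector(dict):
    """Hypothetical sparse vector: fields are looked up by key, missing ones read as 0."""

    def __missing__(self, key):
        # Reading an absent field gives 0 without storing anything.
        return 0

    def _elementwise(self, other, op):
        # Apply op field by field over the union of keys, keeping every sign.
        keys = set(self) | set(other)
        return KeyedVector({k: op(self[k], other.get(k, 0)) for k in keys})

    def __add__(self, other):
        return self._elementwise(other, operator.add)

    def __sub__(self, other):
        return self._elementwise(other, operator.sub)

    def __neg__(self):
        return KeyedVector({k: -v for k, v in self.items()})

    def __mul__(self, scalar):
        # Scalar multiplication only; no elementwise product here.
        return KeyedVector({k: v * scalar for k, v in self.items()})

    def dotproduct(self, other):
        return sum(v * other.get(k, 0) for k, v in self.items())


u = KeyedVector(x=1, y=2)
v = KeyedVector(y=5, z=1)

print((u - v) == {"x": 1, "y": -3, "z": -1})  # True
print(u.dotproduct(v))                        # 10
```

Unlike Counter's operators, subtraction here keeps zero and negative fields, which is the signed-arithmetic model rather than the multiset one.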

> Additionally, were issubset and issuperset considered for this
> interface (not sure whether symmetric_difference would be applicable)?

If the need arises, these could be included. Right now, you
can get the same result with: "if a - b: ..."
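
Spelled out as a sketch, assuming both counters hold only non-negative counts. The helper name is_submultiset is made up for this example.

```python
from collections import Counter


def is_submultiset(a, b):
    # a - b keeps only the elements where a has more than b, so an
    # empty result means every count in a is covered by b.
    return not (a - b)


small = Counter("aab")
big = Counter("aaabbc")

print(is_submultiset(small, big))  # True
print(is_submultiset(big, small))  # False
```

Note the direction: a non-empty "a - b" means a is not contained in b, so the subset test is the negation of "if a - b".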

FWIW, I never liked those two method names. Can't remember whether
a.issubset(b) means "a is a subset of b" or "b is a subset of a".

Raymond

From: Gregory Ewing on 8 Mar 2010 17:22

Raymond Hettinger wrote:

> Instead the choice was to implement the four methods as
> multiset operations. As such, they need to correspond
> to regular set operations.

Seems to me you're trying to make one data type do the
work of two, and ending up with something inconsistent.

I think you should be providing two types: one is a
multiset, which disallows negative counts altogether;
the other behaves like a sparse vector with appropriate
arithmetic operations.

--
Greg

From: Vlastimil Brom on 8 Mar 2010 17:24

2010/3/8 Raymond Hettinger <python(a)rcn.com>:
....
[snip detailed explanations]
>...
> In this case, we have an indication that what you really want is
> a separate class supporting elementwise binary and unary operations
> on vectors (where the vector fields are accessed by a dictionary
> key instead of a positional value).
>
>
>> Additionally, were issubset and issuperset considered for this
>> interface (not sure whether symmetric_difference would be applicable)?
>
> If the need arises, these could be included. Right now, you
> can get the same result with: "if a - b: ..."
>
> FWIW, I never liked those two method names. Can't remember whether
> a.issubset(b) means "a is a subset of b" or "b is a subset of a".
>
>
> Raymond
> --
>
Thanks for the further remarks, Raymond.
While investigating the new features of Python 3, I initially thought
this would be a case for replacing "home made" solutions with the
standard module functionality.
Now I can see that it probably wouldn't be an appropriate decision in
this case, as the expected usage of Counter with its native methods is
different.

As for the issubset and issuperset method names, I am glad that a far
more skilled person has the same problem as me :-) In this case the
operators appear to be clearer than the method names...

regards,
vbr
