From: Steven D'Aprano on 8 Mar 2010 07:21

On Sun, 07 Mar 2010 22:31:00 -0800, Raymond Hettinger wrote:

> On Mar 7, 5:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-cybersource.com.au> wrote:
>> Given that Counter supports negative counts, it looks to me that the
>> behaviour of __add__ and __sub__ is fundamentally flawed. You should
>> raise a bug report (feature enhancement) on the bug tracker.
>
> It isn't a bug. I designed it that way. There were several possible
> design choices, each benefitting different use cases.

Thanks for the explanation Raymond. A few comments follow:

> FWIW, here is the reasoning behind the design.
>
> The basic approach to Counter() is to be a dict subclass that supplies
> zero for missing values. This approach places almost no restrictions
> on what can be stored in it. You can store floats, decimals, fractions,
> etc. Numbers can be positive, negative, or zero.

Another way of using default values in a dict. That's five that I know of: dict.get, dict.setdefault, dict.pop, collections.defaultdict, and collections.Counter. And the Perl people criticise Python for having "only one way to do it" *wink*

(That's not meant as a criticism, merely an observation.)

[...]

> One possible choice (the one preferred by the OP) was to have addition
> and subtraction be straight adds and subtracts without respect to sign
> and to not support __and__ and __or__. Straight addition was already
> supported via the update() method. But no direct support was provided
> for straight subtractions that leave negative values. Sorry about that.

Would you consider a feature enhancement adding an additional method, analogous to update(), to perform subtractions? I recognise that it's easy to subclass and do it yourself, but there does seem to be some demand for it, and it is an obvious feature given that Counter does support negative counts.

> Instead the choice was to implement the four methods as multiset
> operations. As such, they need to correspond to regular set operations.

Personally, I think the behaviour of + and - would be far less surprising if the class was called Multiset. Intuitively, one would expect counters to be limited to ints, and to support negative counts when adding and subtracting. In hindsight, do you think that Multiset would have been a better name?

--
Steven
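For readers following the thread, here is a minimal sketch of the behaviour under discussion. It assumes a Python version that ships collections.Counter; the helper signed_subtract is a hypothetical name used only for illustration and is not part of the standard library.

```python
from collections import Counter

a = Counter(x=1, y=2)
b = Counter(x=3, y=1)

# Multiset subtraction: results that are zero or negative are dropped.
diff = a - b
print(dict(diff))

# update() performs straight addition and keeps whatever sign results.
c = Counter(x=-5)
c.update(a)
print(c['x'])

# Hypothetical helper: straight subtraction that keeps negative counts,
# i.e. the operation the thread says has no direct support.
def signed_subtract(left, right):
    result = Counter(left)          # copy so the inputs stay untouched
    for key, count in right.items():
        result[key] -= count        # missing keys read as zero
    return result

s = signed_subtract(a, b)
print(s['x'], s['y'])
```

Later Python releases (3.2 onward) include a Counter.subtract() method that does this kind of signed subtraction in place.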
From: Raymond Hettinger on 8 Mar 2010 14:24

[Steven D'Aprano]
> Thanks for the explanation Raymond. A few comments follow:

You're welcome :-)

> Would you consider a feature enhancement adding an additional method,
> analogous to update(), to perform subtractions? I recognise that it's
> easy to subclass and do it yourself, but there does seem to be some
> demand for it, and it is an obvious feature given that Counter does
> support negative counts.

Will continue to mull it over.

Instinct says that conflating two models can be worse for usability than just picking one of the models and excluding the other.

If I had it to do over, there is a reasonable case that elementwise vector methods (__add__, __sub__, and __mul__) may have been a more useful choice than multiset methods (__add__, __sub__, __and__, __or__).

That being said, the multiset approach was the one that was chosen. It was indicated for people who have experience with bags or multisets in other languages. It was also consistent with the naming of the class as a tool for counting things (i.e. it handles counting numbers right out of the box). No explicit support is provided for negative values, but it isn't actively hindered either.

For applications needing elementwise vector operations and signed arithmetic, arguably they should be using a more powerful toolset, perhaps supporting a full range of elementwise binary and unary operations and a dotproduct() method. Someone should write that class and post it to the ASPN Cookbook to see if there is any uptake.

> Personally, I think the behaviour of + and - would be far less surprising
> if the class was called Multiset. Intuitively, one would expect counters
> to be limited to ints, and to support negative counts when adding and
> subtracting. In hindsight, do you think that Multiset would have been a
> better name?

The primary use case for Counter() is to count things (using the counting numbers). The term Multiset is more obscure and only applies to the four operations that eliminate non-positive results. So, I'm somewhat happy with the current name.

FWIW, the notion of "what is surprising" often depends on the observer's background and on the problem they are currently trying to solve. If you need negative counts, then Counter.__sub__() is surprising. If your app has no notion of a negative count, then it isn't. The docs, examples, and docstrings are very clear about the behavior, so the "surprise" is really about wanting it to do something other than what it currently does ;-)

Raymond
From: Raymond Hettinger on 8 Mar 2010 16:44

[Vlastimil Brom]
> Thank you very much for the exhaustive explanation Raymond!

You're welcome.

> I am by far not able to follow all of the mathematical background, but
> even for zero-truncating multiset, I would expect the truncation on
> input rather than on output of some operations.

I debated about this and opted for be-loose-in-receiving-and-strict-on-output.

One thought is that use cases for multisets would have real multisets as inputs (no negative counts) and as outputs. The user controls the inputs, and the method only has a say in what its outputs are.

Also, truncating input would complicate the mathematical definition of what is happening. Compare:

    r = a[x] - b[x]
    if r > 0:
        emit(r)

vs.

    r = max(0, a[x]) - max(0, b[x])
    if r > 0:
        emit(r)

Also, the design parallels what is done in the decimal module where rounding is applied only to the results of operations, not to the inputs.

> Probably a kind of negative_update() or some better named method will
> be handy, like the one you supplied or simply the current module code
> without the newcount > 0: ... condition.

See my other post on this subject. There is no doubt that such a method would be handy for signed arithmetic. The question is whether conflating two different models hurts the API more than it helps. Right now, the Counter() class has no explicit support for negative values. It is designed around natural numbers and counting numbers.

> Or would it be an option to
> have a keyword argument like zero_truncate=False which would influence
> this behaviour?

Guido's thoughts on behavior flags are that they are usually a signal that you need two different classes. That is why itertools has ifilter() and ifilterfalse() or izip() and izip_longest() instead of having behavior flags.

In this case, we have an indication that what you really want is a separate class supporting elementwise binary and unary operations on vectors (where the vector fields are accessed by a dictionary key instead of a positional value).

> Additionally, were issubset and issuperset considered for this
> interface (not sure whether symmetric_difference would be applicable)?

If the need arises, these could be included. Right now, you can get the same result with: "if a - b: ..."

FWIW, I never liked those two method names. Can't remember whether a.issubset(b) means "a is a subset of b" or "b is a subset of a".

Raymond
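The two definitions compared in the post above can be sketched as runnable code. This is only an illustration under stated assumptions: truncate_output and truncate_input are hypothetical names, both expect Counter arguments, and neither is the standard library's implementation. The last lines show the "a - b" emptiness check used as a sub-multiset test.

```python
from collections import Counter

def truncate_output(a, b):
    """Subtract first, then keep only the positive results."""
    result = Counter()
    for x in set(a) | set(b):
        r = a[x] - b[x]             # missing keys read as zero
        if r > 0:
            result[x] = r
    return result

def truncate_input(a, b):
    """Clamp each input count at zero before subtracting."""
    result = Counter()
    for x in set(a) | set(b):
        r = max(0, a[x]) - max(0, b[x])
        if r > 0:
            result[x] = r
    return result

a = Counter(x=2, y=-3)
b = Counter(x=-1, y=-5)
out_first = truncate_output(a, b)   # x: 2 - (-1) = 3, y: -3 - (-5) = 2
in_first = truncate_input(a, b)     # x: 2 - 0 = 2, y: 0 - 0 = 0 (dropped)

# Sub-multiset test via subtraction: an empty difference means
# every count in p is covered by q.
p = Counter('aab')
q = Counter('aaabb')
p_within_q = not (p - q)
q_within_p = not (q - p)
```

With inputs that contain no negative counts, the two functions give the same answer; they only diverge when negative counts are fed in, which is the case the post is weighing.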
From: Gregory Ewing on 8 Mar 2010 17:22

Raymond Hettinger wrote:

> Instead the choice was to implement the four methods as
> multiset operations. As such, they need to correspond
> to regular set operations.

Seems to me you're trying to make one data type do the work of two, and ending up with something inconsistent.

I think you should be providing two types: one is a multiset, which disallows negative counts altogether; the other behaves like a sparse vector with appropriate arithmetic operations.

--
Greg
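As a rough sketch of the second type suggested above, a dict-keyed vector with signed elementwise arithmetic might look like the following. The class name SparseVector and its methods are assumptions made for illustration; no such class is described in the thread or provided by the standard library.

```python
import operator

class SparseVector(dict):
    """Hypothetical dict-keyed vector; fields that are absent read as zero."""

    def __missing__(self, key):
        return 0

    def _combine(self, other, op):
        # Apply op field by field over the union of keys, keeping any sign.
        keys = set(self) | set(other)
        return SparseVector(
            (k, op(self.get(k, 0), other.get(k, 0))) for k in keys
        )

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __sub__(self, other):
        return self._combine(other, operator.sub)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def dotproduct(self, other):
        # Only keys present in self can contribute a nonzero term.
        return sum(value * other.get(k, 0) for k, value in self.items())

u = SparseVector(x=1, y=2)
v = SparseVector(x=3, z=4)
difference = u - v          # negative entries are kept
dot = u.dotproduct(v)
```

Unlike the multiset operators on Counter, nothing here discards zero or negative entries, which keeps the two models separate as the post proposes.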
From: Vlastimil Brom on 8 Mar 2010 17:24

2010/3/8 Raymond Hettinger <python(a)rcn.com>:
....
[snip detailed explanations]
>...
> In this case, we have an indication that what you really want is
> a separate class supporting elementwise binary and unary operations
> on vectors (where the vector fields are accessed by a dictionary
> key instead of a positional value).
>
>
>> Additionally, were issubset and issuperset considered for this
>> interface (not sure whether symmetric_difference would be applicable)?
>
> If the need arises, these could be included. Right now, you
> can get the same result with: "if a - b: ..."
>
> FWIW, I never liked those two method names. Can't remember whether
> a.issubset(b) means "a is a subset of b" or "b is a subset of a".
>
>
> Raymond
> --
>

Thanks for the further remarks Raymond,

Initially I thought, while investigating new features of Python 3, that this would be a case for replacing the "home made" solutions with the standard module functionality. Now I can see it probably wouldn't be an appropriate decision in this case, as the expected usage of Counter with its native methods is different.

As for the issubset and issuperset method names, I am glad a far more skilled person has the same problem as me :-) In this case the operators appear to be clearer than the method names...

regards,
vbr