Performance of list vs. set equality operations [Python]

From: Gabriel Genellina on 9 Apr 2010 14:07

On Thu, 08 Apr 2010 21:02:23 -0300, Patrick Maupin <pmaupin(a)gmail.com>
wrote:
> On Apr 8, 6:35 pm, "Gabriel Genellina" <gagsl-...(a)yahoo.com.ar> wrote:
>
>> The CPython source contains lots of shortcuts like that. Perhaps the
>> checks should be stricter in some cases, but I imagine it's not so easy
>> to fix: lots of code was written in the pre-2.2 era, assuming that
>> internal types were not subclassable.
>
> I don't know if it's a good "fix" anyway. If you subclass an internal
> type, you can certainly supply your own rich comparison methods, which
> would (IMO) put the CPU computation burden where it belongs if you
> decide to do something goofy like subclass a list and then override
> __len__.

We're all consenting adults, that's the Python philosophy, isn't it?
If I decide to make stupid things, it's my fault. I don't see why Python
should have to prevent that.

--
Gabriel Genellina

From: Patrick Maupin on 9 Apr 2010 14:41

On Apr 9, 1:07 pm, "Gabriel Genellina" <gagsl-...(a)yahoo.com.ar> wrote:
> On Thu, 08 Apr 2010 21:02:23 -0300, Patrick Maupin <pmau...(a)gmail.com>
> wrote:
>
> > On Apr 8, 6:35 pm, "Gabriel Genellina" <gagsl-...(a)yahoo.com.ar> wrote:
>
> >> The CPython source contains lots of shortcuts like that. Perhaps the
> >> checks should be stricter in some cases, but I imagine it's not so easy
> >> to fix: lots of code was written in the pre-2.2 era, assuming that
> >> internal types were not subclassable.
>
> > I don't know if it's a good "fix" anyway. If you subclass an internal
> > type, you can certainly supply your own rich comparison methods, which
> > would (IMO) put the CPU computation burden where it belongs if you
> > decide to do something goofy like subclass a list and then override
> > __len__.
>
> We're all consenting adults, that's the Python philosophy, isn't it?
> If I decide to make stupid things, it's my fault. I don't see why Python
> should have to prevent that.
>
> --
> Gabriel Genellina

Exactly. I think we're in violent agreement on this issue ;-)

From: Raymond Hettinger on 10 Apr 2010 01:46

> > I don't know if it's a good "fix" anyway. If you subclass an internal
> > type, you can certainly supply your own rich comparison methods, which
> > would (IMO) put the CPU computation burden where it belongs if you
> > decide to do something goofy like subclass a list and then override
> > __len__.
>
> We're all consenting adults, that's the Python philosophy, isn't it?
> If I decide to make stupid things, it's my fault. I don't see why Python
> should have to prevent that.

Perhaps so for pure python classes, but the C builtins are another
story.

The C containers directly reference underlying structure and methods
for several reasons. The foremost reason is that if their internal
invariants are violated, they can segfault. A list's __getitem__
method needs to know the real length (not what you report in __len__)
if it is to avoid writing objects outside of its allocated memory
range. Another reason is efficiency -- the cost of attribute lookups
is high and would spoil the performance of the builtins if they could
not access their underlying structure and friend methods directly.
It is important to have those perform well because they are used
heavily in everyday programming.
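
A minimal sketch of this point, assuming CPython (other implementations
may behave differently). The class name LyingList is hypothetical and not
from the thread. It shows that indexing and iteration use the list's real
stored size, while only len() goes through the overridden __len__:

```python
class LyingList(list):
    """Hypothetical subclass that misreports its own length."""

    def __len__(self):
        return 3


items = LyingList(range(10))

reported = len(items)          # goes through the overridden __len__
last = items[9]                # indexing relies on the real internal size
real = list.__len__(items)     # the base implementation reports the stored size
count = sum(1 for _ in items)  # iteration also walks the real storage

print(reported, last, real, count)
```

Because the builtin never consults the subclass's __len__ for bounds
checking, a wrong answer there cannot push an access outside the
allocated storage.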

There are also a couple of OOP design considerations. The open/closed
principle (http://en.wikipedia.org/wiki/Open/closed_principle) is one
example.

Encapsulation is another example. If you override __len__
in order to influence the behavior of __eq__, then you're
relying on an implementation detail, not the published interface.
Even though the length check is an obvious optimization
for list equality and set equality, there is no guarantee
that other implementations of Python use that same pattern.
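
As a hedged illustration of using the published interface instead, here
is a sketch of a subclass that defines its own equality explicitly. The
class TruncatedList and its attribute n are hypothetical names invented
for this example; the idea is only that equality is stated in __eq__ (and
__ne__, since list supplies its own) rather than implied through __len__:

```python
class TruncatedList(list):
    """Hypothetical list that treats only its first `n` items as visible."""

    def __init__(self, iterable=(), n=None):
        super().__init__(iterable)
        # Default to the real stored size when no logical length is given.
        self.n = super().__len__() if n is None else n

    def __len__(self):
        return self.n

    def __eq__(self, other):
        if not isinstance(other, TruncatedList):
            return NotImplemented
        # Slicing returns plain lists, so this compares the visible prefixes.
        return self.n == other.n and self[:self.n] == other[:other.n]

    def __ne__(self, other):
        # list defines its own __ne__, so it has to be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    __hash__ = None  # remains unhashable, like list


a = TruncatedList(range(10), n=5)
b = TruncatedList([0, 1, 2, 3, 4, 99], n=5)
c = TruncatedList(range(10), n=9)

print(a == b, a != b, a == c)
```

With this approach the comparison semantics do not depend on whether a
particular Python implementation happens to call __len__ inside list
equality.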

my-two-cents-ly yours,

Raymond

From: Stefan Behnel on 10 Apr 2010 08:32

Steven D'Aprano, 08.04.2010 03:41:
> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:
>
>> [Gustavo Nare]
>>> In other words: The more different elements two collections have, the
>>> faster it is to compare them as sets. And as a consequence, the more
>>> equivalent elements two collections have, the faster it is to compare
>>> them as lists.
>>>
>>> Is this correct?
>>
>> If two collections are equal, then comparing them as a set is always
>> slower than comparing them as a list. Both have to call __eq__ for
>> every element, but sets have to search for each element while lists can
>> just iterate over consecutive pointers.
>>
>> If the two collections have unequal sizes, then both ways immediately
>> return unequal.
>
>
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:
>
> >>> class MyList(list):
> ...     def __len__(self):
> ...         return self.n
> ...
> >>> L1 = MyList(range(10))
> >>> L2 = MyList(range(10))
> >>> L1.n = 9
> >>> L2.n = 10
> >>> L1 == L2
> True
> >>> len(L1) == len(L2)
> False

This code incorrectly assumes that overriding __len__ has an impact on the
equality of two lists. If you want to influence the equality, you need to
override __eq__. If you don't, the original implementation is free to do
whatever it likes to determine if it is equal to another value or not. If
it uses __len__ for that or not is only an implementation detail that can't
be relied upon.
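
One way to observe this, sketched under the assumption of CPython: count
how often __len__ is actually invoked during a comparison. CountingList
is a hypothetical name used only for this illustration, and the result
is an implementation detail rather than a guarantee:

```python
class CountingList(list):
    """Hypothetical list subclass that records every call to __len__."""

    len_calls = 0

    def __len__(self):
        CountingList.len_calls += 1
        return 0  # deliberately wrong, to show the value is not consulted


a = CountingList([1, 2, 3])
b = CountingList([1, 2, 3])

are_equal = (a == b)                      # inherited list comparison
calls_during_eq = CountingList.len_calls  # how often __len__ ran during ==
reported = len(a)                         # this call does use the override

print(are_equal, calls_during_eq, reported)
```

The inherited comparison checks the stored sizes directly, which is a
different thing from calling the __len__ method.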

Stefan

From: Terry Reedy on 10 Apr 2010 21:30

On 4/10/2010 8:32 AM, Stefan Behnel wrote:
> Steven D'Aprano, 08.04.2010 03:41:
>> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:

>>> If the two collections have unequal sizes, then both ways immediately
>>> return unequal.
>>
>>
>> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
>> behaviour, at least not for subclasses of list:
>>
>> >>> class MyList(list):
>> ...     def __len__(self):
>> ...         return self.n
>> ...
>> >>> L1 = MyList(range(10))
>> >>> L2 = MyList(range(10))
>> >>> L1.n = 9
>> >>> L2.n = 10
>> >>> L1 == L2
>> True
>> >>> len(L1) == len(L2)
>> False
>
> This code incorrectly assumes that overriding __len__ has an impact on
> the equality of two lists. If you want to influence the equality, you
> need to override __eq__. If you don't, the original implementation is
> free to do whatever it likes to determine if it is equal to another
> value or not. If it uses __len__ for that or not is only an
> implementation detail that can't be relied upon.

After reading the responses of both you and Raymond, I realized that a)
there is a real difference between 'checking lengths' and 'calling
__len__', which I (and apparently the example) had seen as the same and
b) that the example shows that assuming that they are the same is a
mistake. Thank you both for the clarification.

Terry Jan Reedy
