Prev: imports again
Next: python as pen and paper substitute
From: Gustavo Narea on 6 Apr 2010 14:11 Hello! Could you please confirm whether my understanding of equality operations in sets and lists is correct? This is how I think things work, partially based on experimentation and the online documentation for Python: When you compare two lists, *every* element of one of the lists is compared against the element at the same position in the other list; that comparison is done by the __eq__() method (or the equivalent for builtin types). This is interrupted when a result is False. When you compare two sets, there's a loop over all the elements of the first set, where the hash for that element is looked up in the second set: - If this hash matches the hash for one or more elements in the second set, the element in the first set is compared (with __eq__ or equivalent) against the elements in the second set which have the same hash. When a result is True, nothing else is done on that element and the loop takes the next element in the first set; when all the results are False, the loop ends and the two sets are not equivalent. - If the hash doesn't match that of an element in the second set, then the loop ends and the two sets are not equivalent. So this means that: 1.- When you have two collections which have the same elements, the equality operation will *always* be faster with lists. 2.- When you have two collections with different elements, the equality operation *may* be faster with sets. For example, if you have two collections of 1,000 elements each and 998 of them are equivalent, comparing both collections as sets will be slower than doing it with lists. But if you have two collections of 1,000 elements each and 998 of them are not equivalent, then comparing both collections as lists will be slower than doing it with sets. The performance of equality operations on sets is directly proportional to the amount of different elements in both sets, while the performance of equality operations on lists is simply proportional to the cardinality of the collection. In other words: The more different elements two collections have, the faster it is to compare them as sets. And as a consequence, the more equivalent elements two collections have, the faster it is to compare them as lists. Is this correct? This is why so many people advocate the use of sets instead of lists/ tuples in similar situations, right? Cheers, - Gustavo.
From: Chris Colbert on 6 Apr 2010 14:28 the proof is in the pudding: In [1]: a = range(10000) In [2]: s = set(a) In [3]: s2 = set(a) In [5]: b = range(10000) In [6]: a == b Out[6]: True In [7]: s == s2 Out[7]: True In [8]: %timeit a == b 1000 loops, best of 3: 204 us per loop In [9]: %timeit s == s2 10000 loops, best of 3: 124 us per loop On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea <me(a)gustavonarea.net> wrote: > Hello! > > Could you please confirm whether my understanding of equality > operations in sets and lists is correct? This is how I think things > work, partially based on experimentation and the online documentation > for Python: > > When you compare two lists, *every* element of one of the lists is > compared against the element at the same position in the other list; > that comparison is done by the __eq__() method (or the equivalent for > builtin types). This is interrupted when a result is False. > > When you compare two sets, there's a loop over all the elements of the > first set, where the hash for that element is looked up in the second > set: > - If this hash matches the hash for one or more elements in the > second set, the element in the first set is compared (with __eq__ or > equivalent) against the elements in the second set which have the same > hash. When a result is True, nothing else is done on that element and > the loop takes the next element in the first set; when all the results > are False, the loop ends and the two sets are not equivalent. > - If the hash doesn't match that of an element in the second set, > then the loop ends and the two sets are not equivalent. > > So this means that: > 1.- When you have two collections which have the same elements, the > equality operation will *always* be faster with lists. > 2.- When you have two collections with different elements, the > equality operation *may* be faster with sets. > > For example, if you have two collections of 1,000 elements each and > 998 of them are equivalent, comparing both collections as sets will be > slower than doing it with lists. But if you have two collections of > 1,000 elements each and 998 of them are not equivalent, then comparing > both collections as lists will be slower than doing it with sets. > > The performance of equality operations on sets is directly > proportional to the amount of different elements in both sets, while > the performance of equality operations on lists is simply proportional > to the cardinality of the collection. > > In other words: The more different elements two collections have, the > faster it is to compare them as sets. And as a consequence, the more > equivalent elements two collections have, the faster it is to compare > them as lists. > > Is this correct? > > This is why so many people advocate the use of sets instead of lists/ > tuples in similar situations, right? > > Cheers, > > - Gustavo. > -- > http://mail.python.org/mailman/listinfo/python-list >
From: Gustavo Narea on 6 Apr 2010 17:18 On Apr 6, 7:28 pm, Chris Colbert <sccolb...(a)gmail.com> wrote: > the proof is in the pudding: > > In [1]: a = range(10000) > > In [2]: s = set(a) > > In [3]: s2 = set(a) > > In [5]: b = range(10000) > > In [6]: a == b > Out[6]: True > > In [7]: s == s2 > Out[7]: True > > In [8]: %timeit a == b > 1000 loops, best of 3: 204 us per loop > > In [9]: %timeit s == s2 > 10000 loops, best of 3: 124 us per loop I think you meant to set "s2 = set(b)": ===== In [1]: a = range(10000) In [2]: b = range(10000) In [3]: s1 = set(a) In [4]: s2 = set(a) In [5]: s3 = set(b) In [6]: %timeit a == b 10000 loops, best of 3: 191 us per loop In [7]: %timeit s1 == s2 10000 loops, best of 3: 118 us per loop In [8]: %timeit s1 == s3 1000 loops, best of 3: 325 us per loop ===== Cheers.
From: Chris Colbert on 6 Apr 2010 17:46 :slaps forehead: good catch. On Tue, Apr 6, 2010 at 5:18 PM, Gustavo Narea <me(a)gustavonarea.net> wrote: > On Apr 6, 7:28 pm, Chris Colbert <sccolb...(a)gmail.com> wrote: >> the proof is in the pudding: >> >> In [1]: a = range(10000) >> >> In [2]: s = set(a) >> >> In [3]: s2 = set(a) >> >> In [5]: b = range(10000) >> >> In [6]: a == b >> Out[6]: True >> >> In [7]: s == s2 >> Out[7]: True >> >> In [8]: %timeit a == b >> 1000 loops, best of 3: 204 us per loop >> >> In [9]: %timeit s == s2 >> 10000 loops, best of 3: 124 us per loop > > > I think you meant to set "s2 = set(b)": > ===== > In [1]: a = range(10000) > > In [2]: b = range(10000) > > In [3]: s1 = set(a) > > In [4]: s2 = set(a) > > In [5]: s3 = set(b) > > In [6]: %timeit a == b > 10000 loops, best of 3: 191 us per loop > > In [7]: %timeit s1 == s2 > 10000 loops, best of 3: 118 us per loop > > In [8]: %timeit s1 == s3 > 1000 loops, best of 3: 325 us per loop > ===== > > Cheers. > -- > http://mail.python.org/mailman/listinfo/python-list >
From: Jack Diederich on 6 Apr 2010 23:33
On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea <me(a)gustavonarea.net> wrote: > Hello! > > Could you please confirm whether my understanding of equality > operations in sets and lists is correct? This is how I think things > work, partially based on experimentation and the online documentation > for Python: > > When you compare two lists, *every* element of one of the lists is > compared against the element at the same position in the other list; > that comparison is done by the __eq__() method (or the equivalent for > builtin types). This is interrupted when a result is False. > > When you compare two sets, there's a loop over all the elements of the > first set, where the hash for that element is looked up in the second > set: [snip] > In other words: The more different elements two collections have, the > faster it is to compare them as sets. And as a consequence, the more > equivalent elements two collections have, the faster it is to compare > them as lists. > > Is this correct? Yes, but faster isn't the same thing as free. I still get bitten occasionally by code that blows away the difference by including a set-wise assert() in a loop. Also, lists can have mutable members so there are times you really /do/ want to compare lists instead of hashes. -Jack |