iterator invalidation guarantees in tr1::unordered

Prev: conversion operator ambiguity
Next: GCC (G++) "#if x1 == x2" bug

From: ldh on 9 Jul 2010 23:07

Hello-

Can anyone please clarify for me, does tr1::unordered_map guarantee
that iterators remain valid always (except iterators to items that
have been removed?). This was definitely the case with the old SGI
implementation. I am trying to transition from gcc's
__gnu_cxx::hash_map to std::tr1::unordered_map in gcc 4.5, and I am
getting crashes that seem like it's not safe to store an iterator to
objects and use it later (after many other insertions and erasures of
other objects) to delete that object. Before I look deeper into the
gcc implementation I just wanted to make sure it's the case, that
according to the standard the iterators should remain valid. Thanks!

-Lewis

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Pedro Lamarão on 11 Jul 2010 04:47

On 10 jul, 11:07, ldh <lhy...(a)gmail.com> wrote:

> Can anyone please clarify for me, does tr1::unordered_map guarantee
> that iterators remain valid always (except iterators to items that
> have been removed?). This was definitely the case with the old SGI
> implementation. I am trying to transition from gcc's
> __gnu_cxx::hash_map to std::tr1::unordered_map in gcc 4.5, and I am
> getting crashes that seem like it's not safe to store an iterator to
> objects and use it later (after many other insertions and erasures of
> other objects) to delete that object. Before I look deeper into the
> gcc implementation I just wanted to make sure it's the case, that
> according to the standard the iterators should remain valid. Thanks!

I'm not sure about TR1, but the draft for ISO C++0x dated 2010-03-26
says in section [unord.req], paragraphs 13 and 14:

13. The insert members shall not affect the validity of references to
container elements, but may invalidate all iterators to the container.
The erase members shall invalidate only iterators and references to
the erased elements.

14. The insert members shall not affect the validity of iterators if (N
+n) < z * B, where N is the number of elements in the container prior
to the insert operation, n is the number of elements inserted, B is
the container�s bucket count, and z is the container�s maximum load
factor.

Note this section is actually about models of Unordered Associative
Containers, which includes unordered_map, unordered_set etc.

GCC's implementation for TR1 and C++0x is probably the same.

--
P.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Marek Vondrak on 11 Jul 2010 18:52

> > Can anyone please clarify for me, does tr1::unordered_map guarantee
> > that iterators remain valid always (except iterators to items that
> > have been removed?). This was definitely the case with the old SGI
> > implementation.

I posted exactly the same question two years ago and unfortunatelly
did not get any satisfactory answer. Yes, TR1 iterators get
invalidated also on inserts, but I do not see why it has to be so. SGI
style hash (or the ones from Dinkumware) containers do not have these
restrictions and, consequently, provide the same guarantees on
iterators like std::map which is really convenient.

-Marek

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: ldh on 12 Jul 2010 17:33

On Jul 12, 5:52 am, Marek Vondrak <marek.vond...(a)gmail.com> wrote:
> > > Can anyone please clarify for me, does tr1::unordered_map guarantee
> > > that iterators remain valid always (except iterators to items that
> > > have been removed?). This was definitely the case with the old SGI
> > > implementation.
>
> I posted exactly the same question two years ago and unfortunatelly
> did not get any satisfactory answer. Yes, TR1 iterators get
> invalidated also on inserts, but I do not see why it has to be so. SGI
> style hash (or the ones from Dinkumware) containers do not have these
> restrictions and, consequently, provide the same guarantees on
> iterators like std::map which is really convenient.

OK thanks for the responses. I think I understand the reason, it is
much more flexible and perhaps more efficient not to require the
iterator to remain always valid; in order to guarantee validity of
existing iterators in the presence of rehashing, one must avoid
storing a pointer to the element's bucket inside the iterator. The SGI
implementation of an iterator just contains a pointer to the node, and
a pointer to the hash table; when you increment it and it hits the end
of the bucket, it has to re-calculate the hash value and look up the
correct bucket so it can find the next one. This makes calling the
form of erase() that takes an iterator less efficient than you might
think -- you still have to hash the key to find out what bucket you
are erasing from, so that you can re-link the other nodes in that
bucket correctly, and you still have to iterate through the whole list
unless it is doubly linked. But the main reason to force always-valid
iterators is presumably to enable calling erase() with that iterator
later, so it may end up giving you very little benefit for the price
you pay.

In any case, it seems clear what the newly standardized behavior will
be, so that's what I needed. Thanks.

-Lewis

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Marek Vondrak on 13 Jul 2010 11:51

> The SGI
> implementation of an iterator just contains a pointer to the node, and
> a pointer to the hash table; when you increment it and it hits the end
> of the bucket, it has to re-calculate the hash value and look up the
> correct bucket so it can find the next one.

As I argued last year, it does not have to be done this way. Each
iterator only needs to point to a node object. Node objects can be
spliced into buckets using an intrusive list and at the same time they
can be spliced to a global list of all nodes in the container. At any
time instant, the global list equals some concatenation of bucket
lists (in some random order). The global list is used to enumerate all
objects in the container, bucket lists are used to find ranges of
objects with the same key (subsets of bucket lists). Using the global
list, one can hop from a bucket to bucket without doing any additional
searches or evaluations of node's hash values and using the node's
bucket lists, one can efficiently find places to insert new nodes to.
I used to have an implementation of hash maps and multimaps that
worked this way. The time complexities were as follows:
insert: time to find a bucket O( 1 ) + time to find a place within
bucket \approx number_of_items_in_container / number_of_buckets (when
using a hash function that distributes elements uniformly)
erase: O( 1 ), invalidates only iterators and pointers pointing to
the deleted object
find: time to find a bucket O( 1 ) + time to find an element within
bucket \approx number_of_items_in_container / number_of_buckets
find in multimap: as above
iterator::operator++() and operator--(): O( 1 )

-Marek

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

| Next | Last
Pages: 1 2
Prev: conversion operator ambiguity
Next: GCC (G++) "#if x1 == x2" bug

iterator invalidation guarantees in tr1::unordered_map