mmap(MAP_SHARED) and msync(MS_INVALIDATE) [Unix Programming]

From: Phil on 21 Jun 2008 11:46

Dear Experts,

I have a program which mmap()s its read-mostly data file.

If I run two instances of the program concurrently, I want the changes
made by one to be visible to the other. So I call mmap with the
MAP_SHARED flag.

After I make a change to the data I call msync(). Since I'm using
MAP_SHARED I don't believe that msync() should be necessary in order for
the other instance to see the changes; however, I also want to ensure
that the change is stored on the disk so that it won't be lost if the
program terminates and msync() seems to be the right way to do this.
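
A minimal sketch of the workflow described above: copy a change into a MAP_SHARED mapping, then msync() only the pages it touched. The helper name `update_and_persist()` is hypothetical. The sketch assumes that `base` is the page-aligned address returned by mmap() and that the caller keeps `off + len` within the mapping. msync() may reject an address that is not a multiple of the page size (Linux does, with EINVAL), so the start of the range is rounded down to a page boundary.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes to byte offset `off` of a
 * MAP_SHARED mapping starting at `base` (page-aligned, as returned by
 * mmap()), then flush only the pages that were touched.
 * Returns 0 on success, -1 on error (errno set by msync()). */
int update_and_persist(char *base, size_t off, const void *src, size_t len)
{
    assert(base != NULL && len > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* On the systems discussed in this thread, other MAP_SHARED mappings
     * of the file see this store without any msync(). */
    memcpy(base + off, src, len);

    /* Round the start of the range down to a page boundary. */
    size_t start = off - off % (size_t)page;
    size_t end = off + len;                 /* one past the last changed byte */

    /* MS_SYNC waits until the affected pages have been written out. */
    return msync(base + start, end - start, MS_SYNC);
}
```

Opening the file, sizing it, and calling mmap() with PROT_READ | PROT_WRITE and MAP_SHARED are left to the caller.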

According to the Linux man page for msync(), there's a flag
MS_INVALIDATE that I can pass to it that "asks to invalidate other
mappings of the same file (so that they can be updated with the fresh
values just written)". [I should say that I'm running this on Linux,
but portable code is always better.] This seems to be suggesting that
if I don't set this flag, other mappings of the file (i.e. presumably
mappings in other processes) won't see the new values. But that's not
what MAP_SHARED is supposed to do, is it?

Here's the POSIX description of MS_INVALIDATE:

"When MS_INVALIDATE is specified, msync() shall invalidate all cached
copies of mapped data that are inconsistent with the permanent storage
locations such that subsequent references shall obtain data that was
consistent with the permanent storage locations sometime between the
call to msync() and the first subsequent memory reference to the data."

What seems to happen in practice is that, after I msync(MS_INVALIDATE)
the small range of pages that I've changed, all of the rest of the
file's pages are lost from the cache; when I subsequently read the file
it brings it in from the disk again [and this is how I first noticed
that something was wrong, as I experienced a peculiar pause while it did
this].
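
One way to observe the effect described above is to count how many pages of the mapping are resident before and after the msync() call. This sketch assumes Linux: mincore() is a Linux/BSD extension rather than POSIX, and on the BSDs its third argument is declared as `char *` instead of `unsigned char *`. `resident_pages()` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count the pages of [addr, addr + len) that are
 * resident in memory. `addr` must be page-aligned. Uses the Linux
 * prototype of mincore(); returns the count, or -1 on error. */
long resident_pages(void *addr, size_t len)
{
    assert(addr != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);   /* one status byte per page */
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;           /* low bit set: page is resident */
    }
    free(vec);
    return count;
}
```

For example, call it on the whole mapping, then msync(MS_SYNC | MS_INVALIDATE) on the changed pages, then call it again; a large drop in the count would match the re-read from disk reported above.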

Does anyone know what's going on with these calls?

Thanks,

Phil.

From: Moi on 21 Jun 2008 12:43

On Sat, 21 Jun 2008 16:46:03 +0100, Phil wrote:

> Dear Experts,
>
> I have a program which mmap()s its read-mostly data file.
>
> If I run two instances of the program concurrently, I want the changes
> made by one to be visible to the other. So I call mmap with the
> MAP_SHARED flag.
>
> After I make a change to the data I call msync(). Since I'm using
> MAP_SHARED I don't believe that msync() should be necessary in order for
> the other instance to see the changes; however, I also want to ensure
> that the change is stored on the disk so that it won't be lost if the
> program terminates and msync() seems to be the right way to do this.

The msync should not be necessary. Both programs have the *same* buffer
mapped into their address space. They should see the same data.
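
A small sketch of that claim, assuming a POSIX system: a parent and a forked child each call mmap() on the same file, so they hold two independent MAP_SHARED mappings that stand in for the two instances of the program. The child writes through its mapping and exits without calling msync(); the parent then reads the data through its own mapping. The temporary path, the helper names and the 4096-byte size are illustrative assumptions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create an unlinked temporary file of `len` bytes; returns an fd or -1. */
static int make_temp_file(size_t len)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file lives on until the fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the parent's mapping sees the child's write, 0 if not,
 * -1 on error. Neither side calls msync(). */
int child_write_visible_to_parent(void)
{
    static const char msg[] = "written by the child";
    const size_t len = 4096;             /* assumption: any size >= sizeof msg */
    assert(sizeof msg <= len);

    int fd = make_temp_file(len);
    if (fd < 0)
        return -1;
    char *parent_map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (parent_map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* The child makes its own, independent MAP_SHARED mapping. */
        char *child_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (child_map == MAP_FAILED)
            _exit(1);
        memcpy(child_map, msg, sizeof msg);
        _exit(0);                        /* no msync(), no munmap() */
    }

    int status = 0;
    int ok = pid > 0 && waitpid(pid, &status, 0) == pid
             && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int result = ok ? (memcmp(parent_map, msg, sizeof msg) == 0) : -1;
    munmap(parent_map, len);
    close(fd);
    return result;
}
```

`child_write_visible_to_parent()` returns 1 when the parent sees the child's bytes. waitpid() only orders the two steps; it is not what makes the data visible.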

>
> According to the Linux man page for msync(), there's a flag
> MS_INVALIDATE that I can pass to it that "asks to invalidate other
> mappings of the same file (so that they can be updated with the fresh
> values just written)". [I should say that I'm running this on Linux,
> but portable code is always better.] This seems to be suggesting that
> if I don't set this flag, other mappings of the file (i.e. presumably
> mappings in other processes) won't see the new values. But that's not
> what MAP_SHARED is supposed to do, is it?
>
> Here's the POSIX description of MS_INVALIDATE:
>
> "When MS_INVALIDATE is specified, msync() shall invalidate all cached
> copies of mapped data that are inconsistent with the permanent storage
> locations such that subsequent references shall obtain data that was
> consistent with the permanent storage locations sometime between the
> call to msync() and the first subsequent memory reference to the data."
>
> What seems to happen in practice is that, after I msync(MS_INVALIDATE)
> the small range of pages that I've changed, all of the rest of the
> file's pages are lost from the cache; when I subsequently read the file
> it brings it in from the disk again [and this is how I first noticed
> that something was wrong, as I experienced a peculiar pause while it did
> this].

As I understand it, the MS_INVALIDATE flag just means:

    if there is a buffer present for the affected pages:
        if (this buffer is dirty) {
            - mark this buffer as NON_DIRTY
            - mark this buffer as NON_VALID
        }
        else {
            -- do nothing
        }

Marking the buffer invalid will cause it to be read in from disk whenever
it is referenced again. Marking it as non-dirty will cause any changes to
the page to be lost. (But they *might* have been written to disk *before*
the msync(..., MS_INVALIDATE) call; the system may write to backing
storage whenever it wishes.)

HTH,
AvK

From: guenther on 21 Jun 2008 17:55

On Jun 21, 9:46 am, Phil <spam_from_usene...(a)chezphil.org> wrote:
> If I run two instances of the program concurrently, I want the
> changes made by one to be visible to the other. So I call
> mmap with the MAP_SHARED flag.
>
> After I make a change to the data I call msync(). Since I'm
> using MAP_SHARED I don't believe that msync() should be
> necessary in order for the other instance to see the changes;

Agreed.

> however, I also want to ensure that the change is stored on
> the disk so that it won't be lost if the program terminates
> and msync() seems to be the right way to do this.

Actually, I believe that should be unnecessary. Writes to mappings
obtained via mmap(MAP_SHARED) should be visible in other processes
under the same rules that cover when writes made in one thread are
visible to another thread in the same process, as described in:

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10

msync(MS_ASYNC) and msync(MS_SYNC) are for making sure that changes
are written to permanent storage, much like fsync() or fdatasync().
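
A sketch of the flushing options mentioned above, assuming a system that provides fdatasync() (an optional part of POSIX). `flush_mapping()` and `enum flush_mode` are hypothetical names. Flushing through the descriptor with fdatasync() instead of msync() relies on the mapping and the descriptor sharing one page cache, which holds on current Linux kernels; treat it as an assumption elsewhere.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for three ways of pushing changes to storage. */
enum flush_mode {
    FLUSH_SCHEDULE,   /* msync(MS_ASYNC): request write-back, do not wait */
    FLUSH_WAIT,       /* msync(MS_SYNC): return once the range is written */
    FLUSH_WHOLE_FILE  /* fdatasync(fd): flush the file's data via its descriptor */
};

/* `map`/`len` describe a MAP_SHARED mapping of the file open on `fd`;
 * `map` must be page-aligned. Returns 0 on success, -1 on error. */
int flush_mapping(int fd, void *map, size_t len, enum flush_mode mode)
{
    assert(map != NULL);
    switch (mode) {
    case FLUSH_SCHEDULE:
        /* Since Linux 2.6.19 this is effectively a no-op: the kernel
         * already tracks dirty pages and writes them back on its own. */
        return msync(map, len, MS_ASYNC);
    case FLUSH_WAIT:
        return msync(map, len, MS_SYNC);
    case FLUSH_WHOLE_FILE:
        /* Assumes pages dirtied through the mapping sit in the same page
         * cache that fdatasync() flushes, as on current Linux kernels. */
        return fdatasync(fd);
    }
    return -1;  /* unknown mode */
}
```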

> According to the Linux man page for msync(), there's a flag
> MS_INVALIDATE that I can pass to it that "asks to invalidate
> other mappings of the same file (so that they can be updated
> with the fresh values just written)". <...> This seems to be
> suggesting that if I don't set this flag, other mappings of the
> file (i.e. presumably mappings in other processes) won't see
> the new values. But that's not what MAP_SHARED is supposed
> to do, is it?

The Linux description seems 'off' to me. My expectation from the
SUSv3 description and from experience is that for MAP_SHARED
mappings of a normal file, msync(MS_INVALIDATE) is only necessary if
changes are being made to the file using interfaces other than a
shared mapping. I.e., if process A uses write() to change the file
contents, then process B might not see the change in its shared
mapping until it invalidates the 'cached contents' using msync().
Changes made to the file through another shared mapping should be
visible immediately.
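
The write() case is easy to check on a system where write() and mappings share one cache, as the post reports for most of the platforms it lists: write to the file with pwrite() and read the same bytes back through an existing read-only MAP_SHARED mapping, with no msync() in between. A minimal sketch under that assumption; the function name and temporary path are illustrative.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if data written with pwrite() shows up in an existing
 * MAP_SHARED mapping without any msync(), 0 if not, -1 on error. */
int write_visible_in_mapping(void)
{
    static const char msg[] = "written with pwrite()";
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && (size_t)page >= sizeof msg);

    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file lives on until the fd is closed */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile char first = map[0];        /* fault the page in before writing */
    (void)first;

    int result = -1;
    if (pwrite(fd, msg, sizeof msg, 0) == (ssize_t)sizeof msg)
        result = memcmp(map, msg, sizeof msg) == 0;
    munmap(map, (size_t)page);
    close(fd);
    return result;
}
```

On a platform like the HP-UX case described below, the same check would be expected to return 0.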

At least that's been my experience across the following platforms:
Linux, Solaris, AIX, OpenBSD, FreeBSD. Most of those have shared
buffer caches such that msync(MS_INVALIDATE) is never actually needed
for shared mappings: changes made via write() are instantly visible
via shared mappings.

If my memory serves, the real exception was HP-UX on PA-RISC, where
there was NO SUPPORTED METHOD to make a change made via write()
visible to a shared mapping. Completely unmapping the file and
remapping it could leave you with old data in the mapping! There were
other horrible restrictions on shared mappings (they had to have the
same VM address *in all processes*) that made using mmap() there
completely unworkable for files that needed to grow while being
shared.

Philip Guenther

From: Phil on 22 Jun 2008 10:24

guenther(a)gmail.com wrote:
> On Jun 21, 9:46 am, Phil <spam_from_usene...(a)chezphil.org> wrote:
>> According to the Linux man page for msync(), there's a flag
>> MS_INVALIDATE that I can pass to it that "asks to invalidate
>> other mappings of the same file (so that they can be updated
>> with the fresh values just written)". <...> This seems to be
>> suggesting that if I don't set this flag, other mappings of the
>> file (i.e. presumably mappings in other processes) won't see
>> the new values. But that's not what MAP_SHARED is supposed
>> to do, is it?
>
> The Linux description seems 'off' to me. My expectation from the
> SUSv3 description and from experience is that for MAP_SHARED
> mappings of a normal file, msync(MS_INVALIDATE) is only necessary if
> changes are being made to the file using interfaces other than a
> shared mapping. I.e., if process A uses write() to change the file
> contents, then process B might not see the change in its shared
> mapping until it invalidates the 'cached contents' using msync().
> Changes made to the file through another shared mapping should be
> visible immediately.

Right. Agreed.

I've had a look at the Linux source code (in mm/msync.c) and I believe
that MS_INVALIDATE actually does nothing at all. (What's more, it uses
the msync address range to determine which mapped files are affected,
but then syncs the *whole* of each of those files!)

Now I have the further challenge of working out how much of this still
works if the mmap()ed files are on NFS (or CIFS, or whatever). I think
that msync() is necessary to ensure that the changes have reached the
server; then, you have to wait for maybe a 30 second timeout before
another client will re-validate and discover the changes; but
unfortunately it doesn't know about the address range that changed, so
it invalidates all of its pages for that file. Has anyone ever tried
this sort of thing?

Cheers,

Phil.

From: phil-news-nospam on 26 Jun 2008 13:41

On Sat, 21 Jun 2008 14:55:27 -0700 (PDT) guenther(a)gmail.com <guenther(a)gmail.com> wrote:

| If my memory serves, the real exception was HP-UX on PA-RISC, where
| there was NO SUPPORTED METHOD to make a change made via write()
| visible to a shared mapping. Completely unmapping the file and
| remapping it could leave you with old data in the mapping! There were
| other horrible restrictions on shared mappings (they had to have the
| same VM address *in all processes*) that made using mmap() there
| completely unworkable for files that needed to grow while being
| shared.

My understanding of these limitations is that it is a hardware issue: the
cache aliasing boundary is larger than the page size. It would be possible
to map a given page at a different virtual memory address, whether in the
same process or a different one, provided that its offset relative to the
cache aliasing boundary was the same. Linux solved this for the ARM
architecture (aliasing boundary 16K, page size 4K) by enforcing this in
mmap(), which refuses the mapping if the offset is wrong for a requested
address. HP-UX on PA-RISC may have elected not to do this at all. I
believe the aliasing boundary there is 1M.
I'd have to go find old notes, but there is a macro symbol for Linux that
tells an application what the mapping alignment base is, which would be
the larger of the page size and the cache aliasing boundary. I encountered
this because glibc and uClibc both at one time had this value set wrong for
ARM while Linux had it set right and was enforcing it. I only encountered it
because of the double mapping in my VRB library ( http://vrb.slashusr.org/ ).
I have not yet tested this at all on HP-UX or PA-RISC (anyone have a portable
emulator for these?).
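
For readers unfamiliar with the double-mapping trick mentioned above, here is a generic sketch of it; it is not code from the VRB library. It reserves twice the buffer size of address space, then maps the same file into both halves with MAP_FIXED, so that byte i and byte i + size are the same byte and an access that runs off the end of the buffer wraps to its start. The sketch assumes MAP_ANONYMOUS is available and checks only that the size is a multiple of the page size; on cache-aliasing CPUs like those discussed above a larger alignment may be required, which it does not check. The function names and temporary path are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the file open on `fd` twice, back to back, so that byte i and
 * byte i + size are the same byte. Returns the base address or NULL. */
char *map_twice(int fd, size_t size)
{
    assert(fd >= 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || size % (size_t)page != 0)
        return NULL;   /* this sketch checks page alignment only */

    /* Reserve 2 * size bytes of address space so the halves are adjacent. */
    char *base = mmap(NULL, 2 * size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* MAP_FIXED replaces each half of the reservation with the file. */
    if (mmap(base, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
        mmap(base + size, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, 2 * size);
        return NULL;
    }
    return base;
}

/* Hypothetical wrapper: back the buffer with an unlinked temporary
 * file of `size` bytes and map it twice. */
char *make_mirrored_buffer(size_t size)
{
    char path[] = "/tmp/mirror-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);

    char *buf = NULL;
    if (ftruncate(fd, (off_t)size) == 0)
        buf = map_twice(fd, size);
    close(fd);          /* the mappings keep the file contents alive */
    return buf;
}
```

With `buf = make_mirrored_buffer(n)`, a record that starts near the end of the buffer can be copied with a single memcpy() even though it wraps around; release the buffer with munmap(buf, 2 * n).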

--
|WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
| by the abuse department, bellsouth.net is blocked. If you post to |
| Usenet from these places, find another Usenet provider ASAP. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
