From: Eric Sosman on
Martin Gregorie wrote:
> On Sat, 05 Dec 2009 10:11:30 +0000, Tom Anderson wrote:
>
>> Although this still doesn't handle crashes. I think there is a trick you
>> can do on unix to have files deleted even when the process crashes -
>> something like create, open, then delete the directory entry, so that
>> the only reference keeping the file alive is from the open filehandle,
>> which will die when the process exits - but i don't know if there's a
>> way to use it from java. Or even that this is definitely correct.
>>
> That's correct. It's the standard UNIX idiom for making sure that temporary
> files don't outlive the process that created them no matter how it dies.
>
> It should work from Java since it's not language-dependent, though of
> course it's not portable outside the *nix world.
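
In Java the idiom can be sketched roughly as below (a sketch, not from the thread; the class name and strings are mine, and the behaviour is *nix-specific -- on Windows the delete() will normally fail while the handle is open):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class UnlinkedTemp {
    /**
     * Create a temp file, open it, then remove its directory entry.
     * On *nix the open handle keeps the inode alive; the storage is
     * reclaimed automatically when the handle is closed -- including
     * when the process crashes and the OS closes it for us.
     */
    static String demo() throws IOException {
        File f = File.createTempFile("scratch", ".tmp");
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        boolean unlinked = f.delete();   // drop the only directory entry
        raf.writeBytes("still usable");  // the open handle still works
        raf.seek(0);
        byte[] buf = new byte[12];
        raf.readFully(buf);
        raf.close();                     // storage reclaimed here
        return unlinked + ":" + new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

On a *nix box this prints true followed by the data read back through the unlinked file, and nothing is left behind in /tmp afterwards.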
>
>> However, by default, createTempFile puts files in java.io.tmpdir, which
>> on unix machines will typically be /tmp. Files there are subject to
>> deletion at the whim of the OS, so to an extent, you can delegate the
>> problem of worrying about deleting files to that.
>>
> You should attempt to delete them at some stage because there's no
> guarantee that the OS will. It's merely a way of guaranteeing that the
> tempfile has a unique name no matter how many copies of the process are
> running.
>
> A more useful approach would be to start the process(es) from a shell
> script or control process whose first action is to delete all temporary
> files it finds that are used by the processes it controls: this will be
> portable provided the script/control process is portable: no reason it
> shouldn't be written in Java or a portable scripting language like Groovy
> or Python.
>
>> That said, i'm not sure what current unixen's policies towards /tmp are;
>> i believe linux will only delete things at reboot, not during normal
>> operation, which makes this less useful.
>>
> I'm not certain that temp files are necessarily deleted at boot because
> that would slow down crash recovery. Since a file in /tmp will survive
> until it's closed, it's equally likely that there's a cron job that runs
> 'rm -rf /tmp/*' sometime after midnight each day. The real caveat is that
> no program creating files in /tmp should expect them to be there after it
> terminates, i.e. don't pass them to another program started after the
> first ends.

(Marginally topical) On Solaris, the Unix flavor I'm most
familiar with, /tmp is usually mounted on a tmpfs file system.
This is a memory-resident file system to the extent possible,
spilling over into swap space as needed. Nothing special needs
to happen at reboot to "clean out" tmpfs, no more than anything
special needs to happen to "clean out" swap files: The newly-booted
system just initializes its metadata to say "empty," and
everything from prior incarnations is gone.

Also, it's a *very* bad idea to purge /tmp blindly, even if
you're careful only to purge files that haven't been modified
in a while. I recall working with a server application that put
files in /tmp and mmap'ed them to share memory between its multiple
processes. Since simple paging I/O to and from a file opened a
week ago doesn't change the files' modification date, along came
the customer's /tmp-purging cron job and BLOOEY went the server ...

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
From: Martin Gregorie on
On Sat, 05 Dec 2009 18:30:26 -0500, Eric Sosman wrote:

>
> Also, it's a *very* bad idea to purge /tmp blindly, even if
> you're careful only to purge files that haven't been modified in a
> while. I recall working with a server application that put files in
> /tmp and mmap'ed them to share memory between its multiple processes.
> Since simple paging I/O to and from a file opened a week ago doesn't
> change the files' modification date, along came the customer's
> /tmp-purging cron job and BLOOEY went the server ...
>
Good point.

When I've needed to do this on Unices I've used the Unix IPC library
functions to hand references to shared memory segments between programs.
But you can't do that in Java AFAIK.

The only place I've used mmap files was on Tandem (now HP) Guardian fault
tolerant systems where an mmapped file was the only possibility because
the sharing processes were almost guaranteed to be on different CPUs with
nothing in common except shared disk. If I used them on a *NIX system
they'd probably need to be persistent data and so would be sitting in
normal directories alongside named pipes.


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
From: Tom Anderson on
On Sat, 5 Dec 2009, Eric Sosman wrote:

> Martin Gregorie wrote:
>> On Sat, 05 Dec 2009 10:11:30 +0000, Tom Anderson wrote:
>>
>>> However, by default, createTempFile puts files in java.io.tmpdir, which
>>> on unix machines will typically be /tmp. Files there are subject to
>>> deletion at the whim of the OS, so to an extent, you can delegate the
>>> problem of worrying about deleting files to that.
>>
>> You should attempt to delete them at some stage because there's no
>> guarantee that the OS will. It's merely a way of guaranteeing that the
>> tempfile has a unique name no matter how many copies of the process are
>> running.
>
> (Marginally topical) On Solaris, the Unix flavor I'm most familiar
> with, /tmp is usually mounted on a tmpfs file system. This is a
> memory-resident file system to the extent possible, spilling over into
> swap space as needed. Nothing special needs to happen at reboot to
> "clean out" tmpfs, no more than anything special needs to happen to
> "clean out" swap files: The newly-booted system just initializes its
> metadata to say "empty," and everything from prior incarnations is gone.

Aha, interesting. Seems like a good scheme.

> Also, it's a *very* bad idea to purge /tmp blindly, even if you're
> careful only to purge files that haven't been modified in a while. I
> recall working with a server application that put files in /tmp and
> mmap'ed them to share memory between its multiple processes. Since
> simple paging I/O to and from a file opened a week ago doesn't change
> the files' modification date, along came the customer's /tmp-purging
> cron job and BLOOEY went the server ...

Hang on, the files were open, right? So how could they be deleted? Or is
the point that the directory entries were deleted, so when new processes
were spawned, they couldn't open the file? And since when did writing to a
file via an mmap not change its modification time, anyway?

Either way, i'd suggest the bad idea here was putting critical long-lived
files in /tmp. Yes, they're temporary, but not that temporary!

tom

--
an optical recording release. copyright digitally mastered. .,
From: Martin Gregorie on
On Wed, 09 Dec 2009 15:49:26 +0000, Tom Anderson wrote:

> Hang on, the files were open, right? So how could they be deleted?
>
Some non-*nix OSes (OS/400, VMS?) refuse to accept the delete() operation
if the file is open. Of course it's irrelevant for single-tasking OSen
(e.g. DOS) and can cause havoc where the 'OS' is a multi-tasking shell
sitting on a single-tasking kernel (Win 3.1 thru ME fit this description).

> Or is the point that the directory entries were deleted, so when new
> processes were spawned, they couldn't open the file?
>
This is what I was muttering about. In the *nix family, which includes
VOS, AIX, Linux and FreeBSD:

- deleting a file removes the directory entry, reducing the link count
for the file (the link count is the number of directory entries
pointing at the file's inode). A directory entry only holds a name
and an inode reference: everything else (ownership, permissions,
timestamps, pointers to the data blocks) is in the inode.

- the inode and associated storage are only deleted when the link count is
zero and no processes have the file open.
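
The link-count bookkeeping is visible from Java through the "unix" attribute view. A small sketch (the class name is mine, and "unix:nlink" is only available on *nix JVMs):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkCount {
    /** Read the inode's link count via the *nix-only attribute view. */
    static long nlink(Path p) throws IOException {
        return ((Number) Files.getAttribute(p, "unix:nlink")).longValue();
    }

    /** Returns the link count before, during and after a second hard link. */
    static long[] demo() throws IOException {
        Path dir = Files.createTempDirectory("links");
        Path a = Files.createFile(dir.resolve("a"));
        long before = nlink(a);                          // one directory entry
        Path b = Files.createLink(dir.resolve("b"), a);  // second name, same inode
        long during = nlink(a);
        Files.delete(b);                                 // removes one entry only
        long after = nlink(a);
        Files.delete(a);
        Files.delete(dir);
        return new long[] { before, during, after };
    }

    public static void main(String[] args) throws IOException {
        long[] n = demo();
        System.out.println(n[0] + " " + n[1] + " " + n[2]);
    }
}
```

Deleting "b" only removes a name: the counts go 1, 2, 1, and the data survives until the last entry is gone and no process still holds the file open.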

> And since when did writing to
> a file via an mmap not change its modification time, anyway?
>
It works OK on Linux and most *nixen; I don't know about others.


> Either way, i'd suggest the bad idea here was putting critical
> long-lived files in /tmp. Yes, they're temporary, but not that
> temporary!
>
Exactly so.

A good precautionary design would clear out unwanted files as its first
action as well as delete surplus files as its last action.

It's probably also a good idea to store the activity status (starting/
running/clean-up/done) in a permanent file so the process knows whether
it's doing a normal start or a restart. Depending on what the program
is doing, it may also be useful to build a list of files to be deleted,
backed up, etc. This type of information makes restarts *much* easier if
you're dealing with high-volume, long-running applications and is
probably essential if any part of it involves parallel tasks.
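
A rough sketch of that status-file idea (Java 11+ for Files.readString/writeString; the class, enum and file names are mine, not from any real application):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RunState {
    enum Phase { STARTING, RUNNING, CLEANUP, DONE }

    /** No state file, or a recorded DONE, means the last run finished cleanly. */
    static Phase read(Path state) throws IOException {
        if (!Files.exists(state)) return Phase.DONE;
        return Phase.valueOf(Files.readString(state).trim());
    }

    /** Record the current phase so a successor can tell how we died. */
    static void write(Path state, Phase p) throws IOException {
        Files.writeString(state, p.name());
    }

    /** True if the previous run died part-way through. */
    static boolean isRestart(Path state) throws IOException {
        return read(state) != Phase.DONE;
    }

    public static void main(String[] args) throws IOException {
        Path state = Files.createTempDirectory("job").resolve("state");
        System.out.println(isRestart(state));   // fresh start
        write(state, Phase.RUNNING);
        System.out.println(isRestart(state));   // died while RUNNING: restart
        write(state, Phase.DONE);
        System.out.println(isRestart(state));   // clean finish
    }
}
```

On startup the process checks isRestart() and, if true, runs its clean-up/recovery path before doing anything else.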


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
From: Eric Sosman on
On 12/9/2009 10:49 AM, Tom Anderson wrote:
> On Sat, 5 Dec 2009, Eric Sosman wrote:
>>
>> (Marginally topical) [...]
>
>> Also, it's a *very* bad idea to purge /tmp blindly, even if you're
>> careful only to purge files that haven't been modified in a while. I
>> recall working with a server application that put files in /tmp and
>> mmap'ed them to share memory between its multiple processes. Since
>> simple paging I/O to and from a file opened a week ago doesn't change
>> the files' modification date, along came the customer's /tmp-purging
>> cron job and BLOOEY went the server ...
>
> Hang on, the files were open, right? So how could they be deleted? Or is
> the point that the directory entries were deleted, so when new processes
> were spawned, they couldn't open the file? And since when did writing to
> a file via an mmap not change its modification time, anyway?

(The topicality margin gets even thinner)

You're right: An open file can't be deleted. However, its
directory entry is removed. Then, when the application spawns a
new process and that new process tries to share memory with the
others by opening and mmap'ing the now-unfindable file, BLOOEY.
(When the customer first reported trouble, I immediately asked
whether there was a cron job or some such that periodically purged
old files from /tmp. Customer asserted -- vehemently and a bit
angrily -- that OF COURSE there wasn't. So we cobbled together
some DTrace to monitor file deletions in /tmp, and caught the
non-existent cron job red-handed ...)

As for file modification times, I confess an incomplete grasp
of exactly which operations do and do not update them. However,
just poking a new value into a page that's mmap'ed from a file is
not enough to update the time stamp. Can you imagine the overhead
if every memory write trapped to the kernel to update the time?
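
From Java the same store-into-a-mapped-page pattern looks like this (a sketch; the class name and contents are mine -- it only shows that the write is a plain memory store that reaches the file after force(), and makes no claim about timestamp updates):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWrite {
    static String demo() throws IOException {
        Path p = Files.createTempFile("mapped", ".dat");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE past the current size grows the file to 16 bytes.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            map.put("hello".getBytes(StandardCharsets.US_ASCII)); // plain memory store
            map.force();  // ask the OS to flush the dirty page back to the file
        }
        byte[] back = Files.readAllBytes(p);
        Files.delete(p);
        return new String(back, 0, 5, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

No write() system call is ever issued for the data itself; the kernel just pages the dirty mapping out, which is exactly why a naive mtime-based purger can't tell the file is live.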

> Either way, i'd suggest the bad idea here was putting critical
> long-lived files in /tmp. Yes, they're temporary, but not that temporary!

It wasn't my choice. It wasn't even my company's choice.
The third party who wrote the application chose to do things that
way, and even went so far as to include "do_not_delete" as part
of the files' names.

--
Eric Sosman
esosman(a)ieee-dot-org.invalid