From: Phil Carmody on 11 Apr 2010 05:41

Ian Collins <ian-news(a)hotmail.com> writes:
> On 04/ 9/10 01:21 PM, Peter Olcott wrote:
>> "Ian Collins" <ian-news(a)hotmail.com> wrote:
>>> Have you benchmarked this? Even if that is so, it might
>>> still be 10x faster than is required.

Lack of 'yes' or 'no' answer to Ian's question noted.

>> My time budget is no time at all, (over and above the 10 ms
>> that my OCR process already used) and I want to get as close
>> to this as possible. Because of the file caching that you
>> mentioned it is possible that SQL might be faster.
>>
>> If there was only a way to have records numbered in
>> sequential order, and directly access this specific record
>> by its record number. It seems so stupid that you have to
>> build, access and maintain a whole index just to access
>> records by record number.
>
> You don't. The database engine does.

You seem to have forgotten that you're talking to Peter
'reinvent-the-wheel' Olcott. I'm sure if he builds and
maintains the index himself, it will be 872x faster than
any database.

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
 -- David Melville on r.a.s.f1
From: Peter Olcott on 11 Apr 2010 08:41

"Phil Carmody" <thefatphil_demunged(a)yahoo.co.uk> wrote in message
news:87tyriba2d.fsf(a)kilospaz.fatphil.org...
> Ian Collins <ian-news(a)hotmail.com> writes:
>> On 04/ 9/10 01:21 PM, Peter Olcott wrote:
>>> "Ian Collins" <ian-news(a)hotmail.com> wrote:
>>>> Have you benchmarked this? Even if that is so, it might
>>>> still be 10x faster than is required.
>
> Lack of 'yes' or 'no' answer to Ian's question noted.
>
>>> My time budget is no time at all, (over and above the 10 ms
>>> that my OCR process already used) and I want to get as close
>>> to this as possible. Because of the file caching that you
>>> mentioned it is possible that SQL might be faster.
>>>
>>> If there was only a way to have records numbered in
>>> sequential order, and directly access this specific record
>>> by its record number. It seems so stupid that you have to
>>> build, access and maintain a whole index just to access
>>> records by record number.
>>
>> You don't. The database engine does.

This would still triple the overhead associated with each
transaction. Disk seeks are the most expensive part of this
overhead, so making three times as many seeks slows processing
down by a factor of three. Reliability cannot be ensured unless
all disk writes are immediate, so file caching cannot help with
writes. Disk reads, however, can be sped up by file caching.

> You seem to have forgotten that you're talking to Peter
> 'reinvent-the-wheel' Olcott. I'm sure if he builds and
> maintains the index himself, it will be 872x faster than
> any database.
>
> Phil
> --
> I find the easiest thing to do is to k/f myself and just troll away
>  -- David Melville on r.a.s.f1
From: David Schwartz on 12 Apr 2010 19:43

On Apr 10, 6:26 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
> That may be the case, but this is not how it was related to
> me on this thread. It was related to me on this thread as
> two distinctly different and separate issues: the one that
> you just mentioned, and in addition the issue that fsync()
> itself is often broken. fsync() is ONLY supposed to flush
> the OS kernel buffers. The application buffers and the drive
> cache are both supposed to be separate issues.

Unfortunately, the reality is that if this is a hard
requirement, you have two choices:

1) Design a system that provides this inherently, such as a
separate transaction backup logging system. Make sure the first
system applies a transaction identifier to each transaction and
the logging system plays them back, say, an hour later. If any
transaction is missing on the primary, the backup re-applies it.

This is less easy than it sounds. For example, if you add $100
to my account and then log the transaction, what happens if
power is lost and the log is kept but the add is lost? Make
sure the add *is* the log.

2) Build a system and test it. If you require 99.9% reliability
that a transaction not be lost when the plug is pulled, you
will have to hire someone to fire test transactions at the
machine and pull the plug a few thousand times to confirm. Any
hardware change will require re-testing.

There really is no other way. Assuming the system as a whole
provides the reliability you expect is not going to work.

DS
From: Golden California Girls on 12 Apr 2010 21:26

Peter Olcott wrote:
> That may be the case, but this is not how it was related to
> me on this thread. It was related to me on this thread as
> two distinctly different and separate issues: the one that
> you just mentioned, and in addition the issue that fsync()
> itself is often broken. fsync() is ONLY supposed to flush
> the OS kernel buffers. The application buffers and the drive
> cache are both supposed to be separate issues.

Does it matter where the break is?

I suspect the documentation for fsync may reflect the fact that
hardware does not universally support it. I'd be a bit surprised
if fsync didn't at least ask that the buffers be written, at
least on any decent distro. Or it may reflect the reality of a
remote file system being mounted and the impossibility of
sending a sync command to the remote system.

Obviously you can get the source and read it to find out whether
the correct commands are sent to an attached local drive. That's
the easy part. Proving that the drive obeys the commands is
another story. Would an SSD have a buffer?

So my suggestion is to assume the data doesn't make it to the
platter and build your error recovery so as not to depend on
that. Or admit it, get dual power supplies, dual UPSs and a
backup generator, and pray the janitor doesn't pull both plugs.
Or spend the time to read enough source code and documentation
to prove everything works on a specific distro with specific
equipment.

Another possibility, I suppose, is not to mount the disk as a
file system but to do raw I/O. That way you know what buffers
you have and know that you called for them to be written. If you
know the drive obeys a no-buffer command, you may have the
assurance you need.
From: Peter Olcott on 12 Apr 2010 23:57
"Golden California Girls" <gldncagrls(a)aol.com.mil> wrote in message
news:hq0h7u$sp1$1(a)speranza.aioe.org...
> Peter Olcott wrote:
>> That may be the case, but this is not how it was related to
>> me on this thread. It was related to me on this thread as
>> two distinctly different and separate issues: the one that
>> you just mentioned, and in addition the issue that fsync()
>> itself is often broken. fsync() is ONLY supposed to flush
>> the OS kernel buffers. The application buffers and the drive
>> cache are both supposed to be separate issues.
>
> Does it matter where the break is?

If one must find all breaks and fix them, yes.

> I suspect the documentation for fsync may reflect the fact
> that hardware does not universally support it. I'd be a bit
> surprised if fsync didn't at least ask that the buffers be
> written, at least on any decent distro. Or it may reflect
> the reality of a remote file system being mounted and the
> impossibility of sending a sync command to the remote system.

http://linux.die.net/man/2/fsync
The biggest caveat that this mentioned was the hard drive cache.

> Obviously you can get the source and read it to find out
> whether the correct commands are sent to an attached local
> drive. That's the easy part. Proving that the drive obeys
> the commands is another story. Would an SSD have a buffer?

I think that it has to, because it has to write blocks of a
fixed size.

> So my suggestion is to assume the data doesn't make it to
> the platter and build your error recovery so as not to
> depend on that. Or admit it, get dual power supplies, dual
> UPSs and a backup generator, and pray the janitor doesn't
> pull both plugs. Or spend the time to read enough source
> code and documentation to prove everything works on a
> specific distro with specific equipment.
>
> Another possibility, I suppose, is not to mount the disk as
> a file system but to do raw I/O. That way you know what
> buffers you have and know that you called for them to be
> written. If you know the drive obeys a no-buffer command,
> you may have the assurance you need.

Probably comprehensive testing of one sort or another will help.