From: Peter Olcott on 8 Apr 2010 20:43

"Ian Collins" <ian-news(a)hotmail.com> wrote in message news:827aifFp8jU17(a)mid.individual.net...
> On 04/ 9/10 12:00 PM, Peter Olcott wrote:
>> "Ian Collins" <ian-news(a)hotmail.com> wrote in message news:8278l3Fp8jU16(a)mid.individual.net...
>>> On 04/ 9/10 11:48 AM, Peter Olcott wrote:
>>>> I am trying to have completely reliable writes to a transaction log. This transaction log includes financial transactions. Even if someone pulls the plug in the middle of a transaction, I want to lose only that single transaction and not have any other data missing or corrupted.
>>>>
>>>> One aspect of the solution to this problem is to make sure that all disk writes are immediately flushed to the actual platters. It is this aspect of the problem that I am attempting to solve in this thread.
>>>
>>> Can't you rely on your database to manage this for you?
>>
>> Not for the transaction log, because it will not be in a database. The transaction log file will be the primary means of IPC. Named pipes will provide event notification of changes to the log file, and the file offset of those changes.
>
> It sounds very much (here and in other threads) like you are trying to reinvent database transactions. Just store everything in a database and signal watchers when data is updated. Databases had atomic transactions licked decades ago!
>
> --
> Ian Collins

I don't want to make the system much slower than necessary merely to avoid learning how to do completely reliable file writes.

There is too much overhead in a SQL database for this purpose: because SQL has no way to seek directly to a specific record, all the overhead of accessing and maintaining indices would be required. I want to plan on 100 transactions per second on a single-core processor, because that is the maximum speed of my OCR process on a single page of data. I want to spend an absolute minimum of time on every other aspect of processing, and file I/O generally tends to be the primary performance bottleneck.

The fastest possible persistent mechanism would be a binary file that is not part of a SQL database. All access to records in this file would be by direct file offset. Implementing this in SQL could mean a tenfold degradation in performance.

I will be using a SQL database for my user login and account information.
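The "flush to the platters" technique being discussed is usually spelled O_SYNC or fsync() on POSIX systems. A minimal sketch, assuming a POSIX platform; the file name and record layout are invented for illustration, and even a synchronized write only guarantees the data reached the storage device, not necessarily past the drive's own write cache:

    #include <fcntl.h>
    #include <unistd.h>

    /* Illustrative fixed-size transaction record; the layout is assumed,
       not taken from the thread. */
    struct tx_record {
        unsigned long long id;
        char payload[120];
    };

    int append_tx(const char *path, const struct tx_record *rec)
    {
        /* O_SYNC: each write() returns only after the kernel has pushed the
           data (and the metadata needed to retrieve it) to the device.
           An alternative is a plain write() followed by fsync(fd). */
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
        if (fd < 0)
            return -1;

        ssize_t n = write(fd, rec, sizeof *rec);
        int rc = (n == (ssize_t)sizeof *rec) ? 0 : -1;

        close(fd);
        return rc;
    }

In practice a long-lived process would keep the descriptor open rather than reopening the file per transaction; the open/close here is only to keep the sketch self-contained.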
From: Ian Collins on 8 Apr 2010 21:08

On 04/ 9/10 12:43 PM, Peter Olcott wrote:
> "Ian Collins" <ian-news(a)hotmail.com> wrote:
>>
>> It sounds very much (here and in other threads) like you are trying to reinvent database transactions. Just store everything in a database and signal watchers when data is updated. Databases had atomic transactions licked decades ago!
>>
> I don't want to make the system much slower than necessary merely to avoid learning how to do completely reliable file writes.

The magic word there is "necessary". It's not just the file writes but the whole business with named pipes.

> There is too much overhead in a SQL database for this purpose: because SQL has no way to seek directly to a specific record, all the overhead of accessing and maintaining indices would be required. I want to plan on 100 transactions per second on a single-core processor, because that is the maximum speed of my OCR process on a single page of data. I want to spend an absolute minimum of time on every other aspect of processing, and file I/O generally tends to be the primary performance bottleneck.

100 transactions per second isn't that great a demand. Most databases have RAM-based tables, so the only file access would be the write-through. The MySQL InnoDB storage engine is optimised for this.

> The fastest possible persistent mechanism would be a binary file that is not part of a SQL database. All access to records in this file would be by direct file offset. Implementing this in SQL could mean a tenfold degradation in performance.

Have you benchmarked this? Even if that is so, it might still be 10x faster than is required.

> I will be using a SQL database for my user login and account information.

So you have the opportunity to do some benchmarking.

--
Ian Collins
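The "have you benchmarked this?" question is easy to answer for the raw-file side: time a loop of synchronous appends and see how many per second the disk sustains. A rough sketch, assuming POSIX; the file name, record size and iteration count are arbitrary:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        char rec[128];
        memset(rec, 'x', sizeof rec);

        int fd = open("bench.log", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return 1;

        enum { N = 1000 };
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (int i = 0; i < N; i++) {
            if (write(fd, rec, sizeof rec) != sizeof rec)
                return 1;
            fsync(fd);               /* force each record to stable storage */
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d synchronous appends in %.2f s (%.0f/s)\n", N, secs, N / secs);

        close(fd);
        return 0;
    }

Running the same number of single-row INSERTs against the intended database table would put an actual number on the feared tenfold difference instead of an estimate.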
From: Peter Olcott on 8 Apr 2010 21:21

"Ian Collins" <ian-news(a)hotmail.com> wrote in message news:827d0gFp8jU18(a)mid.individual.net...
> On 04/ 9/10 12:43 PM, Peter Olcott wrote:
>> "Ian Collins" <ian-news(a)hotmail.com> wrote:
>>>
>>> It sounds very much (here and in other threads) like you are trying to reinvent database transactions. Just store everything in a database and signal watchers when data is updated. Databases had atomic transactions licked decades ago!
>>>
>> I don't want to make the system much slower than necessary merely to avoid learning how to do completely reliable file writes.
>
> The magic word there is "necessary". It's not just the file writes but the whole business with named pipes.

Yeah, but why did you bring this up? Aren't named pipes trivial and fast?

>> There is too much overhead in a SQL database for this purpose: because SQL has no way to seek directly to a specific record, all the overhead of accessing and maintaining indices would be required. I want to plan on 100 transactions per second on a single-core processor, because that is the maximum speed of my OCR process on a single page of data. I want to spend an absolute minimum of time on every other aspect of processing, and file I/O generally tends to be the primary performance bottleneck.
>
> 100 transactions per second isn't that great a demand. Most databases have RAM-based tables, so the only file access would be the write-through. The MySQL InnoDB storage engine is optimised for this.

Exactly how fault tolerant is it with the server's power cord yanked from the wall?

>> The fastest possible persistent mechanism would be a binary file that is not part of a SQL database. All access to records in this file would be by direct file offset. Implementing this in SQL could mean a tenfold degradation in performance.
>
> Have you benchmarked this? Even if that is so, it might still be 10x faster than is required.

My time budget is no time at all (over and above the 10 ms that my OCR process already uses), and I want to get as close to this as possible. Because of the file caching that you mentioned, it is possible that SQL might be faster.

If only there were a way to have records numbered in sequential order and to access a specific record directly by its record number. It seems so stupid that you have to build, access and maintain a whole index just to access records by record number.

>> I will be using a SQL database for my user login and account information.
>
> So you have the opportunity to do some benchmarking.
>
> --
> Ian Collins
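For the record-number access being wished for here, a plain file of fixed-size records already provides it with no index at all: record n lives at byte offset n * sizeof(record). A sketch under that assumption (POSIX pread/pwrite, invented record layout):

    #include <sys/types.h>
    #include <unistd.h>

    /* Illustrative fixed-size record; any fixed layout works the same way. */
    struct tx_record {
        unsigned long long id;
        char payload[120];
    };

    /* Read record number 'recno' directly; no index is involved. */
    int read_tx(int fd, unsigned long long recno, struct tx_record *out)
    {
        off_t off = (off_t)(recno * sizeof(struct tx_record));
        ssize_t n = pread(fd, out, sizeof *out, off);
        return n == (ssize_t)sizeof *out ? 0 : -1;
    }

    /* Overwrite record number 'recno' in place. */
    int write_tx(int fd, unsigned long long recno, const struct tx_record *rec)
    {
        off_t off = (off_t)(recno * sizeof(struct tx_record));
        ssize_t n = pwrite(fd, rec, sizeof *rec, off);
        return n == (ssize_t)sizeof *rec ? 0 : -1;
    }

The scheme only works while every record has the same size; variable-length records are exactly where an index, or a database engine, starts to earn its keep.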
From: Ian Collins on 8 Apr 2010 21:42

On 04/ 9/10 01:21 PM, Peter Olcott wrote:
> "Ian Collins" <ian-news(a)hotmail.com> wrote in message news:827d0gFp8jU18(a)mid.individual.net...
>> On 04/ 9/10 12:43 PM, Peter Olcott wrote:
>>> "Ian Collins" <ian-news(a)hotmail.com> wrote:
>>>>
>>>> It sounds very much (here and in other threads) like you are trying to reinvent database transactions. Just store everything in a database and signal watchers when data is updated. Databases had atomic transactions licked decades ago!
>>>>
>>> I don't want to make the system much slower than necessary merely to avoid learning how to do completely reliable file writes.
>>
>> The magic word there is "necessary". It's not just the file writes but the whole business with named pipes.
>
> Yeah, but why did you bring this up? Aren't named pipes trivial and fast?

I don't use them. But I'm sure the time spent on your named pipe thread would have been plenty of time for benchmarking!

>>> There is too much overhead in a SQL database for this purpose: because SQL has no way to seek directly to a specific record, all the overhead of accessing and maintaining indices would be required. I want to plan on 100 transactions per second on a single-core processor, because that is the maximum speed of my OCR process on a single page of data. I want to spend an absolute minimum of time on every other aspect of processing, and file I/O generally tends to be the primary performance bottleneck.
>>
>> 100 transactions per second isn't that great a demand. Most databases have RAM-based tables, so the only file access would be the write-through. The MySQL InnoDB storage engine is optimised for this.
>
> Exactly how fault tolerant is it with the server's power cord yanked from the wall?

As good as any. If you want five-nines reliability you have to go a lot further than synchronous writes. My main server has highly redundant RAID (thanks to ZFS), redundant PSUs and a UPS. I'm not quite at the generator stage yet; our power here is very dependable :).

>>> The fastest possible persistent mechanism would be a binary file that is not part of a SQL database. All access to records in this file would be by direct file offset. Implementing this in SQL could mean a tenfold degradation in performance.
>>
>> Have you benchmarked this? Even if that is so, it might still be 10x faster than is required.
>
> My time budget is no time at all (over and above the 10 ms that my OCR process already uses), and I want to get as close to this as possible. Because of the file caching that you mentioned, it is possible that SQL might be faster.
>
> If only there were a way to have records numbered in sequential order and to access a specific record directly by its record number. It seems so stupid that you have to build, access and maintain a whole index just to access records by record number.

You don't. The database engine does.

--
Ian Collins
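To illustrate "the database engine does": with an integer primary key, lookup by record number is a single keyed lookup the engine performs itself; the application never builds or maintains an index. A sketch using SQLite's C API purely as a convenient stand-in (the thread is discussing MySQL InnoDB; the table name, schema and file name here are invented):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void)
    {
        sqlite3 *db;
        if (sqlite3_open("tx.db", &db) != SQLITE_OK)
            return 1;

        /* The INTEGER PRIMARY KEY serves as the record number; the engine
           owns whatever structure it uses to find the row. */
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS tx (id INTEGER PRIMARY KEY, payload BLOB)",
            0, 0, 0);

        sqlite3_stmt *st;
        sqlite3_prepare_v2(db, "SELECT payload FROM tx WHERE id = ?", -1, &st, 0);
        sqlite3_bind_int64(st, 1, 42);          /* "record number" 42 */

        if (sqlite3_step(st) == SQLITE_ROW)
            printf("found %d bytes\n", sqlite3_column_bytes(st, 0));

        sqlite3_finalize(st);
        sqlite3_close(db);
        return 0;
    }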
From: Peter Olcott on 8 Apr 2010 21:58
"Ian Collins" <ian-news(a)hotmail.com> wrote in message news:827evjFp8jU20(a)mid.individual.net... > On 04/ 9/10 01:21 PM, Peter Olcott wrote: >> "Ian Collins"<ian-news(a)hotmail.com> wrote in message >> news:827d0gFp8jU18(a)mid.individual.net... >>> On 04/ 9/10 12:43 PM, Peter Olcott wrote: >>>> "Ian Collins"<ian-news(a)hotmail.com> wrote: >>> 100 transactions per second isn't that great a demand. >>> Most databases have RAM based tables, so the only file >>> access would the write through. The MySQL InnoDB storage >>> engine is optimised for this. >> >> Exactly how fault tolerant is it with the server's power >> cord yanked from the wall? > > As good as any. If you want 5 nines reliability you have > to go a lot further than synchronous writes. My main > server has highly redundant raid (thinks to ZFS), > redundant PSUs and a UPS. I'm not quite at the generator > stage yet, our power here is very dependable :). > >>>> The fastest possible persistent mechanism would be a >>>> binary >>>> file that is not a part of a SQL database. All access >>>> to >>>> records in this file would be by direct file offset. >>>> Implementing this in SQL could have a tenfold >>>> degradation >>>> in >>>> performance. >>> >>> Have you benchmarked this? Even if that is so, it might >>> still be 10x faster than is required. >> >> My time budget is no time at all, (over and above the 10 >> ms >> that my OCR process already used) and I want to get as >> close >> to this as possible. Because of the file caching that >> you >> mentioned it is possible that SQL might be faster. >> >> If there was only a way to have records numbered in >> sequential order, and directly access this specific >> record >> by its record number. It seems so stupid that you have to >> build, access and maintain a whole index just to access >> records by record number. > > You don't. The database engine does. Does the MySQL InnoDB storage engine have a journal file like SQLite for crash recovery? > > -- > Ian Collins |