From: Heikki Linnakangas on
Fujii Masao wrote:
> As I pointed out previously, the standby might restore a partially-filled
> WAL file that is being archived by the primary, and cause a FATAL error.
> And this happened in my box when I was testing the SR.
>
> sby [20088] FATAL: archive file "000000010000000000000087" has
> wrong size: 14139392 instead of 16777216
> sby [20076] LOG: startup process (PID 20088) exited with exit code 1
> sby [20076] LOG: terminating any other active server processes
> act [18164] LOG: received immediate shutdown request
>
> If the startup process is in standby mode, I think that it should retry
> starting replication instead of emitting an error when it finds a
> partially-filled file in the archive. Then, if replication has been
> terminated, it only has to restore the archived file again. Thoughts?

Hmm, so after running restore_command, check the file size and if it's
too short, treat it the same as if restore_command returned non-zero?
It would then be retried on the next iteration. Works for me, though OTOH
it will then fail to complain about a WAL file that's genuinely
truncated for some reason. I guess there's no way around that: even if
you use a script as restore_command that does the file size check, it
will have the same problem.
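
As an illustration of the script variant mentioned above, here is a minimal sketch of a size-checking restore helper. It is not PostgreSQL's own code: the function name, the return codes, and the temporary-file suffix are hypothetical, and the 16777216-byte segment size is taken from the log message quoted above.

```python
import os
import shutil

# Expected segment size, taken from the FATAL message above (assumption:
# the server uses the 16 MB segment size shown there).
WAL_SEGMENT_SIZE = 16777216


def restore_segment(archive_path, dest_path, expected_size=WAL_SEGMENT_SIZE):
    """Copy one archived WAL segment to dest_path (hypothetical helper).

    Returns 0 on success and non-zero otherwise, so a short file looks
    like an ordinary restore failure and can be retried later.
    """
    if not os.path.isfile(archive_path):
        return 1  # segment is not in the archive (yet)

    # Copy to a temporary name first, so a short copy never appears
    # under the final name.
    tmp_path = dest_path + ".tmp"
    shutil.copyfile(archive_path, tmp_path)

    # Check the size of what was actually copied; the archived file may
    # still have been growing while it was read.
    if os.path.getsize(tmp_path) != expected_size:
        os.remove(tmp_path)
        return 2  # too short (or wrong size): report failure

    os.replace(tmp_path, dest_path)
    return 0
```

A wrapper script used as restore_command would pass the archive path and the target path to this function and exit with its return value. As noted above, the check cannot tell a segment that is still being archived from one that is truncated for good; both are reported as the same failure.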

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)