From: Fujii Masao on
On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
<r.m.guerrero(a)usit.uio.no> wrote:
> I tested this yesterday and I could not get any reaction from the wal
> receiver even after using minimal values compared to the default values.
>
> The default values in Linux for tcp_keepalive_time, tcp_keepalive_intvl
> and tcp_keepalive_probes are 7200, 75 and 9. I reduced these values to
> 60, 3 and 3, and nothing happened: the connection continues with status
> ESTABLISHED after 60+3*3 seconds.
>
> I did not restart the network after I changed these values on the fly
> via /proc. I wonder if this is the reason the connection didn't die,
> even with the new keepalive values, after the connection was broken. I
> will check this later today.

Walreceiver uses libpq to communicate with the master, but keepalive is not
currently enabled in libpq. That is, the libpq code doesn't call something like
setsockopt(SOL_SOCKET, SO_KEEPALIVE). So even if you change the kernel options
for keepalive, it has no effect on walreceiver.
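
To illustrate the point, here is a minimal sketch of the kind of call that
is missing. It is not libpq code: enable_keepalive() and keepalive_enabled()
are hypothetical helper names, and fd stands for any connected TCP socket.
The kernel's tcp_keepalive_* settings apply only to sockets on which
SO_KEEPALIVE has been switched on like this.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Turn on TCP keepalive for one socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 * Hypothetical helper, not part of libpq.
 */
int
enable_keepalive(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/*
 * Report whether keepalive is switched on for a socket.
 * Returns 1 if on, 0 if off, -1 if getsockopt fails.
 */
int
keepalive_enabled(int fd)
{
    int       val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A freshly created socket reports keepalive as off, which is why changing
the values under /proc alone makes no visible difference.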

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Magnus Hagander on
On Thu, Jun 17, 2010 at 09:20, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
> <r.m.guerrero(a)usit.uio.no> wrote:
>> I tested this yesterday and I could not get any reaction from the wal
>> receiver even after using minimal values compared to the default values.
>>
>> The default values in Linux for tcp_keepalive_time, tcp_keepalive_intvl
>> and tcp_keepalive_probes are 7200, 75 and 9. I reduced these values to
>> 60, 3 and 3, and nothing happened: the connection continues with status
>> ESTABLISHED after 60+3*3 seconds.
>>
>> I did not restart the network after I changed these values on the fly
>> via /proc. I wonder if this is the reason the connection didn't die,
>> even with the new keepalive values, after the connection was broken. I
>> will check this later today.
>
> Walreceiver uses libpq to communicate with the master, but keepalive is not
> currently enabled in libpq. That is, the libpq code doesn't call something like
> setsockopt(SOL_SOCKET, SO_KEEPALIVE). So even if you change the kernel options
> for keepalive, it has no effect on walreceiver.

Yeah, there was a patch submitted for this - I think it's on the CF
page for 9.1... I guess if we really need it, walreceiver could enable
keepalive itself - just get the socket with PQsocket().
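
As a rough sketch of what that could look like (not the submitted patch,
and not walreceiver code): PQsocket(conn) returns the connection's file
descriptor, and that descriptor can be passed to ordinary setsockopt()
calls. The KeepaliveTiming struct and both function names below are
hypothetical, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the
Linux per-socket option names, so other platforms may differ. The example
values are the 60, 3 and 3 from the test quoted above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical settings bundle; the names are illustrative only. */
typedef struct KeepaliveTiming
{
    int idle;      /* seconds of silence before the first probe */
    int interval;  /* seconds between unanswered probes */
    int count;     /* unanswered probes before the connection is dropped */
} KeepaliveTiming;

/*
 * Enable keepalive on fd and override the kernel-wide defaults for this
 * one socket. In walreceiver, fd would come from PQsocket(conn).
 * Returns 0 on success, -1 if any setsockopt call fails.
 */
int
apply_keepalive_timing(int fd, const KeepaliveTiming *t)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &t->idle, sizeof(t->idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &t->interval, sizeof(t->interval)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &t->count, sizeof(t->count)) < 0)
        return -1;
    return 0;
}

/* Approximate seconds until a dead peer is noticed: idle + interval * count. */
int
keepalive_detection_seconds(const KeepaliveTiming *t)
{
    return t->idle + t->interval * t->count;
}
```

With idle = 60, interval = 3 and count = 3, the second function gives the
same 60+3*3 = 69 seconds used in the test above. Setting the values per
socket also avoids touching the system-wide settings under /proc.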

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane on
Fujii Masao <masao.fujii(a)gmail.com> writes:
> On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> The real problem here is that we're sending records to the slave which
>> might cease to exist on the master if it unexpectedly reboots. I
>> believe that what we need to do is make sure that the master only
>> sends WAL it has already fsync'd (Tom suggested on another thread that
>> this might be necessary, and I think it's now clear that it is 100%
>> necessary).

> The attached patch changes walsender so that it always sends WAL up to
> LogwrtResult.Flush instead of LogwrtResult.Write.

Applied, along with some minor comment improvements of my own.

regards, tom lane
