From: Tom Lane on 22 Jun 2010 12:50

Robert Haas <robertmhaas(a)gmail.com> writes:
> What does bother me is the fact that we are engineering a critical
> aspect of our system reliability around vendor-specific implementation
> details of the TCP stack, and that if any version of any operating
> system that we support (or ever wish to support in the future) fails
> to have a reliable implementation of this feature AND configurable
> knobs that we can tune to suit our needs, then we're screwed. Does
> anyone want to argue that this is NOT a house of cards?

By that argument, we need to be programming to bare metal on every disk
access. Does anyone want to argue that depending on vendor-specific
filesystem functionality is not a house of cards? (And unfortunately,
that's much too close to the truth ... but yet we're not going there.)

As for the original point: *of course* we are going to have to expose
the keepalive parameters. The default timeouts are specified by RFC,
and they're of the order of hours. That's not going to satisfy anyone
for this usage.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
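To make "expose the keepalive parameters" concrete, here is a minimal sketch of a connection string that carries the three settings named in this thread (keepalives_idle, keepalives_interval, keepalives_count). The helper name, the host name, and the numeric values are assumptions chosen for illustration; nothing here is taken from a committed patch.

```python
def keepalive_conninfo(host, idle_s, interval_s, count):
    """Build a key=value connection string with keepalive settings.

    The parameter names are the ones discussed in this thread; the
    helper itself is hypothetical.
    """
    params = {
        "host": host,
        "keepalives_idle": idle_s,          # seconds of idle time before the first probe
        "keepalives_interval": interval_s,  # seconds between unanswered probes
        "keepalives_count": count,          # unanswered probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical standby host and timings, for illustration only.
print(keepalive_conninfo("standby.example", 60, 10, 5))
```

Whether a given platform honours each of these per connection is exactly the portability question raised in the rest of the thread.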
From: "Kevin Grittner" on 22 Jun 2010 13:04

Robert Haas <robertmhaas(a)gmail.com> wrote:
> What does bother me is the fact that we are engineering a critical
> aspect of our system reliability around vendor-specific
> implementation details of the TCP stack, and that if any version
> of any operating system that we support (or ever wish to support
> in the future) fails to have a reliable implementation of this
> feature AND configurable knobs that we can tune to suit our needs,
> then we're screwed. Does anyone want to argue that this is NOT a
> house of cards?

[/me raises hand]

TCP keepalive has been available and a useful part of my reliability
solutions since I had to find a way to clean up zombie database
connections caused by clients powering down their workstations without
closing their apps -- that was in OS/2 circa 1990. I'm pretty sure
I've also used it on HP-UX, whatever Unix flavor was on our Sun SPARC
servers, several versions of Windows, and several versions of Linux.
As far as I can recall, the default was always two hours before doing
anything, followed by nine small packets sent over the course of ten
minutes before giving up (if none were answered). I'm not sure whether
the timings were controllable through the applications, because we
generally changed the OS defaults. Even so, recovery after two hours
and ten minutes is way better than waiting for eternity.

As someone else said, we may want to add some sort of keepalive-style
ping to our application's home-grown protocol; but I don't see that as
an argument to suppress a very widely supported standard protocol.
These address slightly different problem sets; let's solve the one
that came up in testing for the vast majority of runtime environments
by turning on TCP keepalives.

No, I don't see it as a house of cards.

-Kevin
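The "two hours and ten minutes" figure can be checked with a little arithmetic. The sketch below assumes the usual approximation that a dead peer is noticed after the idle time plus one probe interval per unanswered probe; the function names are hypothetical, and the "tuned" values are illustrative rather than a recommendation from the thread. The default numbers are the Linux (Fedora 12) values quoted elsewhere in this thread.

```python
def worst_case_detection_seconds(idle_s, interval_s, probes):
    """Approximate time to notice a dead peer: idle time plus all probe intervals."""
    return idle_s + interval_s * probes


def format_hms(total_s):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(total_s, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


# Linux defaults quoted in the thread: time=7200, intvl=75, probes=9.
defaults = worst_case_detection_seconds(7200, 75, 9)
print(defaults, format_hms(defaults))  # 7875 2h 11m 15s

# Hypothetical tuned values for a replication connection.
tuned = worst_case_detection_seconds(60, 10, 5)
print(tuned, format_hms(tuned))  # 110 0h 01m 50s
```

The gap between roughly two hours and roughly two minutes is why the defaults alone are unlikely to satisfy anyone for walreceiver.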
From: Magnus Hagander on 22 Jun 2010 12:32

On Tue, Jun 22, 2010 at 18:16, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Tue, Jun 22, 2010 at 9:27 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
>>> I am inclined to punt the keepalives_interval, keepalives_idle, and
>>> keepalives_count parameters to 9.1. If these are needed for
>>> walreceiver to work reliably, this whole approach is a dead-end,
>>> because those parameters are not portable. I will post a patch later
>>> today along these lines.
>>
>> Do we know how unportable? If it still helps the majority, it might be
>> worth doing. But I agree, if it's not really needed for walreceiver,
>> then it should be punted to 9.1.
>
> This might not be such a good idea as I had thought. It looks like
> the default parameters on Linux (Fedora 12) are:
>
> tcp_keepalive_intvl:75
> tcp_keepalive_probes:9
> tcp_keepalive_time:7200
>
> [ See also http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html ]
>
> That's clearly better than no keepalives, but I venture to say it's
> not going to be anything close to the behavior people want for
> walreceiver... I think we're going to need to either vastly reduce
> the keepalive time and interval, or abandon the strategy of using TCP
> keepalives completely.
>
> Which brings us to the question of portability. A quick search around
> the Internet suggests that this is supported on recent versions of
> Linux, Free/OpenBSD, AIX, and HP/UX, and it appears to work on my Mac
> also. I'm not clear how long it's been implemented on each of these
> platforms, though. With respect to Windows, it looks like there are
> registry settings for all of these parameters, but I'm unclear whether
> they can be set on a per-connection basis and what's required to make
> this happen.

I looked around quickly earlier when we chatted about this, and I
think I found an API call to change them for a socket as well - but a
Windows specific one, not the ones you'd find on Unix...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
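For readers unfamiliar with the per-socket knobs under discussion, here is a minimal sketch of overriding the system-wide defaults on a single socket. It assumes a Unix-like platform where Python's socket module exposes TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (Linux does); any constant the platform lacks is simply skipped, which mirrors the portability concern above. The function name and default values are hypothetical, and the Windows-specific call Magnus mentions is not covered.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, count=5):
    """Turn on TCP keepalive and apply whichever per-socket knobs exist here.

    Returns a dict of the options that were actually applied, so the
    caller can see which settings this platform supports.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", count),         # unanswered probes before giving up
    )
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not exposed on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # constant exists but the kernel rejected it
        applied[name] = value
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

Returning the applied options rather than raising keeps the sketch usable on platforms that support only some of the knobs; a real implementation would have to decide whether a missing knob is an error.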
From: Robert Haas on 22 Jun 2010 12:43

On Tue, Jun 22, 2010 at 12:32 PM, Magnus Hagander <magnus(a)hagander.net> wrote:
>> Which brings us to the question of portability. A quick search around
>> the Internet suggests that this is supported on recent versions of
>> Linux, Free/OpenBSD, AIX, and HP/UX, and it appears to work on my Mac
>> also. I'm not clear how long it's been implemented on each of these
>> platforms, though. With respect to Windows, it looks like there are
>> registry settings for all of these parameters, but I'm unclear whether
>> they can be set on a per-connection basis and what's required to make
>> this happen.
>
> I looked around quickly earlier when we chatted about this, and I
> think I found an API call to change them for a socket as well - but a
> Windows specific one, not the ones you'd find on Unix...

That, in itself, doesn't bother me, especially if you're willing to
write and test a patch that uses them.

What does bother me is the fact that we are engineering a critical
aspect of our system reliability around vendor-specific implementation
details of the TCP stack, and that if any version of any operating
system that we support (or ever wish to support in the future) fails
to have a reliable implementation of this feature AND configurable
knobs that we can tune to suit our needs, then we're screwed. Does
anyone want to argue that this is NOT a house of cards?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: Tom Lane on 22 Jun 2010 13:14

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Tue, Jun 22, 2010 at 12:50 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas(a)gmail.com> writes:
>> By that argument, we need to be programming to bare metal on every disk
>> access. Does anyone want to argue that depending on vendor-specific
>> filesystem functionality is not a house of cards? (And unfortunately,
>> that's much too close to the truth ... but yet we're not going there.)

> I think you're making my argument for me. The file system API is far
> more portable than the behavior we're proposing to depend on here, and
> yet it's only arguably good enough to meet our needs.

Uh, it's not API that's at issue here, and as for "not portable" I
think you have failed to make that case. It is true that there are
some old platforms where keepalive isn't adjustable, but I doubt that
anything anyone is likely to be running mission-critical PG 9.0 on
will lack it.

regards, tom lane