beta3 & the open items list [PgSql]

From: Robert Haas on 20 Jun 2010 23:54

On Sun, Jun 20, 2010 at 9:31 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug <fgp(a)phlo.org> wrote:
>> I'd buy that if all timeouts and retry counts would default to +infinity. But they don't, and hence sufficiently long network outages *will* cause connection aborts anyway. That a particular connection might survive due to inactivity proves nothing, since whether the connection is active or inactive during an outage is usually outside of anyone's control.
>>
>> I really fail to see why anyone would prefer connections (and therefore transactions!) getting stuck forever over a few spurious disconnects. The former always require manual intervention and cause all sorts of performance and disk-space issues, while the latter won't even be an issue for well-written clients who just reconnect and retry.
>>
>
> So just as a data point I'm routinely annoyed by reopening my screen
> session and finding various sessions have died since the day
> before. Usually this is caused by broken firewalls but there are also
> a bunch of SSH options which some servers have enabled which cause my
> sessions to never survive very long if there are any network outages.
> Servers where those options are disabled work fine.
>
> I admit this is a very different use case though and since we have
> control over the behaviour when the connection breaks perhaps the
> analogy falls apart completely. I'm not sure we can guarantee that
> reconnecting is always so simple though. What if the user has set up an
> SSH gateway or needs some extra authentication to make the connection?
> Are users expecting the slave to randomly disconnect and reconnect
> willy-nilly or are they expecting that once it connects it'll keep
> using that connection forever?

I feel like we're getting off in the weeds, here. Obviously, the user
would ideally like the connection to the master to last forever, but
equally obviously, if the master unexpectedly reboots, they'd like the
slave to notice - ideally within some reasonable time period - that it
needs to reconnect. There's no perfect way to distinguish "the master
croaked" from "the network administrator unplugged the Ethernet cable
and is planning to plug it back in any hour now", so we'll just need
to pick some reasonable timeout and go with it. To my way of
thinking, if the master hasn't responded in a minute or two, that's a
sign that it's time to declare the connection dead. Retrying the
connection *should* be cheap. If the user has set things up so that a
TCP connection from slave to master is not straightforward, the user
has configured it incorrectly, and no matter what we do it's not going
to be reliable.
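To put rough numbers on "a minute or two": with TCP keepalives, a peer that has gone silent is declared dead after about the idle time plus the probe interval multiplied by the probe count. The sketch below does that arithmetic, assuming Linux-style keepalive semantics; the function name is made up for illustration, and the tighter values are hypothetical, not settings anyone in this thread proposed.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time for TCP keepalives to declare a silent peer dead.

    Assumes Linux-style semantics: the first probe goes out after `idle`
    seconds with no traffic, further probes follow every `interval`
    seconds, and the connection is dropped after `count` consecutive
    unanswered probes.
    """
    if min(idle, interval, count) < 0:
        raise ValueError("keepalive settings must be non-negative")
    return idle + interval * count


# Linux defaults: net.ipv4.tcp_keepalive_time = 7200,
# tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9.
print(keepalive_detection_seconds(7200, 75, 9))  # 7875 (just over two hours)

# Hypothetical tighter settings aimed at "a minute or two".
print(keepalive_detection_seconds(60, 10, 6))  # 120
```

With the stock Linux defaults a dead master could go unnoticed for more than two hours; getting into the one-to-two-minute range means shortening all three values for the connection in question.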

I still think there's a decent argument that we might want to have a
protocol-level heartbeat rather than a TCP-level heartbeat. But doing
the latter is, I think, good enough for 9.0. We're pretty much
speculating about what the problems with that approach might be, so
getting too worked up about fixing them at this point seems premature.
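For comparison with the TCP-level approach, a protocol-level heartbeat is tracked by the application itself: each side sends something periodically, and the receiver declares the peer dead when nothing has arrived for a set timeout. A minimal, hypothetical sketch of the receiving side's bookkeeping (the class and method names are illustrative and do not correspond to anything in PostgreSQL):

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Receiver-side bookkeeping for a protocol-level heartbeat.

    Any message from the peer (data or an explicit heartbeat message)
    counts as proof of life; the peer is considered dead once more than
    `timeout` seconds pass with nothing received.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_heard = clock()

    def message_received(self) -> None:
        """Call whenever anything arrives from the peer."""
        self.last_heard = self.clock()

    def peer_is_dead(self, now: Optional[float] = None) -> bool:
        """True if the peer has been silent for longer than the timeout."""
        if now is None:
            now = self.clock()
        return now - self.last_heard > self.timeout


# Example with a fake clock, in seconds.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=30, clock=lambda: fake_now[0])
fake_now[0] = 45.0
print(monitor.peer_is_dead())  # True: nothing heard for 45 s
```

One property this has that TCP keepalives lack: keepalive probes are answered by the remote kernel, so they keep succeeding even if the server process itself is wedged, whereas a protocol-level heartbeat only arrives if the process is actually running. The logic is also identical on every platform.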

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 21 Jun 2010 07:11

On Mon, Jun 21, 2010 at 4:37 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I feel like we're getting off in the weeds, here. Obviously, the user
>> would ideally like the connection to the master to last forever, but
>> equally obviously, if the master unexpectedly reboots, they'd like the
>> slave to notice - ideally within some reasonable time period - that it
>> needs to reconnect.
>
>
>
>> There's no perfect way to distinguish "the master
>> croaked" from "the network administrator unplugged the Ethernet cable
>> and is planning to plug it back in any hour now", so we'll just need
>> to pick some reasonable timeout and go with it.

Eh... was there supposed to be some text here?

--
Robert Haas

From: Greg Stark on 21 Jun 2010 04:37

On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I feel like we're getting off in the weeds, here. Obviously, the user
> would ideally like the connection to the master to last forever, but
> equally obviously, if the master unexpectedly reboots, they'd like the
> slave to notice - ideally within some reasonable time period - that it
> needs to reconnect.

> There's no perfect way to distinguish "the master
> croaked" from "the network administrator unplugged the Ethernet cable
> and is planning to plug it back in any hour now", so we'll just need
> to pick some reasonable timeout and go with it.

--
greg

From: Robert Haas on 21 Jun 2010 12:45

On Sun, Jun 20, 2010 at 5:52 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> On a quick read, I think I see a problem with this: if a parameter is
>> specified with a non-zero value and there is no OS support available
>> for that parameter, it's an error. Presumably, for our purposes here,
>> we'd prefer to simply ignore any parameters for which OS support is
>> not available. Given the nature of these parameters, one might argue
>> that's a more useful behavior in general.
>
>> Also, what about Windows?
>
> Well, of course that patch hasn't been reviewed yet ... but shouldn't we
> just be copying the existing server-side behavior, as to both points?

The existing server-side behavior is apparently to do elog(LOG) if a
given parameter is unsupported; I'm not sure what the equivalent for
libpq would be.
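One way to get the "skip what the OS does not support" behavior discussed above is to probe for each option and log, rather than fail, when it is missing. A minimal sketch in Python, assuming the Linux option names TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT; the function name and the use of logging.info as a stand-in for the server's elog(LOG) are illustrative assumptions, not libpq's code:

```python
import logging
import socket

log = logging.getLogger("keepalive")


def set_keepalive(sock: socket.socket, idle: int = 0, interval: int = 0,
                  count: int = 0) -> list[str]:
    """Turn on TCP keepalives and apply whichever tuning knobs the OS offers.

    A value of 0 means "leave the system default alone". Options the
    platform does not expose (or rejects) are logged and skipped instead
    of raising. Returns the names of the options that were skipped.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    skipped: list[str] = []
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if value == 0:
            continue  # keep the system default for this knob
        opt = getattr(socket, name, None)
        if opt is None:
            log.info("%s is not supported on this platform; ignoring", name)
            skipped.append(name)
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError as exc:
            log.info("could not set %s (%s); ignoring", name, exc)
            skipped.append(name)
    return skipped


# Example: request a roughly two-minute detection window.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(set_keepalive(sock, idle=60, interval=10, count=6))  # names of any skipped options
sock.close()
```

Treating 0 as "leave the system default" follows what the server-side tcp_keepalives_* settings do, so a client-side equivalent would behave consistently with them.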

The current code does not seem to have any special cases for Windows
in this area, but that doesn't tell me whether it works or not. It
looks like Windows must at least report success when you ask to turn
on keepalives, but whether it actually does anything, and whether
the extra parameters exist/work, I can't tell.
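On the Windows question: a documented Winsock mechanism for per-socket keepalive timing is the SIO_KEEPALIVE_VALS ioctl, which takes an on/off flag plus the idle time and probe interval in milliseconds; Python exposes it through socket.ioctl on Windows. A minimal sketch under those assumptions (the helper names are hypothetical); it only shows what such a code path might look like, not whether the existing keepalive code works there:

```python
import socket


def keepalive_vals(idle_s: int, interval_s: int) -> tuple[int, int, int]:
    """Build the (onoff, keepalivetime, keepaliveinterval) tuple that
    SIO_KEEPALIVE_VALS expects; Winsock takes both times in milliseconds."""
    if idle_s <= 0 or interval_s <= 0:
        raise ValueError("idle and interval must be positive")
    return (1, idle_s * 1000, interval_s * 1000)


def set_windows_keepalive(sock: socket.socket, idle_s: int, interval_s: int) -> bool:
    """Apply keepalive timing via SIO_KEEPALIVE_VALS where available.

    Returns False (and changes nothing) on platforms without that ioctl,
    i.e. anything that is not Windows.
    """
    if not hasattr(socket, "SIO_KEEPALIVE_VALS"):
        return False
    sock.ioctl(socket.SIO_KEEPALIVE_VALS, keepalive_vals(idle_s, interval_s))
    return True


print(keepalive_vals(60, 10))  # (1, 60000, 10000)
```

Note that the structure behind this ioctl has no probe-count field, so a count setting would have nothing to map to by this route.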

--
Robert Haas

From: Thom Brown on 28 Jun 2010 04:40

On 19 June 2010 14:43, Robert Haas <robertmhaas(a)gmail.com> wrote:
> It would be nice if we could make a final push to get these issues
> resolved and another beta out the door before the end of the month...

So should we expect beta3 imminently, or are these issues still outstanding?

Thanks

Thom
