Remaining Streaming Replication Open Items [PgSql]


From: Fujii Masao on 6 Apr 2010 05:31

On Tue, Apr 6, 2010 at 4:09 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> I triaged the list of open items on the Streaming Replication wiki page.
> I propose that we drop the ones I've marked as Drop below, and move the
> remaining items to the main Open Items page for better visibility. And
> of course try to resolve them as quickly as possible.

Thanks so much!!

>> * Walsender and dblink are not interruptible on win32. - related thread
>
> I'd actually be happy to just leave it for 9.0, but it seems like
> consensus has been reached on how to fix it, and Fujii is working on a
> patch, so let's follow that through.

Yeah, I'm reworking the patch, but I'd like to target only walreceiver,
because the change for dblink might become too big at this point. Since no
one has complained about the long-standing problem with dblink, I'm not sure
it really should be fixed right now.

>> * Add the GUC parameter to specify the maximum number of log file segments held in pg_xlog directory to send to the standby server. Which is useful to avoid disk full in the primary.
>
> Not only to avoid disk full in primary but also to make it feasible to
> use streaming replication without archiving. It's a small change, we
> should do it.

Yep.
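
For a sense of scale, the retention idea behind this item can be sketched as below. This is only an illustration, not code from any patch in this thread: `keep_segments` is a hypothetical stand-in for the proposed GUC, segments are modelled as plain integers, and the 16 MB figure assumes the default WAL segment size.

```python
SEGMENT_SIZE_MB = 16  # assumption: default WAL segment size


def removable_segments(segments, current_segno, keep_segments):
    """Return the segment numbers that fall outside the retention window.

    The current segment and the `keep_segments` segments before it are
    kept so that a standby can still fetch them; anything older may be
    recycled or removed.
    """
    cutoff = current_segno - keep_segments
    return [segno for segno in segments if segno < cutoff]


def retained_mb(keep_segments):
    """Approximate disk space held back in pg_xlog for the standby."""
    return keep_segments * SEGMENT_SIZE_MB
```

With segments 10 through 19 on disk, segment 19 current and `keep_segments = 4`, the sketch marks segments 10 through 14 as removable, and `retained_mb(32)` gives 512 MB held for the standby.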

>> * pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the wrong name. Because a backend cannot know the actual timeline which is related to the location.
>
> Drop. It's not clear which timeline those functions should return in
> boundary cases, when replaying records from a log file where the
> timeline-switch occurs.

OK, but we need to add a note about that confusing behavior.
How about this?

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57163da..da3253f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13206,6 +13206,8 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
This is usually the desired behavior for managing transaction log archiving
behavior, since the preceding file is the last one that currently
needs to be archived.
+ Note that <function>pg_xlogfile_name</> and <function>pg_xlogfile_name_offset</>
+ always return an inaccurate result during recovery.
</para>

<para>
@@ -13279,6 +13281,11 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
</table>

<para>
+ Note that <function>pg_xlogfile_name</> and <function>pg_xlogfile_name_offset</>
+ always return an inaccurate result from any of the above locations.
+ </para>
+
+ <para>
The functions shown in <xref linkend="functions-admin-dbsize"> calculate
the disk space usage of database objects.
</para>
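
Why the timeline matters for the file name can be shown with a small sketch. It assumes the usual 24-hex-digit WAL file naming (timeline ID, log ID, segment number, 8 hex digits each) and the default 16 MB segment size; the helper `xlogfile_name` is hypothetical, not the server function, and it ignores the segment-boundary handling described in the documentation text above.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default WAL segment size in bytes


def xlogfile_name(location: str, timeline: int) -> str:
    """Build a WAL file name from an 'X/Y' location and a timeline ID."""
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // SEGMENT_SIZE
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


# The same location maps to different files on different timelines.
print(xlogfile_name("0/3000020", 1))  # 000000010000000000000003
print(xlogfile_name("0/3000020", 2))  # 000000020000000000000003
```

A backend that only knows the location, but not which timeline the record was replayed from, has to guess the first eight digits, which is the source of the wrong names discussed here.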

>> * The documentation needs to be improved.
>
> I've done as much as I can on my own, what we need now is feedback on
> what needs to be improved. So I'd like to drop this, but let's add new
> more specific items about what needs to be improved, as people speak up.

Yep.

>> * Redefine smart shutdown in standby mode?
>
> Drop. Too big a change at this point.

I don't think that it's too big, but OK. And ISTM we need to add a note
about the long-standing confusing behavior if it's dropped. How about this?

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 594bd7d..f8899e4 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1339,6 +1339,7 @@ echo -17 > /proc/self/oom_adj
active, new connections will still be allowed, but only to superusers
(this exception allows a superuser to connect to terminate
online backup mode).
+ If the server is in recovery, it additionally waits for recovery to end.
</para>
</listitem>
</varlistentry>

>> * Quotes can't be escaped in recovery.conf
>
> Under discussion. Not specific to streaming replication, and it's a
> pre-existing issue, but should be fixed IMHO.

Yep.

>> * Change the "standby mode" name.
>
> Bikeshedding without consensus. I like the "standby mode" the best as
> discussed on that thread, better than any of the proposed alternatives.
> Drop this item.

Yep.

>> * Fix things so that any such variables inherited from the server environment are intentionally *NOT* used for making SR connections.
>
> Drop. Besides, we have the same problem with dblink, and I don't recall
> anyone complaining.

Yep, but I don't think that dblink has the same issue, because it's often
used to connect to another database on the same postgres instance, which
seems a proper method. The problem is that walreceiver might wrongly connect
to *its own* server and get stuck because no WAL records ever arrive.
Since we currently don't allow the standby to accept replication
connections, the problem will not happen in 9.0, and ISTM we don't need
to address it right now. So I agree to drop it.
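
The failure mode can be sketched roughly as follows, assuming the variables in question are the libpq ones such as PGHOST and PGPORT, and that a setting missing from primary_conninfo falls back first to the environment and then to a built-in default. The function `effective_target` and the default values are hypothetical stand-ins, not libpq code.

```python
# Assumption: stand-ins for libpq's built-in defaults.
DEFAULTS = {"host": "(local socket)", "port": "5432"}
ENV_NAMES = {"host": "PGHOST", "port": "PGPORT"}


def effective_target(conninfo, environ):
    """Resolve where a connection would go: conninfo, then env, then default."""
    target = {}
    for key, env_name in ENV_NAMES.items():
        target[key] = conninfo.get(key) or environ.get(env_name) or DEFAULTS[key]
    return target


# Standby started with PGPORT=5433 (its own port); conninfo omits the port.
print(effective_target({"host": "localhost"}, {"PGPORT": "5433"}))
```

In this example the walreceiver would end up on port 5433, that is, on the standby itself, even though the primary listens on 5432.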

>> * If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck.
>
> It's not really stuck, it will replay any WAL files you drop into
> pg_xlog. I concur with Robert Haas though that it shouldn't print the
> message to the log every few seconds. It should print a message the
> first time it hits the end of WAL, but subsequent messages should be
> suppressed until some progress has been made.

Yep, but it seems difficult to implement that. So I'd drop the suppression.

>> * Remove the unnecessary section about HS from recovery.conf.sample
>
> Yeah, let's do it.

Yep.

>> * The replication connections consume superuser_reserved_connections slots.
>
> I'd still like to change this slightly, per my suggestion on that
> thread, but I don't feel strongly about it. It doesn't seem like a very
> big change to me, but Tom felt otherwise.

I feel the same.

>> * Add missing description about WAL-logging.
>
> Small documentation change. Needs to be done I guess.

Yep.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 6 Apr 2010 08:08

I wrote my previous email before reading this.

On Tue, Apr 6, 2010 at 3:09 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> I triaged the list of open items on the Streaming Replication wiki page.
> I propose that we drop the ones I've marked as Drop below, and move the
> remaining items to the main Open Items page for better visibility. And
> of course try to resolve them as quickly as possible.
>
>> * Walsender and dblink are not interruptible on win32. - related thread
>
> I'd actually be happy to just leave it for 9.0, but it seems like
> consensus has been reached on how to fix it, and Fujii is working on a
> patch, so let's follow that through.

Agree.

>> * Add the GUC parameter to specify the maximum number of log file segments held in pg_xlog directory to send to the standby server. Which is useful to avoid disk full in the primary.
>
> Not only to avoid disk full in primary but also to make it feasible to
> use streaming replication without archiving. It's a small change, we
> should do it.

Do we have a working patch?

>> * pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the wrong name. Because a backend cannot know the actual timeline which is related to the location.
>
> Drop. It's not clear which timeline those functions should return in
> boundary cases, when replaying records from a log file where the
> timeline-switch occurs.

Agree.

>> * The documentation needs to be improved.
>
> I've done as much as I can on my own, what we need now is feedback on
> what needs to be improved. So I'd like to drop this, but let's add new
> more specific items about what needs to be improved, as people speak up.

Agree. It's hard to think of this as a beta-blocker without more
specific feedback.

>> * Redefine smart shutdown in standby mode?
>
> Drop. Too big a change at this point.

We have a working patch for this - I want to commit it. I don't think
it's a big change, and the current behavior is extremely pathological.

>> * Quotes can't be escaped in recovery.conf
>
> Under discussion. Not specific to streaming replication, and it's a
> pre-existing issue, but should be fixed IMHO.

Fine with me.

>> * Change the "standby mode" name.
>
> Bikeshedding without consensus. I like the "standby mode" the best as
> discussed on that thread, better than any of the proposed alternatives.
> Drop this item.

OK.

>> * Fix things so that any such variables inherited from the server environment are intentionally *NOT* used for making SR connections.
>
> Drop. Besides, we have the same problem with dblink, and I don't recall
> anyone complaining.

Agree. I think that whole issue is bikeshedding.

>> * If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck.
>
> It's not really stuck, it will replay any WAL files you drop into
> pg_xlog. I concur with Robert Haas though that it shouldn't print the
> message to the log every few seconds. It should print a message the
> first time it hits the end of WAL, but subsequent messages should be
> suppressed until some progress has been made.

Any idea how to implement this?

>> * Remove the unnecessary section about HS from recovery.conf.sample
>
> Yeah, let's do it.

Don't care.

>> * The replication connections consume superuser_reserved_connections slots.
>
> I'd still like to change this slightly, per my suggestion on that
> thread, but I don't feel strongly about it. It doesn't seem like a very
> big change to me, but Tom felt otherwise.

Agree, we should fix it.

>> * Add missing description about WAL-logging.
>
> Small documentation change. Needs to be done I guess.

No strong feelings.

....Robert


From: Heikki Linnakangas on 6 Apr 2010 10:36

Robert Haas wrote:
> On Tue, Apr 6, 2010 at 3:09 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>>> * Add the GUC parameter to specify the maximum number of log file segments held in pg_xlog directory to send to the standby server. Which is useful to avoid disk full in the primary.
>> Not only to avoid disk full in primary but also to make it feasible to
>> use streaming replication without archiving. It's a small change, we
>> should do it.
>
> Do we have a working patch?

No.

>>> * Redefine smart shutdown in standby mode?
>> Drop. Too big a change at this point.
>
> We have a working patch for this - I want to commit it. I don't think
> it's a big change, and the current behavior is extremely pathological.

Oh, ok. I didn't look at the latest patch, if it looks good to you, fine
with me.

>>> * If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck.
>> It's not really stuck, it will replay any WAL files you drop into
>> pg_xlog. I concur with Robert Haas though that it shouldn't print the
>> message to the log every few seconds. It should print a message the
>> first time it hits the end of WAL, but subsequent messages should be
>> suppressed until some progress has been made.
>
> Any idea how to implement this?

I'll take a look. It shouldn't be too hard.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas on 6 Apr 2010 11:06

On Tue, Apr 6, 2010 at 10:36 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> On Tue, Apr 6, 2010 at 3:09 AM, Heikki Linnakangas
>> <heikki.linnakangas(a)enterprisedb.com> wrote:
>>>> * Add the GUC parameter to specify the maximum number of log file segments held in pg_xlog directory to send to the standby server. Which is useful to avoid disk full in the primary.
>>> Not only to avoid disk full in primary but also to make it feasible to
>>> use streaming replication without archiving. It's a small change, we
>>> should do it.
>>
>> Do we have a working patch?
>
> No.

:-(

>>>> * Redefine smart shutdown in standby mode?
>>> Drop. Too big a change at this point.
>>
>> We have a working patch for this - I want to commit it. I don't think
>> it's a big change, and the current behavior is extremely pathological.
>
> Oh, ok. I didn't look at the latest patch, if it looks good to you, fine
> with me.

I'll commit it tonight.

>>>> * If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck.
>>> It's not really stuck, it will replay any WAL files you drop into
>>> pg_xlog. I concur with Robert Haas though that it shouldn't print the
>>> message to the log every few seconds. It should print a message the
>>> first time it hits the end of WAL, but subsequent messages should be
>>> suppressed until some progress has been made.
>>
>> Any idea how to implement this?
>
> I'll take a look. It shouldn't be too hard.

The tricky part, I believe, is that there's more than one message that
can potentially be emitted, and you don't want ANY of them to repeat
every 2 s, so some thought needs to be given to where to hook in the
logic.
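
One way to think about it is a single gate that every end-of-WAL message passes through, which resets only when replay makes progress. The following is a minimal sketch of that idea, not code from the server or from any patch in this thread; the class `EndOfWalLogger`, its message keys, and the integer replay position are all hypothetical.

```python
class EndOfWalLogger:
    """Emit each kind of message once until the replay position advances."""

    def __init__(self, emit=print):
        self._emit = emit
        self._last_position = None
        self._already_logged = set()

    def report(self, message_key, text, replay_position):
        # Any progress re-arms every message kind.
        if replay_position != self._last_position:
            self._last_position = replay_position
            self._already_logged.clear()
        if message_key in self._already_logged:
            return False  # suppressed: already said since the last progress
        self._already_logged.add(message_key)
        self._emit(text)
        return True


log = EndOfWalLogger()
log.report("end_of_wal", "reached end of WAL", 100)   # printed
log.report("end_of_wal", "reached end of WAL", 100)   # suppressed
log.report("bad_record", "invalid record found", 100)  # printed once
log.report("end_of_wal", "reached end of WAL", 116)   # printed again
```

Keeping the bookkeeping in one place means no individual call site has to know about the retry loop, which is where the "more than one message" concern would otherwise bite.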

....Robert


From: Tom Lane on 6 Apr 2010 12:07

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> I triaged the list of open items on the Streaming Replication wiki page.
> I propose that we drop the ones I've marked as Drop below, and move the
> remaining items to the main Open Items page for better visibility.

By "drop" do you mean "move to TODO"? At least some of these issues
should be addressed in 9.1 or later. Perhaps some can really be
dropped, but it's not clear which.

regards, tom lane

