Streaming replication and non-blocking I/O [PgSql]


From: Heikki Linnakangas on 12 Dec 2009 03:09

Fujii Masao wrote:
> On Thu, Dec 10, 2009 at 12:00 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> The OS buffer is expected to be able to store a large number of
>>> XLogRecPtr messages, because each message is small. So it's also OK
>>> to just drop it.
>> It certainly seems to be something we could improve later, when and
>> if evidence emerges that it's a real-world problem. For now,
>> simple is beautiful.
>
> I just dropped the backend libpq changes related to non-blocking I/O.
>
> git://git.postgresql.org/git/users/fujii/postgres.git
> branch: replication

Thanks, much simpler now.

Changing the finish_time argument of pqWaitTimed into timeout_ms changes
the behavior of the connect_timeout option to PQconnectdb. It should wait
for at most connect_timeout seconds in total, but now it waits for
connect_timeout seconds at each step in the connection process: opening
a socket, authenticating, etc.
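The difference between one overall deadline and a per-step timeout can be sketched in C. This is a model, not libpq code: step_timeout_ms() and worst_case_wait_s() are hypothetical helpers, deadlines are whole seconds, and (time_t) -1 is assumed to mean "no limit".

```c
#include <time.h>

/* Hypothetical helper: turn an absolute deadline into the timeout (in ms)
 * to pass to poll() for the next connection step. (time_t) -1 means
 * "no limit" and maps to poll()'s infinite timeout, -1. */
int step_timeout_ms(time_t finish_time, time_t now)
{
    if (finish_time == (time_t) -1)
        return -1;                    /* wait forever */
    if (now >= finish_time)
        return 0;                     /* budget used up: do not block */
    return (int) ((finish_time - now) * 1000);
}

/* Worst-case total wait for n_steps steps (open socket, authenticate, ...),
 * assuming every step blocks for as long as it is allowed to.
 * shared_deadline != 0: one deadline for the whole connection attempt.
 * shared_deadline == 0: a fresh connect_timeout for every step. */
long worst_case_wait_s(int n_steps, int connect_timeout, int shared_deadline)
{
    time_t start = 0, now = start;
    time_t finish = start + connect_timeout;

    for (int i = 0; i < n_steps; i++) {
        if (!shared_deadline)
            finish = now + connect_timeout;   /* per-step timeout resets */
        now += step_timeout_ms(finish, now) / 1000;
    }
    return (long) (now - start);
}
```

With a shared deadline the whole attempt is bounded by connect_timeout however many steps it takes; with a per-step timeout the bound grows with the number of steps.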

Could we change the API of PQgetXLogData to be more like PQgetCopyData?
I'm thinking of removing the timeout argument, and instead looping with
select/poll and PQconsumeInput in the caller. That probably means
introducing a new state analogous to PGASYNC_COPY_OUT. I haven't thought
this fully through yet, but it seems like it would be good to have a
consistent API.
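A minimal sketch of that calling convention, modeled on how PQgetCopyData() is used in async mode: the caller sleeps in poll(), runs an input-consuming step (the role PQconsumeInput() plays), then drains every complete message without blocking. To stay runnable without a server, the "connection" is a hypothetical struct over a non-blocking file descriptor and messages are newline-terminated; neither reflects the real replication wire format, and all names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a connection: non-blocking fd plus input buffer. */
struct conn {
    int fd;
    char inbuf[4096];
    size_t inlen;
    int eof;                          /* peer has closed the stream */
};

/* Plays the role of PQconsumeInput(): move whatever bytes are available
 * into inbuf without blocking. Returns 1 on success, 0 on error. */
int consume_input(struct conn *c)
{
    while (c->inlen < sizeof c->inbuf) {
        ssize_t n = read(c->fd, c->inbuf + c->inlen, sizeof c->inbuf - c->inlen);
        if (n > 0)
            c->inlen += (size_t) n;
        else if (n == 0) {
            c->eof = 1;
            break;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;
        else if (errno != EINTR)
            return 0;
    }
    return 1;
}

/* Plays the role of PQgetCopyData(conn, &buf, 1): hand back one complete
 * message from the buffer. Returns 1 = message copied to out,
 * 0 = none complete yet, -1 = stream ended, -2 = message too long. */
int get_message_async(struct conn *c, char *out, size_t outlen)
{
    char *nl = memchr(c->inbuf, '\n', c->inlen);
    size_t used, len;

    if (nl == NULL) {
        if (c->eof)
            return -1;                /* a trailing partial message is dropped */
        return c->inlen == sizeof c->inbuf ? -2 : 0;
    }
    used = (size_t) (nl - c->inbuf) + 1;
    len = used - 1 < outlen - 1 ? used - 1 : outlen - 1;   /* truncate to fit */
    memcpy(out, c->inbuf, len);
    out[len] = '\0';
    memmove(c->inbuf, c->inbuf + used, c->inlen - used);
    c->inlen -= used;
    return 1;
}

/* Caller-side loop: drain complete messages, sleep in poll() until the fd
 * is readable, then consume input. Returns messages received, or -1. */
int receive_loop(struct conn *c)
{
    char msg[256];
    int count = 0;

    for (;;) {
        struct pollfd pfd;
        int rc;

        while ((rc = get_message_async(c, msg, sizeof msg)) == 1)
            count++;                  /* a walreceiver would apply WAL here */
        if (rc == -1)
            return count;             /* stream finished */
        if (rc == -2)
            return -1;
        /* rc == 0: signal checks would go here, then wait for input. */
        pfd.fd = c->fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, 1000) < 0 && errno != EINTR)
            return -1;
        if (!consume_input(c))
            return -1;
    }
}
```

Because the wait lives in the caller, the caller can add other file descriptors or flag checks to the same loop without the library needing a timeout parameter.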

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 12 Dec 2009 10:19

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> Changing the finish_time argument of pqWaitTimed into timeout_ms changes
> the behavior of the connect_timeout option to PQconnectdb. It should wait
> for at most connect_timeout seconds in total, but now it waits for
> connect_timeout seconds at each step in the connection process: opening
> a socket, authenticating, etc.

Refresh my memory as to why this patch is touching any of that code at
all?

regards, tom lane

From: Heikki Linnakangas on 12 Dec 2009 15:42

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
>> Changing the finish_time argument of pqWaitTimed into timeout_ms changes
>> the behavior of the connect_timeout option to PQconnectdb. It should wait
>> for at most connect_timeout seconds in total, but now it waits for
>> connect_timeout seconds at each step in the connection process: opening
>> a socket, authenticating, etc.
>
> Refresh my memory as to why this patch is touching any of that code at
> all?

Walreceiver wants to wait for data to arrive from the master or for a
signal. PQgetXLogData(), which is the libpq function that reads a piece of
WAL, takes a timeout argument to support that. Walreceiver calls
PQgetXLogData() in an endless loop, checking for a received SIGHUP or for
postmaster death at every iteration.
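The loop shape described here can be sketched in C. This is an illustrative model, not walreceiver code: the replication connection is any readable file descriptor, the timed wait is a plain poll(), SIGHUP uses the usual "set a flag in the handler, act on it in the loop" pattern, and the postmaster-death check is a crude kill(pid, 0) probe standing in for whatever the real code uses. All function and constant names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set asynchronously by the signal handler, consumed by the main loop. */
static volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int signo)
{
    (void) signo;
    got_sighup = 1;
}

int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

enum { ITER_PARENT_DIED = -1, ITER_TIMEOUT = 0, ITER_DATA = 1 };

/* One pass of a hypothetical receiver loop: handle a pending SIGHUP, check
 * that the parent is still alive, then wait at most timeout_ms for input. */
int receiver_iteration(int fd, pid_t postmaster_pid, int timeout_ms, int *reloads)
{
    struct pollfd pfd;

    if (got_sighup) {
        got_sighup = 0;
        (*reloads)++;                     /* real code would re-read its config */
    }
    if (kill(postmaster_pid, 0) != 0)     /* crude liveness probe (assumption) */
        return ITER_PARENT_DIED;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    /* A timeout (0) or EINTR (-1) both mean: loop again, re-check flags. */
    return poll(&pfd, 1, timeout_ms) > 0 ? ITER_DATA : ITER_TIMEOUT;
}
```

The timeout exists only so that the flags get re-checked regularly; a signal arriving just before poll() is noticed only when the wait expires.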

In synchronous replication mode, I presume it's also going to listen
for a signal from the startup process, so that it can send an
acknowledgment to the master as soon as it has replayed a COMMIT record
that a backend on the master is waiting for.
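One common POSIX way to wake on either socket data or a signal from another process, without relying on a timeout, is the self-pipe trick: the signal handler writes a byte into a pipe that the main loop polls alongside the replication socket. This is a swapped-in technique, not something the thread proposes; using SIGUSR1 as the "COMMIT replayed" signal and all names below are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wakeup_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe write(), errno preserved.
 * If the pipe is full the write fails with EAGAIN, which is fine:
 * a wakeup is already pending. */
static void on_replay_signal(int signo)
{
    int saved_errno = errno;
    ssize_t rc = write(wakeup_pipe[1], "x", 1);

    (void) signo;
    (void) rc;
    errno = saved_errno;
}

/* Create the non-blocking self-pipe and route SIGUSR1 (assumed) to it. */
int setup_wakeup_pipe(void)
{
    struct sigaction sa;

    if (pipe(wakeup_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(wakeup_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(wakeup_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_replay_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

enum { WAKE_DATA = 1, WAKE_ACK = 2 };

/* Wait until WAL arrives on sock_fd and/or an ack is requested.
 * Returns a bitmask of WAKE_* flags, or 0 on timeout/EINTR. */
int wait_for_event(int sock_fd, int timeout_ms)
{
    struct pollfd pfds[2];
    int events = 0;

    pfds[0].fd = sock_fd;
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;
    pfds[1].fd = wakeup_pipe[0];
    pfds[1].events = POLLIN;
    pfds[1].revents = 0;

    if (poll(pfds, 2, timeout_ms) <= 0)
        return 0;                 /* a byte written during EINTR stays queued */
    if (pfds[0].revents & POLLIN)
        events |= WAKE_DATA;
    if (pfds[1].revents & POLLIN) {
        char junk[64];
        while (read(wakeup_pipe[0], junk, sizeof junk) > 0)
            ;                     /* drain: several signals become one ack */
        events |= WAKE_ACK;
    }
    return events;
}
```

Because the wakeup is stored in the pipe, a signal that arrives just before poll() is not lost, so the acknowledgment can go out immediately rather than after a timeout.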

To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
to take a timeout argument instead of a finish_time argument. That is a
mistake, because it breaks PQconnectdb, and as I said I don't think
PQgetXLogData() should have a timeout argument to begin with. Instead,
it should have a boolean 'async' argument to return immediately if
there's no data, and the walreceiver main loop should call poll()/select()
to wait, i.e., just as PQgetCopyData() works.
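The proposed convention, with no timeout argument and a boolean 'async' flag as in PQgetCopyData(conn, &buffer, async), can be sketched as follows. The function name get_wal_chunk() is hypothetical and it reads from a plain non-blocking file descriptor rather than a libpq connection; its return codes mirror PQgetCopyData's documented ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical reader with an 'async' flag instead of a timeout.
 *   returns >0  number of bytes read
 *            0  async mode only: no data available right now
 *           -1  end of stream
 *           -2  error
 * fd must be in non-blocking mode. */
ssize_t get_wal_chunk(int fd, char *buf, size_t len, int async)
{
    for (;;) {
        struct pollfd pfd;
        ssize_t n = read(fd, buf, len);

        if (n > 0)
            return n;
        if (n == 0)
            return -1;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -2;
        if (async)
            return 0;             /* the caller decides how to wait */
        /* Blocking mode: sleep until readable, with no timeout at all. */
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -2;
    }
}
```

With async set, the walreceiver owns the wait and can combine it with signal handling in its own poll(); with async cleared, the call simply blocks. Either way, nothing needs to pass a timeout down into pqWaitTimed().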

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Fujii Masao on 13 Dec 2009 21:20

On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Walreceiver wants to wait for data to arrive from the master or for a
> signal. PQgetXLogData(), which is the libpq function that reads a piece of
> WAL, takes a timeout argument to support that. Walreceiver calls
> PQgetXLogData() in an endless loop, checking for a received SIGHUP or for
> postmaster death at every iteration.
>
> In synchronous replication mode, I presume it's also going to listen
> for a signal from the startup process, so that it can send an
> acknowledgment to the master as soon as it has replayed a COMMIT record
> that a backend on the master is waiting for.

Right.

> To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
> to take a timeout argument instead of a finish_time argument. That is a
> mistake, because it breaks PQconnectdb, and as I said I don't think
> PQgetXLogData() should have a timeout argument to begin with. Instead,
> it should have a boolean 'async' argument to return immediately if
> there's no data, and the walreceiver main loop should call poll()/select()
> to wait, i.e., just as PQgetCopyData() works.

Seems good. I'll revise the code.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Tom Lane on 13 Dec 2009 21:38

Fujii Masao <masao.fujii(a)gmail.com> writes:
> On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
>> to take a timeout argument instead of a finish_time argument. That is a
>> mistake, because it breaks PQconnectdb, and as I said I don't think
>> PQgetXLogData() should have a timeout argument to begin with. Instead,
>> it should have a boolean 'async' argument to return immediately if
>> there's no data, and the walreceiver main loop should call poll()/select()
>> to wait, i.e., just as PQgetCopyData() works.

> Seems good. I'll revise the code.

Do we need a new "PQgetXLogData" function at all? Seems like you could
shove the data through the COPY protocol and not have to touch libpq
at all, rather than duplicating a nontrivial amount of code there.
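For reference, the COPY sub-protocol already frames arbitrary bytes: a CopyData message is the byte 'd', an Int32 length in network byte order that counts itself but not the type byte, and then the payload. The sketch below encodes and decodes one such frame to show that WAL bytes fit without any new message type. How the patch would actually frame WAL is not stated in the thread, and the helper names are hypothetical. On the client side, PQgetCopyData() already strips this framing and returns only the payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one CopyData message into out. Returns the frame size in bytes,
 * or 0 if the payload is too large or out is too small. */
size_t encode_copydata(const void *payload, uint32_t plen,
                       unsigned char *out, size_t outlen)
{
    uint32_t len;

    if (plen > (uint32_t) INT32_MAX - 4u)
        return 0;                     /* length field is a signed Int32 */
    len = plen + 4;                   /* length counts itself */
    if (outlen < (size_t) len + 1)
        return 0;
    out[0] = 'd';
    out[1] = (unsigned char) (len >> 24);
    out[2] = (unsigned char) (len >> 16);
    out[3] = (unsigned char) (len >> 8);
    out[4] = (unsigned char) len;
    memcpy(out + 5, payload, plen);
    return (size_t) len + 1;
}

/* Decode one CopyData message from the front of in. Returns the payload
 * length and sets *payload and *consumed; -1 if more bytes are needed;
 * -2 if this is not a valid CopyData frame. */
long decode_copydata(const unsigned char *in, size_t inlen,
                     const unsigned char **payload, size_t *consumed)
{
    uint32_t len;

    if (inlen < 5)
        return -1;
    if (in[0] != 'd')
        return -2;                    /* e.g. 'c' (CopyDone) */
    len = ((uint32_t) in[1] << 24) | ((uint32_t) in[2] << 16) |
          ((uint32_t) in[3] << 8) | (uint32_t) in[4];
    if (len < 4 || len > (uint32_t) INT32_MAX)
        return -2;
    if (inlen < (size_t) len + 1)
        return -1;
    *payload = in + 5;
    *consumed = (size_t) len + 1;
    return (long) (len - 4);
}
```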

regards, tom lane
