From: Simon Riggs
On Sat, 2010-07-17 at 13:47 +0200, Markus Wanner wrote:

> Are the descriptive mails I sent for each patch going into the right
> direction and just need to be extended, in your opinion? Or are you
> really missing something in there?

Not detailed enough, for me, by a long way. Your notes read like an
update for someone who's been following your work in detail up to this
point. I apologise that I have not been able to do that.

If I were going to write a module that used the facilities you are
providing, what would I need to know?
e.g. http://developer.postgresql.org/pgdocs/postgres/indexam.html

> It's easier to answer more specific questions.

Agreed. I don't have enough information to ask any. It's hard to write
simple and clear design specs but no harder than writing the code;
reading someone else's code to discover what it does is very hard (for
me).

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services



From: Dimitri Fontaine
Markus Wanner <markus(a)bluegap.ch> writes:
> Daemon code? That sounds like it could be an addition to the
> coordinator, which I'm somewhat hesitant to extend, as it's a pretty
> critical process (especially for Postgres-R).
[...]
> However, note that the coordinator is designed to be just a message
> passing or routing process, which should not do any kind of time
> consuming processing. It must *coordinate* things (well, jobs) and react
> promptly. Nothing else.

Yeah, I guess user daemons would have to be workers, not plugins you
want to load into the coordinator.

> On the other side, the background workers have a connection to exactly
> one database. They are supposed to do work on that database.

Is that because of the way backends are started, and to avoid having to
fork new ones too often?

> The background workers can easily load external libraries - just as a
> normal backend can with LOAD. That would also provide better
> encapsulation (i.e. an error would only tear down that backend, not the
> coordinator). You'd certainly have to communicate between the
> coordinator and the background worker. I'm not sure how well that fits
> your use case.

Pretty well I think.

> The thread on -performance is talking quite a bit about connection
> pooling. The only way I can imagine some sort of connection pooling to
> be implemented on top of bgworkers would be to let the coordinator
> listen on an additional port and pass on all requests to the bgworkers
> as jobs (using imessages). And of course send back the responses to the
> client. I'm not sure how that overhead compares to using pgpool or
> pgbouncer. Those are also separate processes through which all of your
> data must flow. They use plain system sockets, whereas imessages use
> signals and shared memory.

Yeah. The connection pool is better kept outside of core. Let's think
about PGQ and an internal task scheduler first, if we think about any
generalisation.

Regards,
--
dim


From: Markus Wanner
Hi,

On 07/23/2010 09:45 PM, Dimitri Fontaine wrote:
> Yeah, I guess user daemons would have to be workers, not plugins you
> want to load into the coordinator.

Okay.

>> On the other side, the background workers have a connection to exactly
>> one database. They are supposed to do work on that database.
>
> Is that because of the way backends are started, and to avoid having to
> fork new ones too often?

For one, yes, I want to avoid having to start new ones too often. I did look
into letting these background workers switch the database connection,
but that turned out not to be worth the effort.

Would you prefer a background worker that's not connected to a database,
or why are you asking?

>> The background workers can easily load external libraries - just as a
>> normal backend can with LOAD. That would also provide better
>> encapsulation (i.e. an error would only tear down that backend, not the
>> coordinator). You'd certainly have to communicate between the
>> coordinator and the background worker. I'm not sure how well that fits
>> your use case.
>
> Pretty well I think.

Go ahead, re-use the background workers. That's what I've published them
for ;-)

> Yeah. The connection pool is better kept outside of core. Let's think
> about PGQ and an internal task scheduler first, if we think about any
> generalisation.

To be honest, I still don't quite grok the concept behind PGQ. So I
cannot really comment on this.

Regards

Markus Wanner


From: Dimitri Fontaine
Markus Wanner <markus(a)bluegap.ch> writes:
> For one, yes, I want to avoid having to start new ones too often. I did look
> into letting these background workers switch the database connection, but
> that turned out not to be worth the effort.
>
> Would you prefer a background worker that's not connected to a database, or
> why are you asking?

Trying to figure out how it would fit the PGQ and pgagent needs. But
maybe user defined daemons should be sub-coordinators (I used to think
about them as "supervisors") able to talk to the coordinator to get a
backend connected to some given database and distribute work to it.

You're using imessages as the data exchange; how are you doing the work
distribution? What do you use to tell the backend which processing
you're interested in?

> Go ahead, re-use the background workers. That's what I've published
> them for

Hehe :) The aim of this thread would be to have your input on designing
an API, now that we're about on track as to what the aim is.

> To be honest, I still don't quite grok the concept behind PGQ. So I cannot
> really comment on this.

In very short, the idea is a clock that ticks and associates
txid_current() with now(), so that you're able to say "give me 3 seconds'
worth of transaction activity from this queue". It then provides
facilities to organise a queue into batches at the consumer's request;
for more details, see:

http://github.com/markokr/skytools-dev/blob/master/sql/ticker/pgqd.c
http://github.com/markokr/skytools-dev/blob/master/sql/ticker/ticker.c
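
Purely to illustrate the tick concept (none of this is the actual PGQ
code, and every name below is invented), the ticker boils down to
something like:

/*
 * Illustrative sketch only -- the real code is in the pgqd.c and
 * ticker.c files linked above.  A "tick" pairs a transaction id with
 * a wall-clock timestamp; a consumer asking for "3 seconds' worth of
 * activity" maps that interval onto the txid range between two ticks
 * and fetches exactly the queued events written in that range.
 */
typedef struct Tick
{
    int64       tick_id;    /* monotonically increasing */
    uint64      txid;       /* txid_current() at tick time */
    TimestampTz tick_time;  /* now() at tick time */
} Tick;

static void
ticker_main_loop(void)
{
    for (;;)
    {
        record_tick();          /* roughly: INSERT ... txid_current(), now() */
        pg_usleep(1000 * 1000); /* one tick per second, say */
    }
}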

But the important thing as far as making it a child of the coordinator
goes would be, I guess, that it's some C code running as a daemon and
running SQL queries from time to time. The SQL queries call C
user-defined functions provided by the PGQ backend module.

Regards,
--
dim


From: Markus Wanner
Hey Dimitri,

On 07/24/2010 07:26 PM, Dimitri Fontaine wrote:
> Trying to figure out how it would fit the PGQ and pgagent needs. But
> maybe user defined daemons should be sub-coordinators (I used to think
> about them as "supervisors") able to talk to the coordinator to get a
> backend connected to some given database and distribute work to it.

Hm.. sounds like an awful lot of work to me, but if you need the
separation and security of a separate process...

To simplify, you might want to start a bgworker on database 'postgres',
which then acts as a sub-coordinator (and doesn't really need to use its
database connection).

> You're using imessages as the data exchange; how are you doing the work
> distribution? What do you use to tell the backend which processing
> you're interested in?

Well, there are different types of imessages defined in imsg.h. If you
are coding something within Postgres, you'd just add all the required
message types there. There's no such thing as an external registration
for new message types.

For example, for autovacuum, there are two message types:
IMSGT_PERFORM_VACUUM, which is sent from the coordinator to a bgworker
and initiates a vacuum job there. Then there's IMSGT_FORCE_VACUUM, which
is sent from a backend to the coordinator to inform it that a certain
database urgently needs vacuuming.

For Postgres-R, things are a bit more complicated. The first IMSGT_CSET
message starts the application of a remote transaction. Further
IMSGT_CSET messages may follow. The IMSGT_ORDERING message finally
completes the job.
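
As a stripped-down sketch of that pattern (this is not the actual
contents of imsg.h, which differs in layout and has more types; only
the IMSGT_* names are taken from the description above):

typedef enum
{
    /* autovacuum */
    IMSGT_PERFORM_VACUUM,   /* coordinator -> bgworker: run a vacuum job */
    IMSGT_FORCE_VACUUM,     /* backend -> coordinator: a database
                             * urgently needs vacuuming */

    /* Postgres-R change set application */
    IMSGT_CSET,             /* coordinator -> bgworker: change set data;
                             * further IMSGT_CSET messages may follow */
    IMSGT_ORDERING,         /* coordinator -> bgworker: completes the job */

    /* job control */
    IMSGT_READY             /* bgworker -> coordinator: worker is idle */
} IMessageType;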

So, imessage types cannot be mapped to jobs directly. See
include/postmaster/coordinator.h, enum worker_state. Those are the
possible states a worker can be in (job types).

Adding a job would consist of adding a worker_state, plus at least one
imessage type. Once the worker is done with its job, it returns
IMSGT_READY to the coordinator.
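
Sketched as code, a worker's job loop would look roughly like this
(every name here is invented for illustration, apart from the IMSGT_*
types described above):

static void
bgworker_job_loop(void)
{
    for (;;)
    {
        /* block until the coordinator delivers the next imessage */
        IMessage   *msg = await_imessage();

        switch (msg->type)
        {
            case IMSGT_PERFORM_VACUUM:
                perform_vacuum_job(msg);        /* a complete job by itself */
                break;

            case IMSGT_CSET:
                apply_change_set(msg);          /* job not done yet: more
                                                 * CSETs or an ORDERING
                                                 * will follow */
                continue;

            case IMSGT_ORDERING:
                commit_remote_transaction(msg); /* completes the cset job */
                break;
        }

        /* job finished: report back and become available again */
        send_imessage_to_coordinator(IMSGT_READY);
    }
}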

I'm open to refinements, such as assigning a certain range of message
types to external use or some such. However, I have no idea how to avoid
clashing message type ids then. Maybe those should still be part of imsg.h?
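
Just as a thought experiment on the range idea (nothing like this
exists today):

/* imsg.h could reserve everything above a fixed base for external
 * use, with built-in types counting up from zero.  That avoids
 * clashes with core, but two unrelated modules could still pick the
 * same id -- which is exactly the open question.
 */
#define IMSGT_EXTERNAL_BASE     0x8000

/* a module would then define its own types relative to that base: */
#define MYMOD_MSG_DO_SOMETHING  (IMSGT_EXTERNAL_BASE + 1)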

>> Go ahead, re-use the background workers. That's what I've published
>> them for
>
> Hehe :) The aim of this thread would be to have your input on designing
> an API, now that we're about on track as to what the aim is.

Oh, sure. :-)

> In very short, the idea is a clock that ticks and associates
> txid_current() with now(), so that you're able to say "give me 3 seconds'
> worth of transaction activity from this queue". It then provides
> facilities to organise a queue into batches at the consumer's request;
> for more details, see:
>
> http://github.com/markokr/skytools-dev/blob/master/sql/ticker/pgqd.c
> http://github.com/markokr/skytools-dev/blob/master/sql/ticker/ticker.c

Okay, thanks for the pointers. However, comments are relatively sparse
in there as well...

> But the important thing as far as making it a child of the coordinator
> goes would be, I guess, that it's some C code running as a daemon and
> running SQL queries from time to time. The SQL queries call C
> user-defined functions provided by the PGQ backend module.

You could certainly define jobs that don't ever terminate. And issuing
SQL queries certainly sounds more like a background job to me than
something belonging in the sphere of the coordinator. Sorry if my first
impulse was misleading.

So, the bgworker infrastructure could probably satisfy the internal
communication needs. But how does this ticker daemon talk to the
outside? Does it need to open a socket and listen there? Or do the
requests to that queue come in via SQL?

Regards

Markus
