GSoC - ADD MERGE COMMAND - code patch submission [PgSql]

From: Greg Smith on 12 Jul 2010 17:16

Peter Eisentraut wrote:
> I think it's better to share code that doesn't meet project guidelines
> and solicit advice rather than not to share anything.
>

I feel the assumption that code is so valuable that it should be shared
regardless of whether it meets conventions is a flawed one for this
project. There are already dozens, if not hundreds, of useful patch
submissions that have been sent to this list, consumed time, and then
gone nowhere because they didn't happen in a way that the community was
able to integrate them properly. For anyone who isn't producing
committer-quality patches, the process is far more important than the
code if you want to get something non-trivial accomplished.

Also, producing code in whatever format you want and dumping that on the
community so that people like David Fetter waste their time cleaning it
up is not the way the GSoC work is supposed to happen. I didn't want
any other current or potential future participants in that program to
get the wrong idea from that example.

There is a brief "get to know the community" period at the beginning of
the summer schedule. I think that next year this project would be well
served to give each student a small patch to review during that time, as
a formal intro to the community process. The tendency among students to
just wander off coding without doing any interaction like that is both
common and counterproductive, given how patches to PostgreSQL actually
shuffle along toward becoming commit quality code. Far as I'm
concerned, a day spent working with the patch review checklist on
someone else's patch pays for itself tenfold when it comes time to
produce patches that others will be able to review.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 12 Jul 2010 18:59

Greg Smith <greg(a)2ndquadrant.com> writes:
> There is a brief "get to know the community" period at the beginning of
> the summer schedule. I think that next year this project would be well
> served to give each student a small patch to review during that time, as
> a formal intro to the community process. The tendency among students to
> just wander off coding without doing any interaction like that is both
> common and counterproductive, given how patches to PostgreSQL actually
> shuffle along toward becoming commit quality code. Far as I'm
> concerned, a day spent working with the patch review checklist on
> someone else's patch pays for itself tenfold when it comes time to
> produce patches that others will be able to review.

That seems like a great idea.

Is there a specific period when that's supposed to happen for GSoC
students? Can we arrange for a commitfest to be running then?
(I guess it'd need to be early in the fest, else the low-hanging
fruit will be gone already.)

regards, tom lane

From: Boxuan Zhai on 15 Jul 2010 20:26

Dear Hackers

I have thought about my situation, and I realize that I have not communicated
well with you, which has left you with little confidence in my project. Most of
the time I just work by myself and do not report to you frequently. I always
want to finish a solid stage of progress before making a submission. This may
be a bad habit for a remote project.

In fact, I have a detailed design for how to implement the command, and I am
working hard these days to catch up with the schedule.

In my design,
1. The merge command is first transformed into a "MergeStmt" node in the
parser. The analyzer then generates a left outer join query as the top query
(or main query). This query is similar to a SELECT query, but I set a
target relation in it. The top query drives the scanning and joining
of the target and source tables.

The merge actions are transformed into lower-level queries. I create a Query
node for each of them and append them to a newly created List field,
mergeActQry. The action queries have different command types and specific
target lists and qual lists, according to their declaration by the user. But
they all share the same range table. This is because we don't need the action
queries to be planned later. The joining strategy is decided by the top
query. We are only interested in their specific action qualifications. In
other words, these action queries are only containers for their target lists
and qualifications.
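
To make step 1 concrete, here is a minimal toy model in Python of the query shape described above. It is only a sketch: the class names, the field names, and the second (INSERT) action are invented for illustration, and none of this is PostgreSQL code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Toy stand-ins for parse-tree nodes. These are NOT PostgreSQL structures;
# every name here is hypothetical.
@dataclass
class ActionQuery:
    command_type: str         # "UPDATE", "INSERT" or "DELETE"
    matched: bool             # WHEN MATCHED (True) or WHEN NOT MATCHED (False)
    target_list: List[str]
    quals: Optional[str]      # additional action condition, if any
    range_table: List[str]    # shared with the top query, never planned alone


@dataclass
class TopQuery:
    join_type: str
    range_table: List[str]
    target_relation: str
    join_qual: str
    merge_action_queries: List[ActionQuery] = field(default_factory=list)


def transform_merge(target, source, on, actions):
    """Build the top join query plus one container query per action."""
    range_table = [source, target]
    top = TopQuery("LEFT OUTER JOIN", range_table, target, on)
    for command_type, matched, target_list, quals in actions:
        # Each action reuses the same range table object as the top query.
        top.merge_action_queries.append(
            ActionQuery(command_type, matched, target_list, quals, range_table)
        )
    return top


query = transform_merge(
    "Stock", "Sale", "Stock.stock_id = Sale.sale_id",
    [
        ("UPDATE", True, ["balance = balance + sale.vol"], None),
        ("INSERT", False, ["sale_id", "vol"], None),  # made-up second action
    ],
)
```

The point of the model is that only the top query carries join information; each action holds just a command type, a target list and quals, and points at the same range table.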

2. When the query is ready, it is sent to the rewriter. In this part, we
can call RewriteQuery() to handle the action queries. The UPDATE action will
trigger rules on UPDATE, and so on. Two things need to be noted: (a) actions
of the same type should not be rewritten repeatedly. If there are
two UPDATE actions in the merge command, we should not trigger the ON UPDATE
rules twice. (b) If an action type is fully replaced by rules, we should
remove all actions of this type from the action list.
The rewriter also does some processing on the target list of each action.
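
The two rewriting constraints in step 2 can be sketched as a small Python function. This is a toy model under assumed data shapes (the `rules` dictionary layout and all names are hypothetical); it does not call or imitate RewriteQuery().

```python
def rewrite_merge_actions(actions, rules):
    """Apply per-command-type rules to a list of merge actions.

    actions: list of (command_type, payload) tuples in MERGE order.
    rules:   dict mapping command_type to
             {"instead": bool, "extra": [rule queries]}  (hypothetical shape)
    Returns (remaining_actions, rule_queries).
    """
    fired = set()        # command types whose rules have already been triggered
    rule_queries = []
    remaining = []
    for action in actions:
        command_type = action[0]
        rule = rules.get(command_type)
        if rule is not None and command_type not in fired:
            # Trigger the rules for this command type only once,
            # however many actions of that type the MERGE contains.
            fired.add(command_type)
            rule_queries.extend(rule["extra"])
        if rule is not None and rule["instead"]:
            # The action type is fully replaced by rules: drop the action.
            continue
        remaining.append(action)
    return remaining, rule_queries


actions = [("UPDATE", "a1"), ("UPDATE", "a2"), ("DELETE", "a3")]
rules = {
    "UPDATE": {"instead": False, "extra": ["log_update"]},
    "DELETE": {"instead": True, "extra": ["archive_row"]},
}
remaining, rule_queries = rewrite_merge_actions(actions, rules)
```

With these made-up inputs, both UPDATE actions survive while the UPDATE rule query is collected once, and the DELETE action is removed because its rule replaces it.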

The first submission has finished the above part.

3. In the planner, the top-level query is handled in the normal way. Since it
has almost the same structure as a SELECT query, the planner() function can
work on it straightforwardly. However, we need a small change here. The merge
command has a target relation, which needs a ctid junk attribute in the
target list. The ctid is required by the UPDATE and DELETE actions.

Besides, for each of the action queries, we also need to create a Plan node.
We don't need to do full planning of the action queries. The crucial point is
to preprocess the target list and qualification of each action. (To explain
this point: the execution of a merge action is composed of two parts.
The top plan is executed in the main loop and returns the joined tuples
one by one. An action then applies its qualification to the returned
tuples. If the qualification succeeds, it takes the action and does the
corresponding modification on the target table. Thus, even though we have a
Plan node created for each action, we don't want to throw it directly into the
planner() function. That would generate a new plan over the tables in the
range table, which would very probably be different from the top-level plan.
If we ran the action plans directly, they would conflict with each other.)

I created a function merge_action_planner() to do this job. This part is
added at the end of standard_planner(). After that, all the plans of the merge
actions are linked into a new List field in the PlannedStmt result of the top
plan.

4. When the planner is finished, the plan is sent to the executor through
PortalRun(). As a new command, merge chooses the PORTAL_MULTI_QUERY
strategy and is sent to the ProcessQuery() function.

5. In the ExecutorStart() part, we need to set a junkfilter for the merge
command, since we have a ctid junk attribute in the target list. The merge
action plans should also be initialized and transformed into PlanState
nodes. However, the initialization of an action plan focuses only on the
target list and quals. We don't need the other parts of traditional plan
initialization, since these action plans are not for scanning or joining
(this is the job of the top plan). We only want to transform the action
information into a standard format that can be used by the qualification
evaluator in the executor.
I HAVE DONE ALL THE ABOVE IN A SECOND SUBMISSION.

6. In the ExecutorRun() part, the top plan is passed into ExecutePlan().
The action planstates can be found in the
estate->es_plannedstmt field.
The top plan can return tuples of the left outer join of the source table and
target table. (I can see the tuples being returned in my code.) Thus, the
design is correct; at least the top plan does its work well. In the
junkfilter, if we find a non-null ctid, it is a MATCHED tuple; otherwise
it is a NOT MATCHED tuple. Then we need to evaluate the additional quals of
the actions one by one. If the evaluation of one action succeeds, we
take this action and skip the remaining ones.
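
The per-tuple logic of step 6 can be sketched in Python as follows. This is a toy model, not executor code: rows are plain dictionaries, action quals are Python callables, and the action names and sample data are invented for illustration.

```python
def choose_action(joined_row, actions):
    """Return the first action whose qualification accepts the row, else None.

    A non-null ctid means the source row joined to a target row (MATCHED);
    a null ctid means it did not (NOT MATCHED).
    """
    is_matched = joined_row["ctid"] is not None
    for action in actions:
        if action["matched"] != is_matched:
            continue                      # wrong WHEN branch for this row
        if action["qual"](joined_row):
            return action                 # take it and skip the remaining ones
    return None


# Hypothetical actions, in the order they would appear in the MERGE command.
actions = [
    {"name": "delete_empty", "matched": True,
     "qual": lambda r: r["balance"] + r["vol"] == 0},
    {"name": "update", "matched": True, "qual": lambda r: True},
    {"name": "insert", "matched": False, "qual": lambda r: True},
]

# Rows as the top (left outer join) plan might return them.
rows = [
    {"ctid": (0, 1), "balance": 10, "vol": 5},
    {"ctid": (0, 2), "balance": 5, "vol": -5},
    {"ctid": None, "balance": None, "vol": 7},
]

chosen = [choose_action(row, actions)["name"] for row in rows]
```

Because the MATCHED check runs before any qual is evaluated, the qual of a WHEN MATCHED action never sees the null target columns of an unmatched row.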

Since the target list and qual expressions are all processed by the rewriter,
the planner and InitPlan(), I think they will be accepted by the ExecQual()
function without many problems.

This is the last step, and I am still working on it.

PS: Heikki asked me what the "EXPLAIN MERGE ..." command will do.
Well, I have not tested it, but it may throw an error or just explain the
top plan, since I put the action plans in a new field, which cannot be
recognized by the old functions.

Thanks!

Yours Boxuan.

From: Heikki Linnakangas on 16 Jul 2010 05:53

On 16/07/10 12:26, Boxuan Zhai wrote:
> For the EXPLAIN MERGE command, I expect it to return a result similar to
> that of a SELECT command.
>
> I think the EXPLAIN command is to show how the tables in a query are scanned
> and joined. In my design, the merge command will generate a top-level query
> (and plan) as the main query. It is in fact a left join select query over
> the source and target tables. This main query (plan) decides how the tables
> are scanned. The merge actions will not affect this process. So when we
> explain the merge command, a similar result will be returned.
>
> For example the command
> EXPLAIN
> MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id
> WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol
> WHEN ....
> .....
>
> Will return a result just like that of the following command:
>
> EXPLAIN
> SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id;

You really need to look at the changes in 9.0 in this area: you now have
an Update/Delete/Insert node (implemented in
src/backend/executor/nodeModifyTable.c) at the top of the plan for
update/insert/delete commands:

postgres=# explain UPDATE foo SET id = 456 WHERE id = 123;
                        QUERY PLAN
-----------------------------------------------------------
 Update  (cost=0.00..40.00 rows=12 width=6)
   ->  Seq Scan on foo  (cost=0.00..40.00 rows=12 width=6)
         Filter: (id = 123)
(3 rows)

I would expect there to be a Merge node similar to that, with
Update/Insert/Delete subnodes for each action.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Boxuan Zhai on 16 Jul 2010 05:26

Hi,

For the EXPLAIN MERGE command, I expect it to return a result similar to
that of a SELECT command.

I think the EXPLAIN command is to show how the tables in a query are scanned
and joined. In my design, the merge command will generate a top-level query
(and plan) as the main query. It is in fact a left join select query over
the source and target tables. This main query (plan) decides how the tables
are scanned. The merge actions will not affect this process. So when we
explain the merge command, a similar result will be returned.

For example the command
EXPLAIN
MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id
WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol
WHEN ....
......

Will return a result just like that of the following command:

EXPLAIN
SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id;

Yours Boxuan.

2010/7/16 Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com>

> On 16/07/10 03:26, Boxuan Zhai wrote:
>
>> PS: Heikki asked me what the "EXPLAIN MERGE ..." command will do.
>> Well, I have not tested it, but it may throw an error or just explain the
>> top plan, since I put the action plans in a new field, which cannot be
>> recognized by the old functions.
>>
>
> I meant what EXPLAIN MERGE output will look like after the project is
> finished, not what it will do at this stage. I was trying to get a picture
> of how you're thinking of implementing the executor, and what nodes there
> are in a MERGE plan.
>
> --
> Heikki Linnakangas
>
> EnterpriseDB http://www.enterprisedb.com
>
