From: Josh Berkus on
On 12/15/09 1:05 AM, Takahiro Itagaki wrote:
> Here is an updated patch rebased to the latest CVS HEAD.
>
> One remaining concern is VERBOSE. Log messages by FULL (rewrite) are less
> verbose than FULL INPLACE. The same can be said for CLUSTER VERBOSE, though.
> I don't have any plans to make CLUSTER more verbose in the patch, but
> "more verbose CLUSTER" could be a TODO item.

That's of necessity; the new CLUSTER isn't checking the contents of the
table. However, it could report:

Size of table before VF
Size of table after VF
Space reclaimed
Index space reclaimed (per index)

--Josh Berkus


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Takahiro Itagaki on

Alvaro Herrera <alvherre(a)commandprompt.com> wrote:

> Hmm. With this patch, if I do "vacuumdb -f" it will not vacuum the
> special system catalogs that can only be vacuumed with INPLACE, correct?

No. It will vacuum normal tables with FULL (rewrite), and system catalogs
with FULL INPLACE. FULL vacuums for system catalogs always fall back to
INPLACE vacuum silently.

But certainly we cannot recommend to use daily database-wide VACUUM FULLs
because they have higher costs than repeated FULL INPLACE vacuums.
FULL (rewrite) will not be cheaper for tables that have little dead tuples.
Just an idea, something like "vacuumdb -f --threshold=<some baseline>"
might be useful for users running daily "vacuumdb -f".

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Mon, 2009-12-07 at 16:55 +0900, Itagaki Takahiro wrote:
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>
> > You should take those out again; if I am the committer I certainly will.
> > Such a test will guarantee complete instability of every other
> > regression test, and it's not worth it.
>
> I read the original comment was saying to add regression tests for
> database-wide vacuums. But I'll reduce the range of vacuum if they
> are not acceptable.
>
> The new patch contains only table-based vacuum for local tables and some of
> system tables to test non-INPLACE vacuum are not used for system tables.
> VACUUM FULL pg_am;
> VACUUM FULL pg_class;
> VACUUM FULL pg_database;

Thanks for adding those additional tests.

I notice that during copy_heap_data() we make no attempt to skip pages
that are all visible according to the visibilitymap. It seems like it
would be a substantial win to copy whole blocks if all the
pre-conditions are met (I see what they are). I'm surprised to see that
neither CLUSTER nor VACUUM FULL made use of this previously. I think we
either need to implement that or document that vacuum will not skip
all-visible pages when running VACUUM FULL.

Also, I notice that if we perform new VACUUM FULL on a table it will
fully re-write the table and rebuild indexes, even if it hasn't found a
single row to remove.

Old VACUUM FULL was substantially faster than this on tables that had
nothing to remove. We aren't asking users to recode anything, so many
people will be performing "VACUUM FULL;" as usual every night or
weekend. If they do that it will result in substantially longer run
times in many databases, all while holding AccessExclusiveLocks.

Please can you arrange for the cluster operation to skip rebuilding
indexes if the table is not reduced in size?

Part of the reason why these happen is that the code hasn't been
refactored much at all from its original use for cluster. There are
almost zero comments to explain the additional use of this code for
VACUUM, or at least to explain it still all works even when there is no
index. e.g. check_index_is_clusterable() ought not to be an important
routine when there is no index being clustered. I'm seeing that the code
all works but that this patch isn't yet a sufficiently permanent change
to the code for me to commit, though it could be soon.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on
Simon Riggs wrote:
> I notice that during copy_heap_data() we make no attempt to skip pages
> that are all visible according to the visibilitymap. It seems like it
> would be a substantial win to copy whole blocks if all the
> pre-conditions are met (I see what they are). I'm surprised to see that
> neither CLUSTER nor VACUUM FULL made use of this previously. I think we
> either need to implement that or document that vacuum will not skip
> all-visible pages when running VACUUM FULL.

Unfortunately the visibility map isn't completely crash-safe at the
moment (see comments in visibilitymap.c for details). So it's not safe
to use it for such purposes. I was planning to address that in 8.5 but
it seems I won't have the time.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on
On Mon, Dec 21, 2009 at 12:56 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Simon Riggs wrote:
>> I notice that during copy_heap_data() we make no attempt to skip pages
>> that are all visible according to the visibilitymap. It seems like it
>> would be a substantial win to copy whole blocks if all the
>> pre-conditions are met (I see what they are). I'm surprised to see that
>> neither CLUSTER nor VACUUM FULL made use of this previously. I think we
>> either need to implement that or document that vacuum will not skip
>> all-visible pages when running VACUUM FULL.
>
> Unfortunately the visibility map isn't completely crash-safe at the
> moment (see comments in visibilitymap.c for details). So it's not safe
> to use it for such purposes. I was planning to address that in 8.5 but
> it seems I won't have the time.

Well since we're going to have to read in the page to do the copy we
could just use the page header flag PD_ALL_VISIBLE instead.

But sequential scans already use that bit and I'm assuming but haven't
checked that these access paths do use the same underlying access path
as sequential scans. In which case it won't really save much since the
main advantage would be skipping the visibility checks. Saving the
actual work to copy tuples retail instead of the whole block wholesale
seems unlikely to buy much and would result in us not compacting space
on the page and storing accurate free space map values which I think
people would expect from both of these commands.

If I'm wrong and these commands are not using a sequential scan under
the hood or the fact that they're using SNAPSHOT_ANY defeats that
optimization then perhaps there is something there. On the third hand
presumably all the hint bits will be set if the page bit is set so
perhaps there's nothing there even so.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers