patch - per-tablespace random_page_cost/seq_page

Prev: DTrace compiler warnings
Next: add more frame types in window functions (ROWS)

From: Robert Haas on 3 Dec 2009 11:00

On Sat, Nov 28, 2009 at 9:54 PM, David Rowley <dgrowley(a)gmail.com> wrote:
> Robert Haas Wrote:
>> Hmm. I'm not able to reliably detect a performance difference between
>> unpatched CVS HEAD (er... git master branch) and same with spcoptions-
>> v2.patch applied. I figured that if there were going to be an impact,
>> it would be most likely to manifest itself in a query that touches lots
>> and lots of tables but does very little actual work. So I used the
>> attached script to create 200 empty tables, 100 in the default
>> tablespace and 100 in tablespace "dork" (also known as, why I am
>> working on this at 11 PM on Thanksgiving). Then I did:
>>
>> SELECT * FROM a1, a2, a3, ..., a100;
>
> (I've not read the patch, but I've just read the thread)
> If you're just benchmarking the planner times to see if the extra lookups
> are affecting the planning times, would it not be better to benchmark
> EXPLAIN SELECT * FROM a1, a2, a3, ..., a100; ?
> Otherwise any small changes might be drowned out in the execution time.
> Scanning 100 relations even if they are empty could account for quite a bit
> of that time, right?

Possibly, but even if I can measure a difference doing it that way,
it's not clear that it matters. It's fairly certain that there will
be a performance degradation if we measure carefully enough, but if
that difference is imperceptible in real-world scanerios, then it's
not worth worrying about. Still, I probably will test it just to see.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 17 Dec 2009 20:57

On Thu, Dec 3, 2009 at 11:00 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Sat, Nov 28, 2009 at 9:54 PM, David Rowley <dgrowley(a)gmail.com> wrote:
>> Robert Haas Wrote:
>>> Hmm. I'm not able to reliably detect a performance difference between
>>> unpatched CVS HEAD (er... git master branch) and same with spcoptions-
>>> v2.patch applied. I figured that if there were going to be an impact,
>>> it would be most likely to manifest itself in a query that touches lots
>>> and lots of tables but does very little actual work. So I used the
>>> attached script to create 200 empty tables, 100 in the default
>>> tablespace and 100 in tablespace "dork" (also known as, why I am
>>> working on this at 11 PM on Thanksgiving). Then I did:
>>>
>>> SELECT * FROM a1, a2, a3, ..., a100;
>>
>> (I've not read the patch, but I've just read the thread)
>> If you're just benchmarking the planner times to see if the extra lookups
>> are affecting the planning times, would it not be better to benchmark
>> EXPLAIN SELECT * FROM a1, a2, a3, ..., a100; ?
>> Otherwise any small changes might be drowned out in the execution time.
>> Scanning 100 relations even if they are empty could account for quite a bit
>> of that time, right?
>
> Possibly, but even if I can measure a difference doing it that way,
> it's not clear that it matters. It's fairly certain that there will
> be a performance degradation if we measure carefully enough, but if
> that difference is imperceptible in real-world scanerios, then it's
> not worth worrying about. Still, I probably will test it just to see.

I did some fairly careful benchmarking of EXPLAIN SELECT * FROM a1,
a2, a3, ..., a100. I explained this query 100 times via DBD::Pg and
used time to measure how long the script took to run. I ran the
script three times. And the result is... the unpatched version came
out 1.7% SLOWER than the patched version. This seems difficult to
take seriously, since it can't possibly be faster to do a syscache
lookup and parse an array than it is to fetch a constant from a known
memory address, but that's what I got. At any rate, it seems pretty
clear that it's not hurting much.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 17 Dec 2009 21:15

On Thu, Nov 26, 2009 at 4:25 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Sat, Nov 14, 2009 at 4:58 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I don't think there's even a
>> solid consensus right now on which GUCs people would want to set at the
>> tablespace level.
>
> This seems like an important point that we need to nail down. The
> original motivation for this patch was based on seq_page_cost and
> random_page_cost, to cover the case where, for example, one tablespace
> is on an SSD and another tablespace is on a RAID array.
>
> Greg Stark proposed adding effective_io_concurrency, and that makes
> plenty of sense to me, but I'm sort of disinclined to attempt to
> implement that as part of this patch because I have no familiarity
> with that part of the code and no hardware that I can use to test
> either the current behavior or the modified behavior. Since I'm
> recoding this to use the reloptions mechanism, a patch to add support
> for that should be pretty easy to write as a follow-on patch once this
> goes in.
>
> Any other suggestions?

Going once... going twice... since no one has suggested anything or
spoken against the proposal above, I'm just going to implement
seq_page_cost and random_page_cost for now.

> Current version of patch is attached. I've revised it to use the
> reloptions stuff, but I don't think it's committable as-is because it
> currently thinks that extracting options from a pg_tablespace tuple is
> a cheap operation, which was true in the non-reloptions-based
> implementation but is less true now. At least, some benchmarking
> needs to be done to figure out whether and to what extent this is an
> issue.

Per the email that I just sent a few minutes ago, there doesn't appear
to be a performance impact in doing this even in a relatively stupid
way - every call that requires seq_page_cost and/or random_page_cost
results in a syscache lookup and then uses the relcache machinery to
parse the returned array.

That leaves the question of what the most elegant design is here. Tom
suggested upthread that we should tag every RelOptInfo - and,
presumably, IndexOptInfo, though it wasn't discussed - with this
information. I don't however much like the idea of adding identically
named members in both places. Should the number of options expand in
the future, this will become silly very quickly. One option is to
define a struct with seq_page_cost and random_page_cost that is then
included in RelOptInfo and IndexOptInfo. It would seem to make sense
to make the struct, rather than a pointer to the struct, the member,
because it makes the copyfuncs/equalfuncs stuff easier to handle, and
there's not really any benefit in incurring more palloc overhead.

However, I'm sort of inclined to go ahead and invent a mini-cache for
tablespaces. It avoids the (apparently insignificant) overhead of
reparsing the array multiple times, but it also avoids bloating
RelOptInfo and IndexOptInfo with more members than really necessary.
It seems like a good idea to add one member to those structures
anyway, for reltablespace, but copying all the values into every one
we create just seems silly. Admittedly there are only two values
right now, but again we may want to add more someday, and caching at
the tablespace level just seems like the right way to do it.

Thoughts?

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jaime Casanova on 3 Jan 2010 18:56

On Mon, Dec 28, 2009 at 2:52 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>
> Hearing no thoughts, I have implemented as per the above. Â PFA the
> latest version. Â Any reviews, comments, feedback, etc. much
> appreciated.
>

btw, you need to change

STATRELATT,

for

STATRELATTINH,

in syscache.c

--
Atentamente,
Jaime Casanova
Soporte y capacitaciÃ³n de PostgreSQL
AsesorÃa y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on 3 Jan 2010 23:19

> --- 49,63 ----
> * ----------------
> */
>
> ! #define Natts_pg_tablespace 6

Should be 5?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: DTrace compiler warnings
Next: add more frame types in window functions (ROWS)

patch - per-tablespace random_page_cost/seq_page_cost