From: Andrew Dunstan on 6 Mar 2010 16:01

François Pérou wrote:
> My opinion is that PostgreSQL should accept any MySQL syntax and return
> warnings. I believe that we should accept even InnoDB syntax and turn it
> immediately into PostgreSQL tables. This would allow people with no
> interest in SQL to migrate from MySQL to PostgreSQL without any harm.

This is just fantasy. Doing this would destabilize Postgres, cost us hugely
in maintenance effort, and LOSE us users. And if we did this, why should we
stop there? Why shouldn't we replicate the broken behaviour of every major
database out there?

It's really time for you to stop making this suggestion, once and for all.
It is just not going to happen. Moreover, MySQL appears to be fracturing
into a bunch of different forks, so why now, of all times, would we want to
adopt its broken syntax?

cheers

andrew
From: "Pierre C" on 8 Mar 2010 04:17 > My opinion is that PostgreSQL should accept any MySQL syntax and return > warnings. I believe that we should access even innodb syntax and turn it > immediately into PostgreSQL tables. This would allow people with no > interest in SQL to migrate from MySQL to PostgreSQL without any harm. A solution would be a SQL proxy (a la pgpool) with query rewriting. > PHP developers don't have time to invest in learning deep SQL. This is true, and it is a big problem IMHO. It results in lots of slow, broken, insecure database designs. ALL the web apps that I've done "CPR ressuscitation" on follow the same schema : - devs are database noobs - generous use of MyISAM - numerous queries, most of them unoptimized and/or useless - use of Apache/mod_php instead of fastcgi - sometimes, use of a huge slow bloated CMS/"framework" which issues even more unoptimized and/or useless SQL queries - site gains popularity - huge traffic takes an unprepared team by surprise (never heard of stuff like concurrency or scaling) - site fails horribly That said, I've got a 150.000+ members forum running on MySQL with sub 5 ms page times on a low-end server, it works if you do it right. Most opensource PHP apps developers have to expend lots of efforts to work on MyISAM that doesn't support foreign keys or constraints. If those resources could be directed to useful work instead of wasted like this, the result would be a lot better. The irony is that even with all that effort, you can't make a web app work without transactions, sooner or later your database integrity will fail. My theory on this is simple : - PHP is a very weak language, not suited to implementation of really useful frameworks (unlike Python / Ruby) example : Find an ORM for PHP that is as good as sqlalchemy. It does not exist, because it is impossible to do. -> really smart programmers dislike PHP because it is a pretty weak language, so they all flee to Python, Ruby, etc All big PHP applications turn into a huge "usine à gaz", impossible to understand code, because of language weakness. - really smart DBAs dislike MySQL (unless they have a nice paying job at facebook or flickr) So, it is very difficult to find good PHP developers, and especially with database knowledge. > IMHO, PostgreSQL has to be more flexible (in > psychological terms) to understand MySQL user needs and answer them, > just to give them a choice to migrate to PostgreSQL. Problem is, as you mentioned above, most PHP developers don't know what their "needs" are because they have little database expertise. About stuff MySQL does that I would like postgres to implement, I'd focus more on features, not syntax : - some form of index-only scans or equivalent (visibility map would probably suffice) - some form of INSERT ON DUPLICATE KEY UPDATE or equivalent (merge...) where the DB, not me, takes care of concurrency - some way to "SELECT a,b,c,d GROUP BY a" when it can be determined that it is equivalent to "GROUP BY a,b,c,d", ie a is UNIQUE NOT NULL - index skip scans (well, MySQL doesn't really do index skip scans, but since it can do index-only scans, it's an approximation) - simpler syntax for DELETEs using JOINs And while I'm at it, I'll add my pet feature : An extremely fast form of temporary storage. Table main is referenced by tables child1, child2, ... childN - SELECT ... FROM main WHERE (very complex condition involving gist coordinates search etc) ORDER BY Then I want the rows from child tables which reference those results. 
If I add a lot of JOINs to my query, it's entirely possible that the (very
complex condition involving gist coordinates search etc) is mis-estimated.
This is generally not a problem on its own, since it usually uses bitmap
index scans, which can survive lots of abuse; however, it causes
mis-planning of the JOINs, which is a problem. Besides, some of the child
tables have few rows but lots of columns, so the JOINs complicate the query
and return the same data many times over, which the ORM doesn't care about
since it would rather instantiate one object per referenced child-table row
instead of one object per main-table row.

I would like to do:

    CREATE TEMP TABLE foo AS
        SELECT ... FROM main
        WHERE (very complex condition involving gist coordinates search etc);
    ANALYZE foo;
    SELECT * FROM foo ORDER BY ...;
    SELECT c.* FROM foo JOIN child1 c ON (...);
    SELECT c.* FROM foo JOIN child2 c ON (...);
    etc.

This splits the query into fragments that are much easier to manage, and the
results are easier to use too: I can store in the application only one
object per child-table row. But I can't do this, because creating the temp
table causes an update of the system catalogs (slow, I/O wait, and bloat).

Basically, it would be nice to have "something" (temp table, cursor, CTE,
tuplestore, whatever) that can hold a short-lived result set, can be used
like a table, can have accurate statistics, and can be used in several
queries, without disk writes. Note that this would also completely solve the
set-returning-function statistics problem, since you could store and analyze
a function's result in an efficient way.
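For concreteness, here is a minimal runnable sketch of the workflow described
above. The table names main and child1 come from the message; the columns and
the simple price predicate are hypothetical stand-ins for the real schema and
the complex gist condition, and on current PostgreSQL the CREATE TEMP TABLE
step still writes system catalog entries, which is exactly the cost being
complained about:

    -- Hypothetical schema standing in for "main" and "child1" above.
    CREATE TABLE main   (id serial PRIMARY KEY, pos point, price int);
    CREATE TABLE child1 (id serial PRIMARY KEY,
                         main_id int REFERENCES main(id),
                         body text);

    -- Materialize the expensive filter once, then give the planner real
    -- statistics on the small result set.
    CREATE TEMP TABLE foo AS
        SELECT id, price
        FROM main
        WHERE price < 1000;   -- stands in for the complex gist condition
    ANALYZE foo;

    -- Reuse the accurately-estimated result set in several simple queries.
    SELECT * FROM foo ORDER BY price;
    SELECT c.* FROM foo JOIN child1 c ON c.main_id = foo.id;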
From: Marko Tiikkaja on 8 Mar 2010 04:38

On 2010-03-08 11:17 +0200, Pierre C wrote:
> - index skip scans (well, MySQL doesn't really do index skip scans, but
>   since it can do index-only scans, it's an approximation)

As far as I can tell, we already do index skip scans:

=> create index foo_a_b_idx on foo(a,b);
CREATE INDEX
=> explain analyze select * from foo where b = 2;
                                                      QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using foo_a_b_idx on foo  (cost=0.00..20.30 rows=5 width=8) (actual time=0.027..0.057 rows=1 loops=1)
   Index Cond: (b = 2)
 Total runtime: 0.075 ms
(3 rows)

Regards,
Marko Tiikkaja
From: "Pierre C" on 8 Mar 2010 04:47 > As far as I can tell, we already do index skip scans: This feature is great but I was thinking about something else, like SELECT DISTINCT, which currently does a seq scan, even if x is indexed. Here is an example. In both cases it could use the index to skip all non-interesting rows, pulling only 69 rows from the heap instead of 120K. EXPLAIN ANALYZE SELECT DISTINCT vente, type_id FROM annonces; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=15270.98..15271.82 rows=84 width=3) (actual time=113.277..113.288 rows=69 loops=1) -> Seq Scan on annonces (cost=0.00..14682.32 rows=117732 width=3) (actual time=0.005..76.069 rows=119655 loops=1) EXPLAIN ANALYZE SELECT DISTINCT ON( vente, type_id ) * FROM annonces; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=0.00..34926.90 rows=84 width=1076) (actual time=0.019..107.318 rows=69 loops=1) -> Index Scan using annonces_type on annonces (cost=0.00..34338.24 rows=117732 width=1076) (actual time=0.017..52.982 rows=119655 loops=1) -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Marko Tiikkaja on 8 Mar 2010 04:58
On 2010-03-08 11:47 +0200, Pierre C wrote:
>> As far as I can tell, we already do index skip scans:
>
> This feature is great, but I was thinking about something else, like SELECT
> DISTINCT, which currently does a seq scan even if the DISTINCT column is
> indexed.
>
> Here is an example. In both cases the planner could use the index to skip
> all the non-interesting rows, pulling only 69 rows from the heap instead
> of 120K.

Oh, this is what I believe MySQL calls "loose index scans". I'm actually
looking into this as we speak, but there seems to be a non-trivial amount of
work to be done in order for this to work.

Regards,
Marko Tiikkaja
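Until such an optimization exists in the planner, the effect can be
approximated by hand with a recursive CTE (available since 8.4) that walks
the index and fetches the next distinct value at each step. A minimal sketch
for a single column, assuming an index on annonces(vente); extending it to
the two-column (vente, type_id) case needs a row-wise comparison and is
messier, so only the single-column form is shown:

    -- Hand-rolled "loose index scan": repeatedly probe the index for the
    -- next value of vente larger than the one found so far.
    -- NULL values of vente are not returned by this query.
    WITH RECURSIVE skip AS (
        (SELECT vente FROM annonces ORDER BY vente LIMIT 1)
        UNION ALL
        SELECT (SELECT a.vente
                FROM annonces a
                WHERE a.vente > skip.vente
                ORDER BY a.vente
                LIMIT 1)
        FROM skip
        WHERE skip.vente IS NOT NULL
    )
    SELECT vente FROM skip WHERE vente IS NOT NULL;

Each iteration is a single index probe, so the cost is proportional to the
number of distinct values rather than the number of table rows.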