From: Andrew Dunstan on 6 Mar 2010 16:01

François Pérou wrote:
> My opinion is that PostgreSQL should accept any MySQL syntax and return
> warnings. I believe that we should accept even InnoDB syntax and turn it
> immediately into PostgreSQL tables. This would allow people with no
> interest in SQL to migrate from MySQL to PostgreSQL without any harm.

This is just fantasy. Doing this would destabilize Postgres, cost us hugely
in maintenance effort, and LOSE us users. And if we did this, why should we
stop there? Why shouldn't we replicate the broken behaviour of every major
database out there?

It's really time for you to stop making this suggestion, once and for all.
It is just not going to happen. Moreover, MySQL appears to be fracturing
into a bunch of different forks, so why now, of all times, would we want to
adopt its broken syntax?

cheers

andrew
From: "Pierre C" on 8 Mar 2010 04:17 > My opinion is that PostgreSQL should accept any MySQL syntax and return > warnings. I believe that we should access even innodb syntax and turn it > immediately into PostgreSQL tables. This would allow people with no > interest in SQL to migrate from MySQL to PostgreSQL without any harm. A solution would be a SQL proxy (a la pgpool) with query rewriting. > PHP developers don't have time to invest in learning deep SQL. This is true, and it is a big problem IMHO. It results in lots of slow, broken, insecure database designs. ALL the web apps that I've done "CPR ressuscitation" on follow the same schema : - devs are database noobs - generous use of MyISAM - numerous queries, most of them unoptimized and/or useless - use of Apache/mod_php instead of fastcgi - sometimes, use of a huge slow bloated CMS/"framework" which issues even more unoptimized and/or useless SQL queries - site gains popularity - huge traffic takes an unprepared team by surprise (never heard of stuff like concurrency or scaling) - site fails horribly That said, I've got a 150.000+ members forum running on MySQL with sub 5 ms page times on a low-end server, it works if you do it right. Most opensource PHP apps developers have to expend lots of efforts to work on MyISAM that doesn't support foreign keys or constraints. If those resources could be directed to useful work instead of wasted like this, the result would be a lot better. The irony is that even with all that effort, you can't make a web app work without transactions, sooner or later your database integrity will fail. My theory on this is simple : - PHP is a very weak language, not suited to implementation of really useful frameworks (unlike Python / Ruby) example : Find an ORM for PHP that is as good as sqlalchemy. It does not exist, because it is impossible to do. -> really smart programmers dislike PHP because it is a pretty weak language, so they all flee to Python, Ruby, etc All big PHP applications turn into a huge "usine à gaz", impossible to understand code, because of language weakness. - really smart DBAs dislike MySQL (unless they have a nice paying job at facebook or flickr) So, it is very difficult to find good PHP developers, and especially with database knowledge. > IMHO, PostgreSQL has to be more flexible (in > psychological terms) to understand MySQL user needs and answer them, > just to give them a choice to migrate to PostgreSQL. Problem is, as you mentioned above, most PHP developers don't know what their "needs" are because they have little database expertise. About stuff MySQL does that I would like postgres to implement, I'd focus more on features, not syntax : - some form of index-only scans or equivalent (visibility map would probably suffice) - some form of INSERT ON DUPLICATE KEY UPDATE or equivalent (merge...) where the DB, not me, takes care of concurrency - some way to "SELECT a,b,c,d GROUP BY a" when it can be determined that it is equivalent to "GROUP BY a,b,c,d", ie a is UNIQUE NOT NULL - index skip scans (well, MySQL doesn't really do index skip scans, but since it can do index-only scans, it's an approximation) - simpler syntax for DELETEs using JOINs And while I'm at it, I'll add my pet feature : An extremely fast form of temporary storage. Table main is referenced by tables child1, child2, ... childN - SELECT ... FROM main WHERE (very complex condition involving gist coordinates search etc) ORDER BY Then I want the rows from child tables which reference those results. 
If I add a lot of JOINs to my query, it's entirely possible that the (very
complex condition involving gist coordinates search etc) is mis-estimated.
This is generally not a problem on its own, since it usually uses bitmap
index scans, which can survive lots of abuse; however, it causes
mis-planning of the JOINs, which is a problem. Besides, some of the child
tables have few rows but lots of columns, so the JOINs complicate the query
and return the same data many times over, which the ORM doesn't care about
since it would rather instantiate one object per referenced child-table row
instead of one object per main-table row.

I would like to do:

    CREATE TEMP TABLE foo AS
        SELECT ... FROM main
        WHERE (very complex condition involving gist coordinates search etc);
    ANALYZE foo;
    SELECT * FROM foo ORDER BY ...;
    SELECT c.* FROM foo JOIN child1 c ON (...);
    SELECT c.* FROM foo JOIN child2 c ON (...);
    etc.

This splits the query into fragments that are much easier to manage, and the
results are easier to use too: I can store in the application only one
object per child-table row. But I can't do this, because creating the temp
table causes an update of the system catalogs (slow, I/O wait, and bloat).

Basically, it would be nice to have "something" (temp table, cursor, CTE,
tuplestore, whatever) that can hold a short-lived result set, can be used
like a table, can have accurate statistics, and can be used in several
queries, without disk writes. Note that this would also completely solve the
set-returning-function statistics problem, since you could store and analyze
a function's result in an efficient way.
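For concreteness, here is a minimal runnable sketch of the workflow described
above. The table names main and child1 come from the message; the columns and
the simple price predicate are hypothetical stand-ins for the real schema and
the complex gist condition, and on current PostgreSQL the CREATE TEMP TABLE
step still writes system catalog entries, which is exactly the cost being
complained about:

    -- Hypothetical schema standing in for "main" and "child1" above.
    CREATE TABLE main   (id serial PRIMARY KEY, pos point, price int);
    CREATE TABLE child1 (id serial PRIMARY KEY,
                         main_id int REFERENCES main(id),
                         body text);

    -- Materialize the expensive filter once, then give the planner real
    -- statistics on the small result set.
    CREATE TEMP TABLE foo AS
        SELECT id, price
        FROM main
        WHERE price < 1000;   -- stands in for the complex gist condition
    ANALYZE foo;

    -- Reuse the accurately-estimated result set in several simple queries.
    SELECT * FROM foo ORDER BY price;
    SELECT c.* FROM foo JOIN child1 c ON c.main_id = foo.id;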
From: Marko Tiikkaja on 8 Mar 2010 04:38

On 2010-03-08 11:17 +0200, Pierre C wrote:
> - index skip scans (well, MySQL doesn't really do index skip scans, but
>   since it can do index-only scans, it's an approximation)

As far as I can tell, we already do index skip scans:

=> create index foo_a_b_idx on foo(a,b);
CREATE INDEX
=> explain analyze select * from foo where b = 2;
                                                      QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using foo_a_b_idx on foo  (cost=0.00..20.30 rows=5 width=8) (actual time=0.027..0.057 rows=1 loops=1)
   Index Cond: (b = 2)
 Total runtime: 0.075 ms
(3 rows)

Regards,
Marko Tiikkaja
From: "Pierre C" on 8 Mar 2010 04:47 > As far as I can tell, we already do index skip scans: This feature is great but I was thinking about something else, like SELECT DISTINCT, which currently does a seq scan, even if x is indexed. Here is an example. In both cases it could use the index to skip all non-interesting rows, pulling only 69 rows from the heap instead of 120K. EXPLAIN ANALYZE SELECT DISTINCT vente, type_id FROM annonces; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=15270.98..15271.82 rows=84 width=3) (actual time=113.277..113.288 rows=69 loops=1) -> Seq Scan on annonces (cost=0.00..14682.32 rows=117732 width=3) (actual time=0.005..76.069 rows=119655 loops=1) EXPLAIN ANALYZE SELECT DISTINCT ON( vente, type_id ) * FROM annonces; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=0.00..34926.90 rows=84 width=1076) (actual time=0.019..107.318 rows=69 loops=1) -> Index Scan using annonces_type on annonces (cost=0.00..34338.24 rows=117732 width=1076) (actual time=0.017..52.982 rows=119655 loops=1) -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Marko Tiikkaja on 8 Mar 2010 04:58
On 2010-03-08 11:47 +0200, Pierre C wrote:
>> As far as I can tell, we already do index skip scans:
>
> This feature is great, but I was thinking about something else, like SELECT
> DISTINCT, which currently does a seq scan even if the DISTINCT column is
> indexed.
>
> Here is an example. In both cases the planner could use the index to skip
> all the non-interesting rows, pulling only 69 rows from the heap instead
> of 120K.

Oh, this is what I believe MySQL calls "loose index scans". I'm actually
looking into this as we speak, but there seems to be a non-trivial amount of
work to be done in order for this to work.

Regards,
Marko Tiikkaja
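Until such an optimization exists in the planner, the effect can be
approximated by hand with a recursive CTE (available since 8.4) that walks
the index and fetches the next distinct value at each step. A minimal sketch
for a single column, assuming an index on annonces(vente); extending it to
the two-column (vente, type_id) case needs a row-wise comparison and is
messier, so only the single-column form is shown:

    -- Hand-rolled "loose index scan": repeatedly probe the index for the
    -- next value of vente larger than the one found so far.
    -- NULL values of vente are not returned by this query.
    WITH RECURSIVE skip AS (
        (SELECT vente FROM annonces ORDER BY vente LIMIT 1)
        UNION ALL
        SELECT (SELECT a.vente
                FROM annonces a
                WHERE a.vente > skip.vente
                ORDER BY a.vente
                LIMIT 1)
        FROM skip
        WHERE skip.vente IS NOT NULL
    )
    SELECT vente FROM skip WHERE vente IS NOT NULL;

Each iteration is a single index probe, so the cost is proportional to the
number of distinct values rather than the number of table rows.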