From: Greg Stark on 8 Apr 2010 23:51

On Fri, Apr 9, 2010 at 12:17 AM, Joachim Wieland <joe(a)mcknight.de> wrote:
> One question that I do not yet see answered is, do we risk violating a
> patent even if we just link against a compression library, for example
> liblzf, without shipping the actual code?

Generally, patents are infringed when the process is used, so whether we link against the code or ship it isn't really relevant. The user using the software would need a patent license either way. We want Postgres to be usable without being dependent on any copyright or patent licenses.

Linking against it as an option isn't nearly as bad, since the user compiling it can choose whether to include the restricted feature or not. That's what we do with readline. However, it's not nearly as attractive when it restricts what file formats Postgres supports -- it means someone might generate backup dump files that they later discover they don't have a legal right to read and restore :(

-- 
greg
From: Joachim Wieland on 10 Apr 2010 08:18

On Fri, Apr 9, 2010 at 5:51 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> Linking against it as an option isn't nearly as bad, since the user
> compiling it can choose whether to include the restricted feature or
> not. That's what we do with readline. However, it's not nearly as
> attractive when it restricts what file formats Postgres supports -- it
> means someone might generate backup dump files that they later
> discover they don't have a legal right to read and restore :(

If we only linked against it, we'd leave it up to the user to weigh the risk, as long as we are not aware of any actual violation. Our top priority is to make sure that the project would not be harmed if one day such a patent showed up. If I understood you correctly, this is not an issue even if we included lzf, and even less so if we only link against it.

The rest is about user education. Since we would use lzf only in pg_dump and not for toasting, we could show a message in pg_dump whenever lzf is chosen, to make the user aware of the possible issues.

If we still cannot do this, then what I am asking is: what does the project need to be able to at least link against such a compression algorithm? Is it a list of 10, 20, 50 or more other projects using it, or is it a lawyer saying "there is no patent"? But then, how can we be sure that the lawyer is right? Or could we not include it even if we had both, because again, we couldn't be sure?

Joachim
From: Tom Lane on 13 Apr 2010 15:03

Joachim Wieland <joe(a)mcknight.de> writes:
> If we still cannot do this, then what I am asking is: what does the
> project need to be able to at least link against such a compression
> algorithm?

Well, what we *really* need is a convincing argument that it's worth taking some risk for. I find that not obvious. You can pipe the output of pg_dump into your-choice-of-compressor, for example, and that gets you the ability to spread the work across multiple CPUs in addition to eliminating legal risk to the PG project. And in any case the general impression seems to be that the main dump-speed bottleneck is on the backend side, not in pg_dump's compression.

			regards, tom lane
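For illustration, the pipeline Tom describes could look something like the sketch below, assuming pigz (a parallel gzip implementation) as the external compressor; the database names and core count are placeholders:

    # plain-format dump piped through a multi-core compressor,
    # keeping compression outside pg_dump itself
    pg_dump -Fp mydb | pigz -p 8 > mydb.sql.gz

    # restore by decompressing on the fly back into psql
    unpigz -c mydb.sql.gz | psql mydb_restore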
From: Stefan Kaltenbrunner on 14 Apr 2010 04:25

Tom Lane wrote:
> Joachim Wieland <joe(a)mcknight.de> writes:
>> If we still cannot do this, then what I am asking is: what does the
>> project need to be able to at least link against such a compression
>> algorithm?
>
> Well, what we *really* need is a convincing argument that it's worth
> taking some risk for. I find that not obvious. You can pipe the output
> of pg_dump into your-choice-of-compressor, for example, and that gets
> you the ability to spread the work across multiple CPUs in addition to
> eliminating legal risk to the PG project. And in any case the general
> impression seems to be that the main dump-speed bottleneck is on the
> backend side, not in pg_dump's compression.

Legal risks aside (I'm not a lawyer, so I cannot comment on that), the current situation imho is:

* for a plain pg_dump, the backend is the bottleneck
* for a pg_dump -Fc with compression, compression is a huge bottleneck
* for pg_dump | gzip, it is usually compression (or bytea and some other datatypes in <9.0)
* for a parallel dump, you can either dump uncompressed and compress afterwards (see the sketch after this message), which increases diskspace requirements (and if you need parallel dump you usually have a large database) and complexity (because you would have to think about how to parallelize the compression manually)
* for a parallel dump that compresses inline, you are limited by the compression algorithm on a per-core basis, and given that the current inline compression overhead is huge you lose a lot of the benefits of parallel dump

Stefan
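As a rough sketch of the "dump uncompressed, compress afterwards" variant from the list above (file and database names are placeholders; -Z0 disables pg_dump's built-in zlib compression in the custom format):

    # custom-format dump with inline compression turned off
    pg_dump -Fc -Z0 mydb > mydb.dump

    # compress the archive afterwards with a parallel compressor
    pigz mydb.dump              # produces mydb.dump.gz

    # pg_restore needs the plain archive again, so decompress first
    unpigz mydb.dump.gz
    pg_restore -d mydb_copy mydb.dump

The extra disk space for the uncompressed archive and the separate compress/decompress steps are exactly the added complexity Stefan points out.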
From: Dimitri Fontaine on 14 Apr 2010 04:33
Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Well, what we *really* need is a convincing argument that it's worth
> taking some risk for. I find that not obvious. You can pipe the output
> of pg_dump into your-choice-of-compressor, for example, and that gets
> you the ability to spread the work across multiple CPUs in addition to
> eliminating legal risk to the PG project.

Well, I like -Fc and playing with the catalog to restore only the "interesting" data into staging environments. I even automated all the catalog mangling in pg_staging, so that I just have to set up which schemas I want, with only the DDL or with the DATA too. The fun part is when you want to exclude functions that are used in triggers based on the schema where the function lives, not the trigger's schema, but that's another story.

So yes, having both -Fc and another compression facility than plain gzip would be good news. And benefiting from better compression in TOAST would be good too, I guess (small size hit, lots faster, would fit).

Summary: my convincing argument is using the dumps to efficiently prepare development and testing environments from production data, thanks to -Fc. That includes skipping data on restore.

Regards,
-- 
dim
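One way to do the kind of selective restore Dimitri describes by hand is to edit a pg_restore table-of-contents list; a minimal sketch, with archive and database names as placeholders (per the message above, pg_staging automates roughly this kind of catalog filtering):

    # write the archive's table of contents to a file
    pg_restore -l prod.dump > toc.list

    # edit toc.list: prefix entries you want to skip with ';',
    # e.g. the TABLE DATA lines for large log tables

    # restore only the remaining entries into the staging database
    pg_restore -L toc.list -d staging prod.dump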