[RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up [PgSql]

From: Robert Haas on 20 May 2010 23:40

On Thu, May 20, 2010 at 4:27 PM, Andres Freund <andres(a)anarazel.de> wrote:
> I looked a bit around for faster implementations of CRC32 and found one in
> zlib. After adapting it (pg uses slightly different computation (non-
> inverted)) I found that it increases the speed of the CRC32 calculation itself
> 3 fold.

But zlib is not under the PostgreSQL license.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andres Freund on 21 May 2010 01:11

On Friday 21 May 2010 05:40:03 Robert Haas wrote:
> On Thu, May 20, 2010 at 4:27 PM, Andres Freund <andres(a)anarazel.de> wrote:
> > I looked a bit around for faster implementations of CRC32 and found one
> > in zlib. After adapting it (pg uses slightly different computation (non-
> > inverted)) I found that it increases the speed of the CRC32 calculation
> > itself 3 fold.
>
> But zlib is not under the PostgreSQL license.
Yes. But:
1. The zlib license shouldn't be a problem in itself - pg_dump already
links to zlib.
2. I planned to ask Mark Adler whether he would support relicensing those bits.
I have read some other discussions where he was supportive of doing such a
thing.
3. Given that the idea was posted publicly on Usenet, it is not hard to
produce an independent implementation.

So I do not see any big problems there... Or am I missing something?

Greetings,

Andres

/* zlib.h -- interface of the 'zlib' general purpose compression library
version 1.2.2, October 3rd, 2004

Copyright (C) 1995-2004 Jean-loup Gailly and Mark Adler

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

Jean-loup Gailly jloup(a)gzip.org
Mark Adler madler(a)alumni.caltech.edu

*/

From: Andres Freund on 30 May 2010 05:56

On Sunday 30 May 2010 04:56:09 Greg Stark wrote:
> This sounds familiar. If you search back in the archives around 2004
> or so I think you'll find a similar discussion when we replaced the
> crc32 implementation with what we have now. We put a fair amount of
> effort into searching for faster implementations so if you've found
> one 3x faster I'm pretty startled.
None of those considered processing more than one byte at a time.
Most if not all current architectures are more or less superscalar (explicitly
via the compiler or implicitly via somewhat intelligent silicon) - the current
algorithm has an ordering restriction that prevents any benefit from that.
Basically it needs the CRC of the previous byte before it can process the next
one - the zlib/my version computes 4 bytes independently and then combines
them, which results in much better overall utilization.
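
To illustrate the idea described above, here is a minimal sketch of a
"slicing-by-4" table-driven CRC32 next to the classic byte-at-a-time loop.
It is neither the posted patch nor zlib's code: the function and table names
are made up for this example, and it assumes the standard reflected
polynomial 0xEDB88320 with the usual zlib-style initial and final inversion
left to the caller, whereas PostgreSQL's variant is described in this thread
as non-inverted. The point is only that the four table lookups in the inner
loop do not depend on each other, while the bytewise loop forms one long
dependency chain.

```c
#include <stddef.h>
#include <stdint.h>

/* crc_table[k][b] = CRC state after byte b followed by k zero bytes. */
static uint32_t crc_table[4][256];

/* Build the classic table (k = 0) and derive the three shifted tables. */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int k = 1; k < 4; k++)
            crc_table[k][i] = (crc_table[k - 1][i] >> 8)
                            ^ crc_table[0][crc_table[k - 1][i] & 0xFF];
}

/* One byte per step: each step needs the result of the previous one. */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = crc_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* Four bytes per step: the four lookups are independent of each other. */
static uint32_t crc32_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 4) {
        /* Assemble the word byte by byte so the result is endian-neutral. */
        crc ^= (uint32_t)p[0]
             | (uint32_t)p[1] << 8
             | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 24;
        crc = crc_table[3][crc & 0xFF]
            ^ crc_table[2][(crc >> 8) & 0xFF]
            ^ crc_table[1][(crc >> 16) & 0xFF]
            ^ crc_table[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    /* Fewer than 4 bytes left: fall back to the bytewise loop. */
    return crc32_bytewise(crc, p, len);
}
```

After calling crc32_init_tables(), both functions give the same result for
any input. With the zlib-style convention (start from 0xFFFFFFFF, XOR the
result with 0xFFFFFFFF), the nine ASCII bytes "123456789" yield the standard
check value 0xCBF43926.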

> Are you sure it's faster on all
> architectures and not a win sometimes and a loss other times? And are
> you sure it's faster in our use case where we're crcing small
> sequences of data often and not crcing a large block?
I tried on several and it was never a loss at 16+ bytes, never worse at 8, and
most of the time equal if not better at 4. Sizes of 1-4 are somewhat slower as
they use the same algorithm as the old version but have an additional jump.
That's a difference of about 3-4 cycles.

I will try to implement an updated patch sometime these days.

Andres
