From: Chip Eastham on
On Apr 15, 12:14 pm, Pubkeybreaker <pubkeybrea...(a)aol.com> wrote:
> On Apr 15, 11:43 am, Michael Hennebry <henne...(a)web.cs.ndsu.nodak.edu>
> wrote:
>
>
>
> > On Apr 15, 10:20 am, recoder <kurtulmeh...(a)gmail.com> wrote:
>
> > > On Apr 15, 5:36 pm, Pubkeybreaker <pubkeybrea...(a)aol.com> wrote:
>
> > > > On Apr 15, 10:19 am, recoder <kurtulmeh...(a)gmail.com> wrote:
>
> > > > > Dear All,
> > > > > What is the state of the art method to detect if a binary number is a
> > > > > perfect square?
> > > > > Thanks in advance....
>
> > > > Please answer the following:
>
> > > > (1) Why do you believe that its representation (i.e. binary) matters?
> > > > On a computer, all numbers are stored in binary.
> > > > (2) How big is the number? [this matters!]
> > > > (3) When you say "state of the art" do you mean "best known
> > > > algorithm",
> > > > or do you mean "best known code on a particular computer"??
> > > > If the latter, you need to specify the computer.
> > > > (4) Must the answer be deterministic, or will you settle for knowing
> > > > the
> > > > answer with high probability?
>
> > > > Your statement of the problem is too vague to have a definitive answer.
>
> > > the binary number is in the range of 100 k - 100 Million digits
>
> > > for me state of the art means speed
>
> > > Deterministic might take a long time, high probability is speedier I
> > > guess
>
> > For probabilistic, evaluate a lot of Jacobi symbols.
>
> Yep.
>
> > No means no. Yes means maybe.
>
> > For deterministic, I think you will have to bite the bullet and find
> > the square root.
> > This can be done in a time proportional to the time required for
> > multiplication.
>
> Use Newton-Raphson along with FFT based multiplication.
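For concreteness, the Newton recurrence x <- (x + n/x)/2
converges to floor(sqrt(n)). A toy 64-bit C version is
below; at this size you'd run the same iteration over big
integers with FFT multiplication. (I believe GMP's
mpz_perfect_square_p packages up essentially this whole
strategy, cheap residue pre-tests included.)

#include <stdint.h>

/* floor(sqrt(n)) by Newton's recurrence; 64-bit toy, but
   the same iteration works verbatim on big integers */
static uint64_t isqrt64(uint64_t n)
{
    if (n < 2) return n;
    uint64_t x = n, y = (x + 1) / 2;
    while (y < x) {            /* decreases until the fixed point */
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

static int is_square64(uint64_t n)
{
    uint64_t r = isqrt64(n);
    return r * r == n;         /* r <= 2^32 - 1, so no overflow */
}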

If the distribution of inputs contains a significant
fraction of non-squares, your average time benefits
from doing some "probabilistic" tests (giving a chance
to quickly obtain the negative result) before resorting
to the actual square root computation.

- A nonzero square in binary representation will have
least significant bits "...001" followed by an even
number of zero bits (possibly none), because odd
squares are congruent to 1 mod 8. [Depending on the
machine architecture it may be practical to strip
off the trailing zero bit pairs (factor out powers
of 4) so we know we are dealing with an odd number,
but unless the distribution of inputs contains a
high frequency of such powers, it may not be a
useful reduction; this test and the next are
sketched in code below.]

- 2^32 - 1 = 3*5*17*257*65537 has 5 distinct prime
factors, so if I understood the "100 million digits"
aright, your input may consist of 10 million or so
32-bit words. Summing these into a 64-bit accumulator
in one pass, and then summing those two 32-bit words
into a single 32-bit word, would allow you to test
the input for quadratic residues mod those five
prime values. This will knock out about 95% of the
non-squares, assuming the input distribution isn't
malicious.
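
In rough C (a sketch only, not tuned; limbs least
significant first, with the five residue tests done by
Euler's criterion rather than lookup tables):

#include <stddef.h>
#include <stdint.h>

static uint32_t powmod(uint32_t a, uint32_t e, uint32_t p)
{
    uint64_t r = 1, b = a % p;
    for (; e; e >>= 1) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
    }
    return (uint32_t)r;
}

/* Euler's criterion: a is a square mod odd prime p
   iff a == 0 or a^((p-1)/2) == 1 (mod p) */
static int is_qr(uint32_t a, uint32_t p)
{
    a %= p;
    return a == 0 || powmod(a, (p - 1) / 2, p) == 1;
}

/* 0 = definitely not a square, 1 = maybe.
   words[0..n-1]: 32-bit limbs, least significant first. */
int maybe_square(const uint32_t *words, size_t n)
{
    /* trailing zeros must come in pairs, and the
       odd part must be 1 mod 8 */
    size_t i = 0;
    while (i < n && words[i] == 0) i++;   /* zero limbs: 32 zeros each, even */
    if (i == n) return 1;                 /* the input is zero, a square */
    uint64_t low = words[i];
    if (i + 1 < n) low |= (uint64_t)words[i + 1] << 32;
    unsigned tz = 0;
    while (!((low >> tz) & 1)) tz++;
    if (tz & 1) return 0;                 /* odd power of 2 */
    if (((low >> tz) & 7) != 1) return 0; /* odd part != 1 mod 8 */

    /* value mod 2^32-1 is just the limb sum mod 2^32-1,
       because 2^32 == 1 (mod 2^32-1) */
    uint64_t acc = 0;
    for (i = 0; i < n; i++) acc += words[i];   /* < 2^56 for ~10M limbs */
    while (acc >> 32) acc = (acc >> 32) + (acc & 0xFFFFFFFFull);
    uint32_t r = (acc == 0xFFFFFFFFull) ? 0 : (uint32_t)acc;

    static const uint32_t p[5] = {3, 5, 17, 257, 65537};
    for (int k = 0; k < 5; k++)
        if (!is_qr(r, p[k])) return 0;
    return 1;   /* passed; fall through to the square root */
}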

regards, chip
From: Michael Hennebry on
On Apr 15, 1:58 pm, Chip Eastham <hardm...(a)gmail.com> wrote:

> - 2^32 - 1 = 3*5*17*257*65537 has 5 distinct prime
>   factors, so if I understood the "100 million digits"
>   aright, your input may consist of 10 million or so
>   32-bit words.  Summing these into a 64-bit accumulator
>   in one pass, and then summing those two 32-bit words
>   into a single 32-bit word, would allow you to test
>   the input for quadratic residues mod those five
>   prime values.  This will knock out about 95% of the
>   non-squares, assuming the input distribution isn't
>   malicious.

With Horner's rule and up to two 32x32->64-bit multiplications per
step, one can also do any other modulus less than 2**32.
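
E.g., a straightforward version (a sketch; the % here is a
64-by-32 divide, where the two-multiplication variant would
use a precomputed reciprocal, Barrett-style, instead):

#include <stddef.h>
#include <stdint.h>

/* (words[n-1] ... words[0]) in base 2^32, reduced mod m,
   for any 0 < m < 2^32.  Horner from the top limb down. */
static uint32_t big_mod(const uint32_t *words, size_t n, uint32_t m)
{
    uint64_t r = 0;
    while (n--)
        r = ((r << 32) | words[n]) % m;  /* r < m < 2^32: no overflow */
    return (uint32_t)r;
}

Reject the input whenever big_mod(words, n, m) lands on a
non-residue mod m (table lookup for small m).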
From: Chip Eastham on
On Apr 15, 7:06 pm, Michael Hennebry <henne...(a)web.cs.ndsu.nodak.edu>
wrote:
> On Apr 15, 1:58 pm, Chip Eastham <hardm...(a)gmail.com> wrote:
>
> > - 2^32 - 1 = 3*5*17*257*65537 has 5 distinct prime
> >   factors, so if I understood the "100 million digits"
> >   aright, your input may consist of 10 million or so
> >   32-bit words.  Summing these into a 64-bit accumulator
> >   in one pass, and then summing those two 32-bit words
> >   into a single 32-bit word, would allow you to test
> >   the input for quadratic residues mod those five
> >   prime values.  This will knock out about 95% of the
> >   non-squares, assuming the input distribution isn't
> >   malicious.
>
> With Horner's rule and up to two 32x32->64-bit multiplications per
> step, one can also do any other modulus less than 2**32.

It might be worthwhile. Additional tests to
exclude non-squares should benefit average-case
timing. After all, there are relatively few
squares out at 100 million digits, so spending
proportionally more effort to eliminate
non-squares improves average speed, up to a
point. Perhaps the composite modulus:

2^48 - 1 = (3^2)*5*7*13*17*97*241*257*673

will do better if, as on Intel-like platforms,
we have some flexibility in operand size. The
ideal would be numerous small coprime moduli,
small enough that quadratic residues can be
found by lookup.
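
Sketch of the lookup idea, using the factors of 2^48-1
(tables built once; entry qr[k][r] says whether r is a
square mod the k-th modulus):

#include <stdint.h>
#include <string.h>

/* the coprime factors of 2^48-1 (using 9 = 3^2 directly) */
static const uint32_t mods[] = {9, 5, 7, 13, 17, 97, 241, 257, 673};
enum { NMODS = sizeof mods / sizeof mods[0] };

static uint8_t qr[NMODS][673];   /* 673 is the largest modulus */

static void build_tables(void)
{
    for (int k = 0; k < NMODS; k++) {
        memset(qr[k], 0, mods[k]);
        for (uint32_t i = 0; i < mods[k]; i++)
            qr[k][(uint64_t)i * i % mods[k]] = 1;
    }
}

/* x48 = the input mod 2^48-1 (e.g. by summing 48-bit
   chunks); 0 = definitely not a square, 1 = maybe */
static int passes_lookup(uint64_t x48)
{
    for (int k = 0; k < NMODS; k++)
        if (!qr[k][x48 % mods[k]]) return 0;
    return 1;
}

Nine lookups at roughly half rejection each let only about
0.3% of non-squares through, under a benign distribution.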

regards, chip
From: Phil Carmody on
Chip Eastham <hardmath(a)gmail.com> writes:
> On Apr 15, 7:06 pm, Michael Hennebry <henne...(a)web.cs.ndsu.nodak.edu>
> wrote:
>> On Apr 15, 1:58 pm, Chip Eastham <hardm...(a)gmail.com> wrote:
>>
>> > - 2^32 - 1 = 3*5*17*257*65537 has 5 distinct prime
>> >   factors, so if I understood the "100 million digits"
>> >   aright, your input may consist of 10 million or so
>> >   32-bit words.  Summing these into a 64-bit accumulator
>> >   in one pass, and then summing those two 32-bit words
>> >   into a single 32-bit word, would allow you to test
>> >   the input for quadratic residues mod those five
>> >   prime values.  This will knock out about 95% of the
>> >   non-squares, assuming the input distribution isn't
>> >   malicious.
>>
>> With Horner's rule and up to two 32x32->64-bit multiplications per
>> step, one can also do any other modulus less than 2**32.
>
> It might be worthwhile. Additional tests to
> exclude non-squares should benefit average-case
> timing. After all, there are relatively few
> squares out at 100 million digits, so spending
> proportionally more effort to eliminate
> non-squares improves average speed, up to a
> point. Perhaps the composite modulus:
>
> 2^48 - 1 = (3^2)*5*7*13*17*97*241*257*673
>
> will do better if, as on Intel-like platforms,
> we have some flexibility in operand size. The
> ideal would be numerous small coprime moduli,
> small enough that quadratic residues can be
> found by lookup.

If you're doing 2^48-1, then it's probably quicker to
just do 2^96-1 in 3 32-bit registers. Or even 2^192-1
in 3 64-bit registers. That can be done as quickly as
an add.
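
Sketch (3 64-bit limbs; the carry out of bit 191 just
wraps back to bit 0, because 2^192 == 1 mod 2^192-1):

#include <stddef.h>
#include <stdint.h>

/* acc += x (3 limbs, little-endian) + cin; returns carry out */
static uint64_t add3(uint64_t acc[3], const uint64_t x[3], uint64_t cin)
{
    uint64_t c = cin;
    for (int j = 0; j < 3; j++) {
        uint64_t s = x[j] + c;
        c = (s < c);              /* carry from x[j] + c */
        acc[j] += s;
        c += (acc[j] < s);        /* carry from acc[j] += s */
    }
    return c;
}

/* fold little-endian limbs mod 2^192 - 1; n a multiple of 3
   (zero-pad the top chunk otherwise).  Note acc == 2^192-1
   on exit is an alias of 0. */
static void fold192(const uint64_t *w, size_t n, uint64_t acc[3])
{
    static const uint64_t zero[3] = {0, 0, 0};
    uint64_t c = 0;
    acc[0] = acc[1] = acc[2] = 0;
    for (size_t i = 0; i < n; i += 3)
        c = add3(acc, &w[i], c);  /* carry wraps into the next add */
    while (c)                     /* final wrap-around of the carry */
        c = add3(acc, zero, c);
}

Then reduce acc mod whichever small factors of 2^192-1 you
care to test residues against.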

And don't forget even numbers. I have a bit twiddle
which rejects all numbers that have an odd number
of trailing 0 bits in their lowest word, and those
for which the lowest set bit is preceded by a 1, in
only 5 elementary operations. It's basically like
the old mod 8 trick (0 or 1 only), but it also addresses
n>>2 if n%4==0, n>>4 if n%16==0, etc.
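
Something along these lines (op count depending on how you
score the constant and the compare):

#include <stdint.h>

/* 0 = definitely not a square, 1 = inconclusive.
   x is the lowest nonzero word.  Rejects an odd number of
   trailing 0 bits, and a 1 just above the lowest set bit
   (i.e. odd part != 1 mod 4). */
static int low_word_ok(uint64_t x)
{
    uint64_t t = x & (0 - x);               /* lowest set bit */
    return ((t & 0xAAAAAAAAAAAAAAAAull)     /* set iff odd position */
            | (x & (t << 1))) == 0;         /* set iff next bit up is 1 */
}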

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1
From: Nicolas Bonneel on
recoder wrote:
> the binary number is in the range of 100 k - 100 Million digits
>
> for me state of the art means speed
>
> Deterministic might take a long time, high probability is speedier I
> guess

With such a range, and if probabilistic is enough, I guess I can do
better than chance, with quite high probability and at zero cost: always
return false...