From: Andrew Tomazos on
On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger)
wrote:
> In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote:
> # I was posed the following question in a technical interview for a
> # Software Engineering position by a major multinational NASDAQ company:
> #
> # [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
> # One int value x occurs 500,001 times or more in the array.  Specify an
> # algorithm to determine x."
> #
> # The assumption being extra points for efficiency.
>
> There is an old analysis of this problem by Fischer and Salzberg.
>   M.J. Fisher and S.L. Salzberg  (1982)
>   Finding a majority among n votes.
>   Journal of Algorithms 3, pp 143-152.
>
> If 2k elements contain a majority element (= an element that occurs at
> least k+1 times), then you can find it with 3k-2 element comparisons
> (and some small overhead).  The basic idea in their algorithm is that
> whenever you find two distinct elements, then you can destroy both without
> changing the majority element among the remaining elements.  This yields:
>
>  Run once through the array, and maintain a majority-candidate and a counter.
>  The majority-candidate is initialized as the first element, and
>    the counter (counting how often you have seen the candidate) is
>    initialized at 1.
>  If you hit the current candidate again, increment the counter.
>  If you hit another element, decrement the counter.
>  If the counter becomes 0, drop the current candidate and start from
>    scratch with the remaining part of the array.
>
> Fischer and Salzberg also show that this bound 3k-2 is best possible in
> the worst case (and that's the main part of their paper).

If I understand your description than it would look like:

int findmode(int* p, int n)
{
int x = p[0];
int c = 1;

for (int i = 1; i < n; i++)
{
if (c == 0)
{
x = p[i];
c = 1;
}
else if (p[i] == x)
c++;
else
c--;
}

return x;
}

It seems that this could only produce at maximum (2k-1) comparisions
in the worst case, not (3k-2) as you claim?
-Andrew.
From: Andrew Tomazos on
On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote:
> On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger)
> wrote:
>
>
>
> > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote:
> > # I was posed the following question in a technical interview for a
> > # Software Engineering position by a major multinational NASDAQ company:
> > #
> > # [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
> > # One int value x occurs 500,001 times or more in the array.  Specify an
> > # algorithm to determine x."
> > #
> > # The assumption being extra points for efficiency.
>
> > There is an old analysis of this problem by Fischer and Salzberg.
> >   M.J. Fisher and S.L. Salzberg  (1982)
> >   Finding a majority among n votes.
> >   Journal of Algorithms 3, pp 143-152.
>
> > If 2k elements contain a majority element (= an element that occurs at
> > least k+1 times), then you can find it with 3k-2 element comparisons
> > (and some small overhead).  The basic idea in their algorithm is that
> > whenever you find two distinct elements, then you can destroy both without
> > changing the majority element among the remaining elements.  This yields:
>
> >  Run once through the array, and maintain a majority-candidate and a counter.
> >  The majority-candidate is initialized as the first element, and
> >    the counter (counting how often you have seen the candidate) is
> >    initialized at 1.
> >  If you hit the current candidate again, increment the counter.
> >  If you hit another element, decrement the counter.
> >  If the counter becomes 0, drop the current candidate and start from
> >    scratch with the remaining part of the array.
>
> > Fischer and Salzberg also show that this bound 3k-2 is best possible in
> > the worst case (and that's the main part of their paper).
>
> If I understand your description than it would look like:
>
> int findmode(int* p, int n)
> {
>     int x = p[0];
>     int c = 1;
>
>     for (int i = 1; i < n; i++)
>     {
>        if (c == 0)
>        {
>           x = p[i];
>           c = 1;
>        }
>        else if (p[i] == x)
>            c++;
>        else
>            c--;
>     }
>
>     return x;
>
> }
>
> It seems that this could only produce at maximum (2k-1) comparisions
> in the worst case, not (3k-2) as you claim?
>   -Andrew.

Oh wait... the (c==0) counts as a comparison. I see it now. Ignore
me.
-Andrew.
From: Bill Dubuque on
Andrew Tomazos <andrew(a)tomazos.com> wrote:
>"Alf P. Steinbach" <alfps(a)start.no> wrote:
>>Bill Dubuque <wgd(a)nestle.csail.mit.edu> wrote:
>>>cri(a)tiac.net (Richard Harter) wrote:
>>>>Andrew Tomazos <andrew(a)tomazos.com> wrote:
>>>>>
>>>>>I was posed the following question in a technical interview for a
>>>>>Software Engineering position by a major multinational NASDAQ company:
>>>>>
>>>>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
>>>>>One int value x occurs 500,001 times or more in the array. Specify an
>>>>>algorithm to determine x."
>>>>>
>>>>>The assumption being extra points for efficiency. [snip code] The idea
>>>>>is to count the frequency of each of the 32 bits in the array separately,
>>>>>knowing that these bit counts will be dominated by the mode. [...]
>>>>>
>>>>>My question about this is as follows. I, perhaps boldly, claim to
>>>>>employers to have a "masters-equivilant" experience/knowledge of
>>>>>computer science and math. Should I have been able to come up with
>>>>>this solution without prompting or help?
>>>>>Would you expect someone with a CompSci Masters or PhD from some major
>>>>>ivy league university to be able to come up with this solution without
>>>>>help? If so in what course or from which book would you expect to learn
>>>>>the required knowledge to come up with the above solution?
>>>>
>>>> That's an interesting question with an interesting presupposition.
>>>> The first thing to understand is that this is a puzzle rather than
>>>> a programming problem.
>>>
>>> I disagree. Saying that it's a puzzle rather than a programming problem
>>> seems to imply that the solution is ad-hoc, rather than a special case
>>> of some more universal technique. But that is not true here. Namely,
>>> the proposed solution follows immediately from the obvious fact that
>>> majority elts on product sets are preserved by component projections.
>>> So the solution is simply an instance of well-known divide-and-conquer
>>> techniques for product objects. Such reductions are ubiquitous in both
>>> mathematics and computer science. So I would expect a good student
>>> to find this solution given enough time. I'd also expect a student
>>> from a top-tier school to discover the more optimal solution, viz.
>>>
>>> bi != a => Maj({a^k b1 b2... bk} \/ S) = Maj(S)
>>>
>>> via m/n > 1/2 => (m-k)/(n-2k) > 1/2 via mediants/arithmetic
>>>
>>> again, assuming enough time. But that assumption is highly problematic
>>> when it comes to designing tests that quickly measure various skills.
>>> It's impossible to test such in the short time span of an interview.
>>> It remains difficult even in much longer timed tests. For example
>>> many of the most successful mathematicians did not perform well on
>>> the Putnam competition, and some of those who did well didn't turn
>>> out to be exceptional mathematicians. Thus there isn't necessarily
>>> much correlation between intelligence and random test scores.
>>>
>>> Marilyn vos Savant is a prime example. The woman who supposedly
>>> has the "world's highest IQ" published a Parade magazine column [1]
>>> and book [2] on Wiles proof of FLT. Her nonsensical arguments proved
>>> beyond a doubt that she has absolutely no clue about higher math
>>> and, worse, doesn't have the intelligence to know that (even after
>>> many experts pointed out gaping flaws in her very naive arguments).
>>> So much for IQ tests.
>>>
>>> [1] sci.math, 11/29/1993, Full Text of Savant FLT Parade Article
>>> http://google.com/group/sci.math/msg/8cb44534c63372ad
>>> http://google.com/groups?selm=WGD.93Nov29055015%40martigny.ai.mit.edu
>>>
>>> [2] Boston, Nigel; Granville, Andrew.
>>> Review of: The World's Most Famous Math Problem (The Proof
>>> of Fermat's Last Theorem and Other Mathematical Mysteries)
>>> by Marilyn vos Savant
>>> The American Mathematical Monthly, 102, 5, 1995, 470-473
>>> http://www.dms.umontreal.ca/~andrew/PDF/VS.pdf
>>
>> Hiya. Could you please explain the above more optimal solution.

I'll be happy to elaborate if you say what isn't clear to you.

> Dubuque is referring to the solution that Woeginger more lucidly
> described above.

No, I hadn't yet seen Woeginger's post when I posted that. Note
that what one finds "more lucid" may depend on one's background.
A computer scientist might find W's post more lucid, whereas a
mathematician may find the above more to the essence of the matter,
esp. since the mediant viewpoint make the underlying math trivial
(W's post completely omits discussion of any math or proof).

--Bill Dubuque
From: Casey Hawthorne on
These questions/puzzles are to see how you approach the problem, but
if you've seen the puzzle before it is a useless practice.

--
Regards,
Casey
From: Richard Harter on
On 21 Nov 2009 16:33:03 -0500, Bill Dubuque
<wgd(a)nestle.csail.mit.edu> wrote:

>cri(a)tiac.net (Richard Harter) wrote:
>>Andrew Tomazos <andrew(a)tomazos.com> wrote:
>>>
>>>I was posed the following question in a technical interview for a
>>>Software Engineering position by a major multinational NASDAQ company:
>>>
>>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
>>>One int value x occurs 500,001 times or more in the array. Specify an
>>>algorithm to determine x."
>>>
>>>The assumption being extra points for efficiency. [snip code] The idea
>>>is to count the frequency of each of the 32 bits in the array separately,
>>>knowing that these bit counts will be dominated by the mode. [...]
>>>
>>>My question about this is as follows. I, perhaps boldly, claim to
>>>employers to have a "masters-equivilant" experience/knowledge of
>>>computer science and math. Should I have been able to come up with
>>>this solution without prompting or help?
>>>Would you expect someone with a CompSci Masters or PhD from some major
>>>ivy league university to be able to come up with this solution without
>>>help? If so in what course or from which book would you expect to learn
>>>the required knowledge to come up with the above solution?
>>
>> That's an interesting question with an interesting presupposition.
>> The first thing to understand is that this is a puzzle rather than
>> a programming problem.
>
>I disagree. Saying that it's a puzzle rather than a programming problem
>seems to imply that the solution is ad-hoc, rather than a special case
>of some more universal technique. But that is not true here.

I disagree. The fact that the solution is "a special case of
some more universal technique" is not to the point - that is
common with good puzzles. The thing that makes it a puzzle is
that the circumstances are carefully arranged. Change the
problem slightly, e.g, the most common is not a majority, or it
is permitted to use O(n) space, or the items are strings rather
than integers, and the nature of the problem and its solution
changes.

If it were a programming problem rather than a puzzle, it might
run something like this: You are given an array of n entities,
not all distinct. Which one(s) occur most frequently. What
general strategy would you suggest? If space usage is critical
does this change your answer? If time usage is critical does
this change your answer? Do any special circumstances occur to
you that might justify changing your approach?

I see that sci.math is among the newsgroups in the original
posting. The question has mathematical interest; it remains,
however, that the context is programming and interviews for
programming positions.


Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.