From: Ben Bacarisse on
gwoegi(a)figipc78.tu-graz.ac.at (GJ Woeginger) writes:

> In comp.theory Andrew Tomazos <andrew(a)tomazos.com> wrote:
> # I was posed the following question in a technical interview for a
> # Software Engineering position by a major multinational NASDAQ company:
> #
> # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
> # One int value x occurs 500,001 times or more in the array. Specify an
> # algorithm to determine x."
> #
> # The assumption being extra points for efficiency.
>
> There is an old analysis of this problem by Fischer and Salzberg.
> M.J. Fisher and S.L. Salzberg (1982)
> Finding a majority among n votes.
> Journal of Algorithms 3, pp 143-152.

I can't track this down. Have I got the right Journal of Algorithms?

http://www.informatik.uni-trier.de/~ley/db/journals/jal/jal3.html

does not list that paper. Of course, this table of contents might be
wrong. The horrendous link:

http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%236839%231982%23999969995%23518142%23FLA%23&_cdi=6839&_pubType=J&_auth=y&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=2c8bc9013d6025d08b7d24d3207dfe18

appears to be for the journal itself. The paper does not appear
there, either.

> If 2k elements contain a majority element (= an element that occurs at
> least k+1 times), then you can find it with 3k-2 element comparisons
> (and some small overhead). The basic idea in their algorithm is that
> whenever you find two distinct elements, then you can destroy both without
> changing the majority element among the remaining elements. This yields:
>
> Run once through the array, and maintain a majority-candidate and a counter.
> The majority-candidate is initialized as the first element, and
> the counter (counting how often you have seen the candidate) is
> initialized at 1.
> If you hit the current candidate again, increment the counter.
> If you hit another element, decrement the counter.
> If the counter becomes 0, drop the current candidate and start from
> scratch with the remaining part of the array.
>
> Fischer and Salzberg also show that this bound 3k-2 is best possible in
> the worst case (and that's the main part of their paper).

I can find a 1982 technical report from Yale:

FINDING A MAJORITY AMONG N VOTES Solution to Problem 81-5 (Journal
of Algorithms, June 1981)

by the same two authors. It describes what seems to me to be a
different algorithm and for which the 3k - 2 bound is proved. I
suspect the algorithm has been modified since the technical report but
I can't find a good reference that matches the algorithm you describe.

<snip>
--
Ben.
From: mjc on
I was also given this problem as part of an interview quiz. My answer
was: Use one of the known O(n) median algorithms and use that value.
This was accepted by the interviewer (but I didn't get the job).
From: Stuart Golodetz on
Andrew Tomazos wrote:
> I was posed the following question in a technical interview for a
> Software Engineering position by a major multinational NASDAQ company:
>
> [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
> One int value x occurs 500,001 times or more in the array. Specify an
> algorithm to determine x."
>
> The assumption being extra points for efficiency.
>
> I initially came up with a linear space, linear time solution. With
> some prompting and hints from the interviewer we worked our way to a
> smaller constant space and linear time algorithm. That being
> basically:
>
> int findmode(int* p, int n)
> {
> int count[32];
> for(int i = 0; i < 32; i++)
> count[i] = 0;
>
> for (int i = 0; i < n; i++)
> for (int j = 0; j < 32; j++)
> if (p[i] & (1 << j)) // if bit j is on
> count[j]++;
> else
> count[j]--;
>
> int x = 0;
> for (int i = 0; i < 32; i++)
> if (count[i] > 0)
> x = x | (1 << i);
>
> return x;
> }
>
> The idea here is to count the frequency of each of the 32 bits in the
> array separately, knowing that these bit counts will be dominated by
> the mode.
>
> The interviewer already knew the answer, so I can only assume the test
> was based on how much help he had to give me to arrive at it.

Well he certainly wasn't trying to use you as cheap labour to solve his
latest coding problem, that's for sure :) It's a rare interviewer who
asks questions to which he doesn't already know the answer...

> My question about this is as follows. I, perhaps boldly, claim to
> employers to have a "masters-equivilant" experience/knowledge of
> computer science and math. Should I have been able to come up with
> this solution without prompting or help?

Not necessarily - the purpose of these things isn't always to see
whether you can immediately get the right answer. Often they really want
to see how you tackle the problem.

On a separate note, though, it's not entirely clear to me what a
"masters-equivalent experience/knowledge of computer science and math"
entails - not because I question in any way your experience/knowledge,
but because not all masters degrees are equal (and not all of them are
trying to teach the same thing). Furthermore, people with equal degrees
are not in general equally good at what they do - and there's not even
necessarily a direct correlation between what marks people get in their
degree exams and how competent they are in a work situation. So I
understand what you're getting at, but I'm not sure that strictly
speaking it's a meaningful claim (if it impresses a potential employer,
perhaps you should worry about them).

> Would you expect someone with a CompSci Masters or PhD from some major
> ivy league university to be able to come up with this solution without
> help?

I don't think having a graduate degree per se makes any difference to
your ability to solve coding puzzles like this. (I'm currently doing a
DPhil at Oxford, and whilst it's made a big difference to a number of
skills I have, I don't think it's particularly helped me with puzzles
like this - some I can see through, some I can't, much as it always used
to be.) On the other hand, if you've done such a degree then you
probably have more of an interest in coding/maths than many people, so
it's not entirely unreasonable to ask you puzzle questions like this to
see how your mind works. (Being able to solve them isn't always the
point - it's sometimes how you go about trying that they're interested in.)

> If so in what course or from which book would you expect to learn the
> required knowledge to come up with the above solution?

No one course/book could teach you how to solve all the possible puzzles
they might throw at you - they're trying to see how your mind
works/evaluate your all-round knowledge and lateral thinking. It's
actually good that you can't just get a book of puzzles and learn
everything in it - that would defeat the point of the test.

> Is the skill to be able to come up with such an algorithm something
> that is learned by studying lots of algorithms and being able to mix
> and match the "tricks"? If so, what is a similar commonly known
> algorithm(s) on which the above solution could have been based?

I think some people are naturally better at it than others (some of the
guys I met at the British Informatics Olympiad some years ago were
scarily good at algorithm design, for example), but it is something
that's helped by seeing lots of algorithms, certainly.

> Or, is the ability to invent such a solution simply a matter of IQ?
> Some people have the talent to solve the puzzle, see the pattern and
> come up with the solution - and some just don't?

It's not IQ (whatever that measures in practice) per se - it's partly
having a particular skill set, e.g. being a lateral thinker etc., and
partly having seen lots of similar things before. Some people are better
at it - and you want some of those people on your team (one of the guys
I work with is good like this, always coming up with algorithms for
things). On the other hand, not all of them will be as strong in other
areas, so you probably wouldn't want the ability to solve algorithmic
puzzles to be your only selection criterion; nor would you want to rule
out people who were excellent in other areas because solving this sort
of puzzle isn't really their thing. It's important to recruit a balanced
team.

Cheers,
Stuart

> Thanks,
> Andrew.
From: James Kanze on
On Nov 22, 4:51 pm, Chip Eastham <hardm...(a)gmail.com> wrote:
> On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote:
> > On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote:

> > > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger)
> > > wrote:

> > > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote:
> > > > > I was posed the following question in a technical
> > > > > interview for a Software Engineering position by a
> > > > > major multinational NASDAQ company:

> > > > > [Paraphrasing] "You are given an array of 1,000,000
> > > > > 32-bit integers. One int value x occurs 500,001 times
> > > > > or more in the array. Specify an algorithm to
> > > > > determine x."

> > > > > The assumption being extra points for efficiency.

> > > > There is an old analysis of this problem by Fischer and Salzberg.
> > > > M.J. Fisher and S.L. Salzberg (1982)
> > > > Finding a majority among n votes.
> > > > Journal of Algorithms 3, pp 143-152.

> > > > If 2k elements contain a majority element (= an element
> > > > that occurs at least k+1 times), then you can find it
> > > > with 3k-2 element comparisons (and some small overhead).
> > > > The basic idea in their algorithm is that whenever you
> > > > find two distinct elements, then you can destroy both
> > > > without changing the majority element among the
> > > > remaining elements. This yields:

> > > > Run once through the array, and maintain a
> > > > majority-candidate and a counter. The
> > > > majority-candidate is initialized as the first element,
> > > > and the counter (counting how often you have seen the
> > > > candidate) is initialized at 1.
> > > > If you hit the current candidate again, increment the counter.
> > > > If you hit another element, decrement the counter.
> > > > If the counter becomes 0, drop the current candidate and start from
> > > > scratch with the remaining part of the array.

> > > > Fischer and Salzberg also show that this bound 3k-2 is
> > > > best possible in the worst case (and that's the main
> > > > part of their paper).

> > > If I understand your description than it would look like:

> > > int findmode(int* p, int n)
> > > {
> > > int x = p[0];
> > > int c = 1;
>
> > > for (int i = 1; i < n; i++)
> > > {
> > > if (c == 0)
> > > {
> > > x = p[i];
> > > c = 1;
> > > }
> > > else if (p[i] == x)
> > > c++;
> > > else
> > > c--;
> > > }
> > > return x;

> > > }

> > > It seems that this could only produce at maximum (2k-1)
> > > comparisions in the worst case, not (3k-2) as you claim?

> > Oh wait... the (c==0) counts as a comparison. I see it now.
> > Ignore me.

> I don't think the (c==0) counts as a comparison.

Of course it does. Why wouldn't it?

> I think the other k-1 comparisons are for the 2nd
> part of the algorithm, when you make one more pass
> through the array verifying the candidate found in
> the 1st part. [You already know the index i for
> one occurrence of the candidate, so only the other
> k-1 indices have to be checked.]

What second part? There is no second pass.

--
James Kanze
From: GJ Woeginger on
In comp.theory Ben Bacarisse <ben.usenet(a)bsb.me.uk> wrote:
# gwoegi(a)figipc78.tu-graz.ac.at (GJ Woeginger) writes:
#
# > There is an old analysis of this problem by Fischer and Salzberg.
# > M.J. Fisher and S.L. Salzberg (1982)
# > Finding a majority among n votes.
# > Journal of Algorithms 3, pp 143-152.
#
# I can't track this down. Have I got the right Journal of Algorithms?
#
# http://www.informatik.uni-trier.de/~ley/db/journals/jal/jal3.html
#
# does not list that paper. Of course, this table of contents might be
# wrong. The horrendous link:
#
# http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%236839%231982%23999969995%23518142%23FLA%23&_cdi=6839&_pubType=J&_auth=y&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=2c8bc9013d6025d08b7d24d3207dfe18
#
# appears to be for the journal itself. The paper does not appear
# there, either.


I gave the wrong page numbers. It should be
M.J. Fisher and S.L. Salzberg (1982)
Finding a majority among n votes.
Journal of Algorithms 3, pp 375-379

And it is not an independent paper, but the discussion of
problem 81-5 in the problems column of Leo Guibas.

I also messed up the statement of the main result of Fisher and Salzberg:
The 3k-2 comparisons are for the problem where you have to decide
whether there exists some majority element.

--Gerhard


# > If 2k elements contain a majority element (= an element that occurs at
# > least k+1 times), then you can find it with 3k-2 element comparisons
# > (and some small overhead). The basic idea in their algorithm is that
# > whenever you find two distinct elements, then you can destroy both without
# > changing the majority element among the remaining elements. This yields:
# >
# > Run once through the array, and maintain a majority-candidate and a counter.
# > The majority-candidate is initialized as the first element, and
# > the counter (counting how often you have seen the candidate) is
# > initialized at 1.
# > If you hit the current candidate again, increment the counter.
# > If you hit another element, decrement the counter.
# > If the counter becomes 0, drop the current candidate and start from
# > scratch with the remaining part of the array.
# >
# > Fischer and Salzberg also show that this bound 3k-2 is best possible in
# > the worst case (and that's the main part of their paper).
#


___________________________________________________________
Gerhard J. Woeginger http://www.win.tue.nl/~gwoegi/