From: Andrew Tomazos on
On Dec 23 2009, 3:48 am, "Ostap S. B. M. Bender Jr."
<ostap_bender_1...(a)hotmail.com> wrote:
> On Nov 21, 8:12 am, Andrew Tomazos <and...(a)tomazos.com> wrote:
>
>
>
> > I was posed the following question in a technical interview for a
> > Software Engineering position by a major multinational NASDAQ company:
>
> > [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
> > One int value x occurs 500,001 times or more in the array.  Specify an
> > algorithm to determine x."
>
> > The assumption being extra points for efficiency.
>
> > I initially came up with a linear space, linear time solution.  With
> > some prompting and hints from the interviewer we worked our way to a
> > smaller constant space and linear time algorithm.  That being
> > basically:
>
> > int findmode(int* p, int n)
> > {
> >     int count[32];
> >     for(int i = 0; i < 32; i++)
> >         count[i] = 0;
>
> >     for (int i = 0; i < n; i++)
> >          for (int j = 0; j < 32; j++)
> >                if (p[i] & (1 << j)) // if bit j is on
> >                     count[j]++;
> >                else
> >                     count[j]--;
>
> >     int x = 0;
> >     for (int i = 0; i < 32; i++)
> >         if (count[i] > 0)
> >             x = x | (1 << i);
>
> >     return x;
>
> > }
>
> > The idea here is to count the frequency of each of the 32 bits in the
> > array separately, knowing that these bit counts will be dominated by
> > the mode.
>
> > The interviewer already knew the answer, so I can only assume the test
> > was based on how much help he had to give me to arrive at it.
>
> > My question about this is as follows.  I, perhaps boldly, claim to
> > employers to have a "masters-equivilant" experience/knowledge of
> > computer science and math.  Should I have been able to come up with
> > this solution without prompting or help?
>
> > Would you expect someone with a CompSci Masters or PhD from some major
> > ivy league university to be able to come up with this solution without
> > help?
>
> > If so in what course or from which book would you expect to learn the
> > required knowledge to come up with the above solution?
>
> > Is the skill to be able to come up with such an algorithm something
> > that is learned by studying lots of algorithms and being able to mix
> > and match the "tricks"?  If so, what is a similar commonly known
> > algorithm(s) on which the above solution could have been based?
>
> > Or, is the ability to invent such a solution simply a matter of IQ?
> > Some people have the talent to solve the puzzle, see the pattern and
> > come up with the solution - and some just don't?
>
> How badly would the following simple algorithm work:
>
> Look at the first column. Determine which digit occurs more - 0 or 1 -
> among the 1,000,000 integers, and mark those integers, that don't have
> this digit, as "R" (for "rejects")

This requires linear O(n) space to store these reject marks. The two
previously discussed algorithms only require constant O(1) space.
-Andrew.
From: Ostap S. B. M. Bender Jr. on
On Dec 31, 6:38 pm, Andrew Tomazos <and...(a)tomazos.com> wrote:
> On Dec 23 2009, 3:48 am, "Ostap S. B. M. Bender Jr."
>
>
>
> <ostap_bender_1...(a)hotmail.com> wrote:
> > On Nov 21, 8:12 am, Andrew Tomazos <and...(a)tomazos.com> wrote:
>
> > > I was posed the following question in a technical interview for a
> > > Software Engineering position by a major multinational NASDAQ company:
>
> > > [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
> > > One int value x occurs 500,001 times or more in the array.  Specify an
> > > algorithm to determine x."
>
> > > The assumption being extra points for efficiency.
>
> > > I initially came up with a linear space, linear time solution.  With
> > > some prompting and hints from the interviewer we worked our way to a
> > > smaller constant space and linear time algorithm.  That being
> > > basically:
>
> > > int findmode(int* p, int n)
> > > {
> > >     int count[32];
> > >     for(int i = 0; i < 32; i++)
> > >         count[i] = 0;
>
> > >     for (int i = 0; i < n; i++)
> > >          for (int j = 0; j < 32; j++)
> > >                if (p[i] & (1 << j)) // if bit j is on
> > >                     count[j]++;
> > >                else
> > >                     count[j]--;
>
> > >     int x = 0;
> > >     for (int i = 0; i < 32; i++)
> > >         if (count[i] > 0)
> > >             x = x | (1 << i);
>
> > >     return x;
>
> > > }
>
> > > The idea here is to count the frequency of each of the 32 bits in the
> > > array separately, knowing that these bit counts will be dominated by
> > > the mode.
>
> > > The interviewer already knew the answer, so I can only assume the test
> > > was based on how much help he had to give me to arrive at it.
>
> > > My question about this is as follows.  I, perhaps boldly, claim to
> > > employers to have a "masters-equivilant" experience/knowledge of
> > > computer science and math.  Should I have been able to come up with
> > > this solution without prompting or help?
>
> > > Would you expect someone with a CompSci Masters or PhD from some major
> > > ivy league university to be able to come up with this solution without
> > > help?
>
> > > If so in what course or from which book would you expect to learn the
> > > required knowledge to come up with the above solution?
>
> > > Is the skill to be able to come up with such an algorithm something
> > > that is learned by studying lots of algorithms and being able to mix
> > > and match the "tricks"?  If so, what is a similar commonly known
> > > algorithm(s) on which the above solution could have been based?
>
> > > Or, is the ability to invent such a solution simply a matter of IQ?
> > > Some people have the talent to solve the puzzle, see the pattern and
> > > come up with the solution - and some just don't?
>
> > How badly would the following simple algorithm work:
>
> > Look at the first column. Determine which digit occurs more - 0 or 1 -
> > among the 1,000,000 integers, and mark those integers, that don't have
> > this digit, as "R" (for "rejects")
>
> This requires linear O(n) space to store these reject marks.  The two
> previously discussed algorithms only require constant O(1) space.
>  

Do you mean this:

int findmode(int* p, int n)
{
int count[32];
for(int i = 0; i < 32; i++)
count[i] = 0;

for (int i = 0; i < n; i++)
for (int j = 0; j < 32; j++)
if (p[i] & (1 << j)) // if bit j is on
count[j]++;
else
count[j]--;

int x = 0;
for (int i = 0; i < 32; i++)
if (count[i] > 0)
x = x | (1 << i);

return x;

}


Doesn't count[j] require O(log n) space?


But of course O(log n) is still better than O(n)

From: Andrew Tomazos on
On Jan 1, 4:23 am, "Ostap S. B. M. Bender Jr."
<ostap_bender_1...(a)hotmail.com> wrote:
> On Dec 31, 6:38 pm, Andrew Tomazos <and...(a)tomazos.com> wrote:
>
>
>
> > On Dec 23 2009, 3:48 am, "Ostap S. B. M. Bender Jr."
>
> > <ostap_bender_1...(a)hotmail.com> wrote:
> > > On Nov 21, 8:12 am, Andrew Tomazos <and...(a)tomazos.com> wrote:
>
> > > > I was posed the following question in a technical interview for a
> > > > Software Engineering position by a major multinational NASDAQ company:
>
> > > > [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
> > > > One int value x occurs 500,001 times or more in the array.  Specify an
> > > > algorithm to determine x."
>
> > > > The assumption being extra points for efficiency.
>
> > > > I initially came up with a linear space, linear time solution.  With
> > > > some prompting and hints from the interviewer we worked our way to a
> > > > smaller constant space and linear time algorithm.  That being
> > > > basically:
>
> > > > int findmode(int* p, int n)
> > > > {
> > > >     int count[32];
> > > >     for(int i = 0; i < 32; i++)
> > > >         count[i] = 0;
>
> > > >     for (int i = 0; i < n; i++)
> > > >          for (int j = 0; j < 32; j++)
> > > >                if (p[i] & (1 << j)) // if bit j is on
> > > >                     count[j]++;
> > > >                else
> > > >                     count[j]--;
>
> > > >     int x = 0;
> > > >     for (int i = 0; i < 32; i++)
> > > >         if (count[i] > 0)
> > > >             x = x | (1 << i);
>
> > > >     return x;
>
> > > > }
>
> > > > The idea here is to count the frequency of each of the 32 bits in the
> > > > array separately, knowing that these bit counts will be dominated by
> > > > the mode.
>
> > > > The interviewer already knew the answer, so I can only assume the test
> > > > was based on how much help he had to give me to arrive at it.
>
> > > > My question about this is as follows.  I, perhaps boldly, claim to
> > > > employers to have a "masters-equivilant" experience/knowledge of
> > > > computer science and math.  Should I have been able to come up with
> > > > this solution without prompting or help?
>
> > > > Would you expect someone with a CompSci Masters or PhD from some major
> > > > ivy league university to be able to come up with this solution without
> > > > help?
>
> > > > If so in what course or from which book would you expect to learn the
> > > > required knowledge to come up with the above solution?
>
> > > > Is the skill to be able to come up with such an algorithm something
> > > > that is learned by studying lots of algorithms and being able to mix
> > > > and match the "tricks"?  If so, what is a similar commonly known
> > > > algorithm(s) on which the above solution could have been based?
>
> > > > Or, is the ability to invent such a solution simply a matter of IQ?
> > > > Some people have the talent to solve the puzzle, see the pattern and
> > > > come up with the solution - and some just don't?
>
> > > How badly would the following simple algorithm work:
>
> > > Look at the first column. Determine which digit occurs more - 0 or 1 -
> > > among the 1,000,000 integers, and mark those integers, that don't have
> > > this digit, as "R" (for "rejects")
>
> > This requires linear O(n) space to store these reject marks.  The two
> > previously discussed algorithms only require constant O(1) space.
> >  
>
> Do you mean this:
>
> int findmode(int* p, int n)
> {
>     int count[32];
>     for(int i = 0; i < 32; i++)
>         count[i] = 0;
>
>     for (int i = 0; i < n; i++)
>          for (int j = 0; j < 32; j++)
>                if (p[i] & (1 << j)) // if bit j is on
>                     count[j]++;
>                else
>                     count[j]--;
>
>     int x = 0;
>     for (int i = 0; i < 32; i++)
>         if (count[i] > 0)
>             x = x | (1 << i);
>
>     return x;
>
> }
>
> Doesn't count[j] require O(log n) space?
>
> But of course O(log n) is still better than O(n)

Ummm, I'm not sure whether you are reffering to the number of ints
that make up the count array (32) or the size of the individual ints
themselves. The number of counts required is always 32 independant of
n, therefore that part is constant.

In the counting algorithm the size of each of the 32 ints has to be
large enough to hold a value of n, and so requires log2(n) bits. This
is also true of the Fisher algorithm which must maintain a similiar
count of maximum value n, and therefore also needs log2(n) bits.
These were generally referred to as constant space solutions, but I am
confused right now whether that is true anymore, and whether these two
solutions are actually technically O(log(n)) space. I remember
hearing something about there being an implied log operation on space
complexity in some context, but I'm not sure if I'm conflating this
with something else.
-Andrew.
From: websnarf on
Andrew Tomazos <and...(a)tomazos.com> wrote:
> I was posed the following question in a technical interview for a
> Software Engineering position by a major multinational NASDAQ company:
>
> [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
> One int value x occurs 500,001 times or more in the array. Specify an
> algorithm to determine x."

I missed your original post, but dug up the following that might be of
interest:

http://www.cs.utexas.edu/users/moore/best-ideas/mjrty/index.html

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
From: FÖLDY Lajos on

On Wed, 27 Jan 2010, websnarf wrote:

> I missed your original post, but dug up the following that might be of
> interest:
>
> http://www.cs.utexas.edu/users/moore/best-ideas/mjrty/index.html

Probably I miss something, but this algorithm gives D as the majority
element in the sequence AAAABCBCD. ???

regards,
lajos