From: Willem on
Virgil wrote:
) I
)> >>>I was posed the following question in a technical interview for a
)> >>>Software Engineering position by a major multinational NASDAQ company:
)> >>>
)> >>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
)> >>>One int value x occurs 500,001 times or more in the array. Specify an
)> >>>algorithm to determine x."
)
) Find, to the nearest integer, the average value.

Doesn't work.
Consider the case of 500,001 occurrences of 0 and 499,999 occurrences
of 2^31-1: the average comes out near 2^30, nowhere near the majority
value 0.

Perhaps you were thinking of 'find the median value'?
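
A sort-based sketch in C, just to make the median idea concrete --
qsort makes it O(n log n), and a proper selection algorithm would do
better, but the point is only that the majority value has to occupy
the middle slot:

#include <stdlib.h>

static int cmp_int(const void *pa, const void *pb)
{
    int x = *(const int *)pa, y = *(const int *)pb;
    return (x > y) - (x < y);
}

/* A value occurring in more than half the slots necessarily ends up
   at the middle position of the sorted array. */
int majority_by_median(int *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_int);
    return a[n / 2];
}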


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From: Richard Heathfield on
In <7mr6n5F3j1be1U1(a)mid.individual.net>, Axel Vogt wrote:

> Andrew Tomazos wrote:
>> I was posed the following question in a technical interview for a
>> Software Engineering position by a major multinational NASDAQ
>> company:
>>
>> [Paraphrasing] "You are given an array of 1,000,000 32-bit
>> integers.
>> One int value x occurs 500,001 times or more in the array. Specify
>> an algorithm to determine x."
>>
>> The assumption being extra points for efficiency.
> <snipped>
>
> Being a bit stupid I first would ask "Why? What do you do with it?"
> and then I would pick at random. I am almost certain that even at
> a low number of draws the chance to get the very integer is higher
> than that of implementing an algo without coding errors.

I thought of that too, but quickly saw the flaw - if for large N you
have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you
can't settle the matter (is it X, or is it Y?) satisfactorily without
reading every element.
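
To make that concrete, a hypothetical sample-then-verify sketch in C
(not what Axel proposed, just an illustration): the random pick is
very likely to hit the majority value, but confirming it still costs
a full counting pass over the array.

#include <stdlib.h>

/* Pick a random element, then verify it by counting.  Each attempt
   hits the majority value with probability > 1/2, but the
   verification pass reads every element anyway.  Assumes a majority
   element exists, otherwise this never returns.  (rand() % n is
   biased and limited by RAND_MAX; good enough for a sketch.) */
int majority_by_sampling(const int *a, size_t n)
{
    for (;;) {
        int candidate = a[(size_t)rand() % n];
        size_t i, count = 0;

        for (i = 0; i < n; i++)
            if (a[i] == candidate)
                count++;
        if (count > n / 2)
            return candidate;
    }
}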

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: Willem on
Andrew Tomazos wrote:
) Dubuque is referring to the solution that Woeginger more lucidly
) described above. Both it and the bit counting method are
) asymptotically equivalent solutions to the original problem. I'm sure
) either of these solutions provided on the spot would have received
) "full marks".

I'm not so sure. I wouldn't be surprised if you only get "full marks"
for the solution that the interviewer has in mind, even if there are
other solutions that are equivalent or better.

NB: The counting method is actually better, because it doesn't need
the O(k * log n) space that the bit counting method does (where k is
the size of an individual key).
Or, practically speaking, it also works if the items are strings.


A very simple proof of the 'other' method: whenever c reaches 0, the
part of the array processed so far can't contain more than half
occurrences of the 'majority' item -- exactly half of it is one
single value, and the other half is anything but that value.  So the
remaining part *must* contain more than half, i.e. the problem is
reduced to the same problem for the remaining part.
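
In code the whole scan is only a few lines.  A sketch in C (like the
problem statement, it assumes a strict majority really exists;
without that guarantee you would add a second pass to verify the
candidate):

#include <stddef.h>

/* Candidate-and-counter scan: one pass, O(1) extra space.
   Assumes some value occurs more than n/2 times. */
int majority_vote(const int *a, size_t n)
{
    int candidate = 0;
    size_t c = 0, i;

    for (i = 0; i < n; i++) {
        if (c == 0) {               /* restart on the remaining part */
            candidate = a[i];
            c = 1;
        } else if (a[i] == candidate) {
            c++;
        } else {
            c--;
        }
    }
    return candidate;
}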


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From: Moi on
On Sun, 22 Nov 2009 08:47:19 +0000, Richard Heathfield wrote:

> In <7mr6n5F3j1be1U1(a)mid.individual.net>, Axel Vogt wrote:
>
>> Andrew Tomazos wrote:
>>> I was posed the following question in a technical interview for a
>>> Software Engineering position by a major multinational NASDAQ company:
>>>
>>> [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
>>> One int value x occurs 500,001 times or more in the array. Specify an
>>> algorithm to determine x."
>>>
>>> The assumption being extra points for efficiency.
>> <snipped>
>>
>> Being a bit stupid I first would ask "Why? What do you do with it?" and
>> then I would pick at random. I am almost certain that even at a low
>> number of draws the chance to get the very integer is higher than
>> that of implementing an algo without coding errors.
>
> I thought of that too, but quickly saw the flaw - if for large N you
> have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you can't
> settle the matter (is it X, or is it Y?) satisfactorily without reading
> every element.

With a kind of weight-balanced tree (i.e. rotate in such a way that
the node with the higher reference count stays closer to the root)
you can stop reading and inserting values once there is a clear
"winner".

I think something similar can be done for the bit counting method:
if every bin holds a value with abs(value) > number_of_remaining_items,
none of the bins can change sign any more, so you can terminate the
inspection / insertion early.
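
A sketch of that early exit in C, reading each bin as a signed tally
(+1 when the bit is set, -1 when it is clear) -- that is my reading
of the scheme, so take the details as illustrative:

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit counting with early exit: bin[b] tallies bit b of the elements
   read so far (+1 if set, -1 if clear).  The majority value has bit b
   set exactly when the final tally is positive.  Once |bin[b]| exceeds
   the number of unread elements for every b, no tally can change sign
   any more, so the answer is already fixed. */
uint32_t majority_by_bits(const uint32_t *a, size_t n)
{
    long bin[32] = {0};
    uint32_t x = 0;
    size_t i;
    int b;

    for (i = 0; i < n; i++) {
        int settled = 1;
        for (b = 0; b < 32; b++) {
            bin[b] += (a[i] >> b & 1u) ? 1 : -1;
            if ((size_t)labs(bin[b]) <= n - i - 1)
                settled = 0;        /* this bit could still flip */
        }
        if (settled)
            break;                  /* early exit */
    }
    for (b = 0; b < 32; b++)
        if (bin[b] > 0)
            x |= (uint32_t)1 << b;
    return x;
}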



AvK
From: Andrew Tomazos on
On Nov 22, 9:50 am, Willem <wil...(a)stack.nl> wrote:
> ) Dubuque is referring to the solution that Woeginger more lucidly
> ) described above.  Both it and the bit counting method are
> ) asymptotically equivalent solutions to the original problem.  I'm sure
> ) either of these solutions provided on the spot would have received
> ) "full marks".
>
> I'm not so sure.  I wouldn't be surprised if you only get "full marks"
> for the solution that the interviewer has in mind, even if there are
> other solutions that are equivalent or better.

Ummm. No, I don't think so. It was my impression that performance is
assessed according to the asymptotic equivalence class of time and
space requirements. At least in the interview setting I am talking
about.

> NB: The counting method is actually better, because you don't need
> O(k * log n) space (where k is the size of an individual key).
> Or, practically speaking, it also works if the items are strings.

Normally, k (the number of bits in an int) would be considered
constant (I haven't encountered many 1,000,000-bit integers as yet),
therefore both solutions are asymptotically equivalent in space and
time. There is usually a very large number of ways to make
non-asymptotic improvements to any algorithm. None of this precludes
one solution being "better" than the other by some other measure, but
it gets harder to measure performance at a finer grain. You need to
start considering the impact of the memory hierarchy, locality of
reference, what the optimizer is going to do, and so on. To be honest
I would expect both algorithms to be memory bound by the single pass
over the large array. But it would be interesting to compare anyway.
Maybe I'll do a performance comparison.
-Andrew