From: Andrew Tomazos on 22 Nov 2009 10:01

On Nov 22, 9:50 am, Willem <wil...(a)stack.nl> wrote:
> ) Dubuque is referring to the solution that Woeginger more lucidly
> ) described above. Both it and the bit counting method are
> ) asymptotically equivalent solutions to the original problem. I'm sure
> ) either of these solutions provided on the spot would have received
> ) "full marks".
>
> I'm not so sure. I wouldn't be surprised if you only get "full marks"
> for the solution that the interviewer has in mind, even if there are
> other solutions that are equivalent or better.

Ummm. No, I don't think so. It was my impression that performance is
assessed according to the asymptotic equivalence class of time and
space requirements. At least in the interview setting I am talking
about.

> NB: The counting method is actually better, because you don't need
> O(k * log n) space (where k is the size of an individual key).
> Or, practically speaking, it also works if the items are strings.

Normally, k (the number of bits in an int) would be considered
constant (I haven't encountered many 1,000,000 bit integers as yet),
therefore both solutions are asymptotically equivalent in space and
time. There is usually a very large number of ways to make
nonasymptotic improvements to any algorithm. None of this precludes
one solution being "better" than the other by some other measure
though, but it gets harder to measure performance at a finer grain.
You need to start considering the impact of the memory hierarchy,
locality of reference, what the optimizer is going to do, and so on.
To be honest I would expect both algorithms to be memory bound by the
single pass of the large array. But it would be interesting to compare
anyway. Maybe I'll do a performance comparison.
  -Andrew
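[For reference, the bit-counting method discussed above can be sketched as follows. This is a minimal sketch, not code from the thread: it assumes 32-bit ints and that a strict majority element actually exists, and the function name is invented for illustration.]

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-counting majority: for each of the 32 bit positions, count how
 * many array elements have that bit set.  If a value x occupies a
 * strict majority of the n slots, then at every bit position the
 * majority of observed bits must agree with x's bit.  O(n) time
 * (times the constant factor 32), O(1) extra space. */
int32_t majority_by_bits(const int32_t *a, size_t n)
{
    size_t counts[32] = {0};
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 32; b++)
            if (((uint32_t)a[i] >> b) & 1u)
                counts[b]++;

    uint32_t x = 0;
    for (int b = 0; b < 32; b++)
        if (counts[b] * 2 > n)      /* bit set in more than half */
            x |= (1u << b);
    return (int32_t)x;
}
```

[This is the variant that needs the O(k) counters Willem mentions, where k is the key width in bits.]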
From: Malcolm McLean on 22 Nov 2009 10:45

"Andrew Tomazos" <andrew(a)tomazos.com> wrote in message news:
> I was posed the following question in a technical interview for a
> Software Engineering position by a major multinational NASDAQ company:
>
> [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
> One int value x occurs 500,001 times or more in the array. Specify an
> algorithm to determine x."
>
Take 100 values at random. Then you need some AI. If you get a number
with a count of about 50 and no other number with a count anything
like that figure, then it is vanishingly unlikely that the majority
value is anything other than that number. (Typically it is more likely
that the computer will have its memory corrupted during the execution
of the program.) If you've got two counts close to 50, then it must be
that there is one value represented 500,001 times and another
represented 499,999 times (or thereabouts). In this situation, you
have to step through the array and count your candidates. To do it
properly you can work out the stats.
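[Malcolm's sampling idea can be sketched roughly like this. An illustrative sketch only, not from the thread: the sample size of 100, the acceptance cutoff of 75, and all names are arbitrary assumptions; the fallback is an exact count of the sampled candidates.]

```c
#include <stdlib.h>
#include <stddef.h>

/* Exact fallback: full count of one candidate value. */
static size_t count_exact(const int *a, size_t n, int v)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == v) c++;
    return c;
}

/* Sample 100 random elements and tally them.  If one value clearly
 * dominates the sample (here: more than 75 of 100 hits, an arbitrary
 * cutoff), accept it; otherwise count the sampled candidates exactly. */
int majority_by_sampling(const int *a, size_t n)
{
    enum { SAMPLES = 100, CUTOFF = 75 };
    int vals[SAMPLES];
    size_t hits[SAMPLES] = {0};
    size_t nvals = 0;

    for (int s = 0; s < SAMPLES; s++) {
        int v = a[(size_t)rand() % n];
        size_t j = 0;
        while (j < nvals && vals[j] != v) j++;   /* linear tally */
        if (j == nvals) vals[nvals++] = v;
        hits[j]++;
    }

    size_t best = 0;
    for (size_t j = 1; j < nvals; j++)
        if (hits[j] > hits[best]) best = j;

    if (hits[best] > CUTOFF)
        return vals[best];          /* overwhelmingly likely majority */

    /* Inconclusive: exactly count each distinct sampled value.  A true
     * majority appears in a 100-element sample with overwhelming
     * probability. */
    for (size_t j = 0; j < nvals; j++)
        if (2 * count_exact(a, n, vals[j]) > n)
            return vals[j];
    return vals[best];  /* no strict majority found among the samples */
}
```

[Note this is probabilistic in the accept-early branch; only the exact-count fallback is certain.]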
From: Willem on 22 Nov 2009 11:22

Andrew Tomazos wrote:
) On Nov 22, 9:50 am, Willem <wil...(a)stack.nl> wrote:
)> I'm not so sure. I wouldn't be surprised if you only get "full marks"
)> for the solution that the interviewer has in mind, even if there are
)> other solutions that are equivalent or better.
)
) Ummm. No, I don't think so. It was my impression that performance is
) assessed according to the asymptotic equivalence class of time and
) space requirements. At least in the interview setting I am talking
) about.

My comment is based on my general impression of interviewers, not on
the way the OP worded his question. I find it more likely that an
interviewer posing such a question 'knows the trick' and wants the
interviewee to find the same trick. Also note that in this case, the
interviewer goaded the interviewee to the less general solution (see
below).

)> NB: The counting method is actually better, because you don't need
)> O(k * log n) space (where k is the size of an individual key).
)> Or, practically speaking, it also works if the items are strings.

Did you read the last sentence? You know, the one with 'strings'?

) Normally, k (the number of bits in an int) would be considered
) constant (I haven't encountered many 1,000,000 bit integers as yet),
) therefore both solutions are asymptotically equivalent in space and
) time.

We're looking at the difference between two algorithms. With one view
they're equivalent, with another view one of them is better. To me,
that means that that one is better overall. Furthermore, the counting
solution is more general, because it can work on any type of item with
an equivalence operator. All of this has nothing to do with
nonasymptotic behaviour.

) <snip>

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements made
in the above text. For all I know I might be drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you!
#EOT
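[To illustrate Willem's generality point: the candidate/counter method needs nothing but an equality test, so the same single scan works on strings. A sketch, not from the thread; it assumes a strict majority string exists, and the function name is invented.]

```c
#include <string.h>
#include <stddef.h>

/* Candidate/counter majority scan over strings.  The only operation
 * on elements is an equality test (strcmp here), so no bit-level
 * structure is needed.  Assumes a strict majority string exists;
 * otherwise the returned candidate must be verified by a second pass. */
const char *majority_string(const char **a, size_t n)
{
    const char *cand = a[0];
    size_t c = 1;
    for (size_t i = 1; i < n; i++) {
        if (c == 0) {
            cand = a[i];    /* drop old candidate, adopt current */
            c = 1;
        } else if (strcmp(a[i], cand) == 0) {
            c++;
        } else {
            c--;
        }
    }
    return cand;
}
```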
From: Chip Eastham on 22 Nov 2009 11:51

On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote:
> On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote:
> > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger)
> > wrote:
> > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote:
> > > # I was posed the following question in a technical interview for a
> > > # Software Engineering position by a major multinational NASDAQ company:
> > > #
> > > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
> > > # One int value x occurs 500,001 times or more in the array. Specify an
> > > # algorithm to determine x."
> > > #
> > > # The assumption being extra points for efficiency.
> > >
> > > There is an old analysis of this problem by Fischer and Salzberg.
> > >   M.J. Fischer and S.L. Salzberg (1982)
> > >   Finding a majority among n votes.
> > >   Journal of Algorithms 3, pp 143-152.
> > >
> > > If 2k elements contain a majority element (= an element that occurs at
> > > least k+1 times), then you can find it with 3k-2 element comparisons
> > > (and some small overhead). The basic idea in their algorithm is that
> > > whenever you find two distinct elements, then you can destroy both
> > > without changing the majority element among the remaining elements.
> > > This yields:
> > >
> > > Run once through the array, and maintain a majority-candidate and a
> > > counter. The majority-candidate is initialized as the first element,
> > > and the counter (counting how often you have seen the candidate) is
> > > initialized at 1.
> > > If you hit the current candidate again, increment the counter.
> > > If you hit another element, decrement the counter.
> > > If the counter becomes 0, drop the current candidate and start from
> > > scratch with the remaining part of the array.
> > >
> > > Fischer and Salzberg also show that this bound 3k-2 is best possible
> > > in the worst case (and that's the main part of their paper).
> > If I understand your description then it would look like:
> >
> > int findmode(int* p, int n)
> > {
> >   int x = p[0];
> >   int c = 1;
> >
> >   for (int i = 1; i < n; i++)
> >   {
> >     if (c == 0)
> >     {
> >       x = p[i];
> >       c = 1;
> >     }
> >     else if (p[i] == x)
> >       c++;
> >     else
> >       c--;
> >   }
> >
> >   return x;
> > }
> >
> > It seems that this could only produce at maximum (2k-1) comparisons
> > in the worst case, not (3k-2) as you claim?
> >   -Andrew.
>
> Oh wait... the (c==0) counts as a comparison. I see it now. Ignore
> me.
>   -Andrew.

I don't think the (c==0) counts as a comparison. I think the other
k-1 comparisons are for the 2nd part of the algorithm, when you make
one more pass through the array verifying the candidate found in the
1st part. [You already know the index i for one occurrence of the
candidate, so only the other k-1 indices have to be checked.]

You can see how badly I (mathtalk-ga) stumbled through this same
puzzle here:

[Google Answers: algorithms]
http://answers.google.com/answers/threadview/id/582711.html

regards, chip
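[Chip's point about the second, verifying pass can be made concrete like this. A sketch, not code from the thread: it returns a found/not-found flag so it also copes with inputs that have no majority at all, and the names are invented.]

```c
#include <stddef.h>

/* Two-phase majority check: phase 1 is the Fischer/Salzberg
 * candidate scan quoted above; phase 2 verifies the candidate with
 * one more pass, since phase 1 alone is only guaranteed correct when
 * a majority element actually exists.  Returns 1 and stores the
 * majority value in *out, or returns 0 if no strict majority exists. */
int find_majority(const int *p, size_t n, int *out)
{
    if (n == 0) return 0;

    /* Phase 1: candidate scan. */
    int x = p[0];
    size_t c = 1;
    for (size_t i = 1; i < n; i++) {
        if (c == 0)          { x = p[i]; c = 1; }
        else if (p[i] == x)  c++;
        else                 c--;
    }

    /* Phase 2: verification pass. */
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] == x) count++;
    if (2 * count > n) {
        *out = x;
        return 1;
    }
    return 0;
}
```

[For the interview problem as stated, a majority is guaranteed, so phase 2 never fails; but it is what accounts for the extra comparisons in the 3k-2 bound.]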
From: Malcolm McLean on 22 Nov 2009 13:00
"Richard Heathfield" <rjh(a)see.sig.invalid> wrote in message
> I thought of that too, but quickly saw the flaw - if for large N you
> have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you
> can't settle the matter (is it X, or is it Y?) satisfactorily without
> reading every element.
>
You need to calculate the error in your estimate. If two numbers'
estimated frequencies overlap 0.5, on some suitably stringent cutoff,
then you do a full count. However I'm not sure how to do this,
offhand.