From: Willem on 22 Nov 2009 03:33 Virgil wrote: ) I )> >>>I was posed the following question in a technical interview for a )> >>>Software Engineering position by a major multinational NASDAQ company: )> >>> )> >>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers. )> >>>One int value x occurs 500,001 times or more in the array. Specify an )> >>>algorithm to determine x." ) ) Find, to the nearest integer, the average value. Doesn't work. Consider the case of 500,001 times 0 and 499,999 times 2^31-1 Perhaps you are confused with 'find the median value' ? SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
From: Richard Heathfield on 22 Nov 2009 03:47 In <7mr6n5F3j1be1U1(a)mid.individual.net>, Axel Vogt wrote: > Andrew Tomazos wrote: >> I was posed the following question in a technical interview for a >> Software Engineering position by a major multinational NASDAQ >> company: >> >> [Paraphrasing] "You are given an array of 1,000,000 32-bit >> [integers. >> One int value x occurs 500,001 times or more in the array. Specify >> an algorithm to determine x." >> >> The assumption being extra points for efficiency. > <snipped> > > Being a bit stupid I first would ask "Why? What do you do with it?" > and then I would pick on random. I am almost certain, that even at > a low number of draws the chance to get the very integer is higher > than implementing an algo without coding errors. I thought of that too, but quickly saw the flaw - if for large N you have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you can't settle the matter (is it X, or is it Y?) satisfactorily without reading every element. -- Richard Heathfield <http://www.cpax.org.uk> Email: -http://www. +rjh@ "Usenet is a strange place" - dmr 29 July 1999 Sig line vacant - apply within
From: Willem on 22 Nov 2009 03:50 Andrew Tomazos wrote: ) Dubuque is referring to the solution that Woeginger more lucidly ) described above. Both it and the bit counting method are ) asymptotically equivalent solutions to the original problem. I'm sure ) either of these solutions provided on the spot would have received ) "full marks". I'm not so sure. I wouldn't be surprised if you only get "full marks" for the solution that the interviewer has in mind, even if there are other solutions that are equivalent or better. NB: The counting method is actually better, because you don't need O(k * log n) space (where k is the size of an individual key). Or, practically speaking, it also works if the items are strings. A very simple proof of the 'other' method would be that whenever c reaches 0, the part of the array you processed can't have more than half of the 'majority' item. (Exactly half of it is one single item, and the other half is not that single item) So the remaining part *must* have more than half, i.e. the problem is reduced to the same problem for the remaining part. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
From: Moi on 22 Nov 2009 06:32 On Sun, 22 Nov 2009 08:47:19 +0000, Richard Heathfield wrote: > In <7mr6n5F3j1be1U1(a)mid.individual.net>, Axel Vogt wrote: > >> Andrew Tomazos wrote: >>> I was posed the following question in a technical interview for a >>> Software Engineering position by a major multinational NASDAQ company: >>> >>> [Paraphrasing] "You are given an array of 1,000,000 32-bit [integers. >>> One int value x occurs 500,001 times or more in the array. Specify an >>> algorithm to determine x." >>> >>> The assumption being extra points for efficiency. >> <snipped> >> >> Being a bit stupid I first would ask "Why? What do you do with it?" and >> then I would pick on random. I am almost certain, that even at a low >> number of draws the chance to get the very integer is higher than >> implementing an algo without coding errors. > > I thought of that too, but quickly saw the flaw - if for large N you > have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you can't > settle the matter (is it X, or is it Y?) satisfactorily without reading > every element. With a kind of weight-balanced tree, ( := rotate in such a way that the node with the higher reference count is closer to the root) you can terminate reading and inserting the values once there is a clear "winner". I think, something similar can be done for the bitcounting method: if all bins have a value with (abs(value) > number_of_remaining_items) you can terminate the inspection / insertion. AvK
From: Andrew Tomazos on 22 Nov 2009 10:01
On Nov 22, 9:50 am, Willem <wil...(a)stack.nl> wrote: > ) Dubuque is referring to the solution that Woeginger more lucidly > ) described above. Both it and the bit counting method are > ) asymptotically equivalent solutions to the original problem. I'm sure > ) either of these solutions provided on the spot would have received > ) "full marks". > > I'm not so sure. I wouldn't be surprised if you only get "full marks" > for the solution that the interviewer has in mind, even if there are > other solutions that are equivalent or better. Ummm. No, I don't think so. It was my impression that performance is assessed according to the asymptotic equivalence class of time and space requirements. At least in the interview setting I am talking about. > NB: The counting method is actually better, because you don't need > O(k * log n) space (where k is the size of an individual key). > Or, practically speaking, it also works if the items are strings. Normally, k (the number of bits in an int) would be considered constant (I haven't encountered many 1,000,000 bit integers as yet), therefore both solutions are asymptotically equivalent in space and time. There is usually a very large number of ways to make nonasymptotic improvements to any algorithm. None of this precludes one solution being "better" than the other by some other measure though, but it gets harder to measure performance at a finer grain. You need to start considering the impact of the memory hierarchy, locality of reference, what the optimizer is going to do, and so on. To be honest I would expect both algorithms to be memory bound by the single pass of the large array. But it would be interesting to compare anyway. Maybe I'll do a performance comparison. -Andrew |