From: Malcolm McLean on 22 Nov 2009 10:45 "Andrew Tomazos" <andrew(a)tomazos.com> wrote in message news: >I was posed the following question in a technical interview for a > Software Engineering position by a major multinational NASDAQ company: > > [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > One int value x occurs 500,001 times or more in the array. Specify an > algorithm to determine x." > Take 100 values at random. Then you need some AI. If you get a number with a count of about 50 and no other numbers with counts of anything like that figure, then it is vanishingly unlikely that then number is anything other than the 50s. (Typically it is more likely that the computer will have its memory corrupted during the execution of the program). If you've got two counts close to 50, then it must be that there is one value represented 500,001 times and another represented 499,999 times (or thereabouts). In this situation, you have to step through the array and count your candidates. To do it properly you can work out the stats.
From: Willem on 22 Nov 2009 11:22 Andrew Tomazos wrote: ) On Nov 22, 9:50�am, Willem <wil...(a)stack.nl> wrote: )> I'm not so sure. �I wouldn't be surprised if you only get "full marks" )> for the solution that the interviewer has in mind, even if there are )> other solutions that are equivalent or better. ) ) Ummm. No, I don't think so. It was my impression that performance is ) assessed according to the asymptotic equivalence class of time and ) space requirements. At least in the interview setting I am talking ) about. My comment is based on my general impression of interviewers, not on the way the OP worded his question. I find it more likely that an interviewer posing such a question 'knows the trick' and wants the interviewee to find the same trick. Also note that in this case, the interviewer goaded the interviewee to the less general solution (see below). )> NB: The counting method is actually better, because you don't need )> O(k * log n) space (where k is the size of an individual key). )> Or, practically speaking, it also works if the items are strings. Did you read the last sentence ? You know, the one with 'strings' ? ) Normally, k (the number of bits in an int) would be considered ) constant (I haven't encountered many 1,000,000 bit integers as yet), ) therefore both solutions are asymptotically equivalent in space and ) time. We're looking at the difference between two algorithms. With one view, they're equivalent, with another view one of them is better. To me, that means that that one is better overall. Furthermore, the counting solution is more general, because it can work on any type of item with an equivalence operator. All of this has nothing to do with nonasymptotic behaviour. ) <snip> SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
From: Chip Eastham on 22 Nov 2009 11:51 On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote: > On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger) > > wrote: > > > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote: > > > # I was posed the following question in a technical interview for a > > > # Software Engineering position by a major multinational NASDAQ company: > > > # > > > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > > > # One int value x occurs 500,001 times or more in the array. Specify an > > > # algorithm to determine x." > > > # > > > # The assumption being extra points for efficiency. > > > > There is an old analysis of this problem by Fischer and Salzberg. > > > M.J. Fisher and S.L. Salzberg (1982) > > > Finding a majority among n votes. > > > Journal of Algorithms 3, pp 143-152. > > > > If 2k elements contain a majority element (= an element that occurs at > > > least k+1 times), then you can find it with 3k-2 element comparisons > > > (and some small overhead). The basic idea in their algorithm is that > > > whenever you find two distinct elements, then you can destroy both without > > > changing the majority element among the remaining elements. This yields: > > > > Run once through the array, and maintain a majority-candidate and a counter. > > > The majority-candidate is initialized as the first element, and > > > the counter (counting how often you have seen the candidate) is > > > initialized at 1. > > > If you hit the current candidate again, increment the counter. > > > If you hit another element, decrement the counter. > > > If the counter becomes 0, drop the current candidate and start from > > > scratch with the remaining part of the array. > > > > Fischer and Salzberg also show that this bound 3k-2 is best possible in > > > the worst case (and that's the main part of their paper). > > > If I understand your description than it would look like: > > > int findmode(int* p, int n) > > { > > int x = p[0]; > > int c = 1; > > > for (int i = 1; i < n; i++) > > { > > if (c == 0) > > { > > x = p[i]; > > c = 1; > > } > > else if (p[i] == x) > > c++; > > else > > c--; > > } > > > return x; > > > } > > > It seems that this could only produce at maximum (2k-1) comparisions > > in the worst case, not (3k-2) as you claim? > > -Andrew. > > Oh wait... the (c==0) counts as a comparison. I see it now. Ignore > me. > -Andrew. I don't think the (c==0) counts as a comparison. I think the other k-1 comparisons are for the 2nd part of the algorithm, when you make one more pass through the array verifying the candidate found in the 1st part. [You already know the index i for one occurrence of the candidate, so only the other k-1 indices have to be checked.] You can see how badly I (mathtalk-ga) stumbled through this same puzzle here: [Google Answers: algorithms] http://answers.google.com/answers/threadview/id/582711.html regards, chip
From: Malcolm McLean on 22 Nov 2009 13:00 "Richard Heathfield" <rjh(a)see.sig.invalid> wrote in message > I thought of that too, but quickly saw the flaw - if for large N you > have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you > can't settle the matter (is it X, or is it Y?) satisfactorily without > reading every element. > You need to calculate the error in your estimate. If two numers overlap 0.5 frequency, on some suitably stringent cutoff, then you do a full count. However I'm sure how to do this, offhand.
From: Mike Terry on 22 Nov 2009 13:50
"Chip Eastham" <hardmath(a)gmail.com> wrote in message news:3572ef9d-dab3-480b-8ed2-2d12ad48e1e0(a)c34g2000yqn.googlegroups.com... > On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote: > > On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > > > > > > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger) > > > wrote: > > > > > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > # I was posed the following question in a technical interview for a > > > > # Software Engineering position by a major multinational NASDAQ company: > > > > # > > > > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > > > > # One int value x occurs 500,001 times or more in the array. Specify an > > > > # algorithm to determine x." > > > > # > > > > # The assumption being extra points for efficiency. > > > > > > There is an old analysis of this problem by Fischer and Salzberg. > > > > M.J. Fisher and S.L. Salzberg (1982) > > > > Finding a majority among n votes. > > > > Journal of Algorithms 3, pp 143-152. > > > > > > If 2k elements contain a majority element (= an element that occurs at > > > > least k+1 times), then you can find it with 3k-2 element comparisons > > > > (and some small overhead). The basic idea in their algorithm is that > > > > whenever you find two distinct elements, then you can destroy both without > > > > changing the majority element among the remaining elements. This yields: > > > > > > Run once through the array, and maintain a majority-candidate and a counter. > > > > The majority-candidate is initialized as the first element, and > > > > the counter (counting how often you have seen the candidate) is > > > > initialized at 1. > > > > If you hit the current candidate again, increment the counter. > > > > If you hit another element, decrement the counter. > > > > If the counter becomes 0, drop the current candidate and start from > > > > scratch with the remaining part of the array. > > > > > > Fischer and Salzberg also show that this bound 3k-2 is best possible in > > > > the worst case (and that's the main part of their paper). > > > > > If I understand your description than it would look like: > > > > > int findmode(int* p, int n) > > > { > > > int x = p[0]; > > > int c = 1; > > > > > for (int i = 1; i < n; i++) > > > { > > > if (c == 0) > > > { > > > x = p[i]; > > > c = 1; > > > } > > > else if (p[i] == x) > > > c++; > > > else > > > c--; > > > } > > > > > return x; > > > > > } > > > > > It seems that this could only produce at maximum (2k-1) comparisions > > > in the worst case, not (3k-2) as you claim? > > > -Andrew. > > > > Oh wait... the (c==0) counts as a comparison. I see it now. Ignore > > me. > > -Andrew. > > I don't think the (c==0) counts as a comparison. > I think the other k-1 comparisons are for the 2nd > part of the algorithm, when you make one more pass > through the array verifying the candidate found in > the 1st part. [You already know the index i for > one occurrence of the candidate, so only the other > k-1 indices have to be checked.] There is no second pass through the algorithm, if I've understood it correctly! |