Prev: Unification Algorithm
Next: Software Configuration Management: Continuous Integration Widely Adopted
From: Mike Terry on 22 Nov 2009 13:50 "Chip Eastham" <hardmath(a)gmail.com> wrote in message news:3572ef9d-dab3-480b-8ed2-2d12ad48e1e0(a)c34g2000yqn.googlegroups.com... > On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote: > > On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > > > > > > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger) > > > wrote: > > > > > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > # I was posed the following question in a technical interview for a > > > > # Software Engineering position by a major multinational NASDAQ company: > > > > # > > > > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > > > > # One int value x occurs 500,001 times or more in the array. Specify an > > > > # algorithm to determine x." > > > > # > > > > # The assumption being extra points for efficiency. > > > > > > There is an old analysis of this problem by Fischer and Salzberg. > > > > M.J. Fisher and S.L. Salzberg (1982) > > > > Finding a majority among n votes. > > > > Journal of Algorithms 3, pp 143-152. > > > > > > If 2k elements contain a majority element (= an element that occurs at > > > > least k+1 times), then you can find it with 3k-2 element comparisons > > > > (and some small overhead). The basic idea in their algorithm is that > > > > whenever you find two distinct elements, then you can destroy both without > > > > changing the majority element among the remaining elements. This yields: > > > > > > Run once through the array, and maintain a majority-candidate and a counter. > > > > The majority-candidate is initialized as the first element, and > > > > the counter (counting how often you have seen the candidate) is > > > > initialized at 1. > > > > If you hit the current candidate again, increment the counter. > > > > If you hit another element, decrement the counter. > > > > If the counter becomes 0, drop the current candidate and start from > > > > scratch with the remaining part of the array. > > > > > > Fischer and Salzberg also show that this bound 3k-2 is best possible in > > > > the worst case (and that's the main part of their paper). > > > > > If I understand your description than it would look like: > > > > > int findmode(int* p, int n) > > > { > > > int x = p[0]; > > > int c = 1; > > > > > for (int i = 1; i < n; i++) > > > { > > > if (c == 0) > > > { > > > x = p[i]; > > > c = 1; > > > } > > > else if (p[i] == x) > > > c++; > > > else > > > c--; > > > } > > > > > return x; > > > > > } > > > > > It seems that this could only produce at maximum (2k-1) comparisions > > > in the worst case, not (3k-2) as you claim? > > > -Andrew. > > > > Oh wait... the (c==0) counts as a comparison. I see it now. Ignore > > me. > > -Andrew. > > I don't think the (c==0) counts as a comparison. > I think the other k-1 comparisons are for the 2nd > part of the algorithm, when you make one more pass > through the array verifying the candidate found in > the 1st part. [You already know the index i for > one occurrence of the candidate, so only the other > k-1 indices have to be checked.] There is no second pass through the algorithm, if I've understood it correctly!
From: Ben Bacarisse on 22 Nov 2009 14:37 gwoegi(a)figipc78.tu-graz.ac.at (GJ Woeginger) writes: > In comp.theory Andrew Tomazos <andrew(a)tomazos.com> wrote: > # I was posed the following question in a technical interview for a > # Software Engineering position by a major multinational NASDAQ company: > # > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > # One int value x occurs 500,001 times or more in the array. Specify an > # algorithm to determine x." > # > # The assumption being extra points for efficiency. > > There is an old analysis of this problem by Fischer and Salzberg. > M.J. Fisher and S.L. Salzberg (1982) > Finding a majority among n votes. > Journal of Algorithms 3, pp 143-152. I can't track this down. Have I got the right Journal of Algorithms? http://www.informatik.uni-trier.de/~ley/db/journals/jal/jal3.html does not list that paper. Of course, this table of contents might be wrong. The horrendous link: http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%236839%231982%23999969995%23518142%23FLA%23&_cdi=6839&_pubType=J&_auth=y&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=2c8bc9013d6025d08b7d24d3207dfe18 appears to be for the journal itself. The paper does not appear there, either. > If 2k elements contain a majority element (= an element that occurs at > least k+1 times), then you can find it with 3k-2 element comparisons > (and some small overhead). The basic idea in their algorithm is that > whenever you find two distinct elements, then you can destroy both without > changing the majority element among the remaining elements. This yields: > > Run once through the array, and maintain a majority-candidate and a counter. > The majority-candidate is initialized as the first element, and > the counter (counting how often you have seen the candidate) is > initialized at 1. > If you hit the current candidate again, increment the counter. > If you hit another element, decrement the counter. > If the counter becomes 0, drop the current candidate and start from > scratch with the remaining part of the array. > > Fischer and Salzberg also show that this bound 3k-2 is best possible in > the worst case (and that's the main part of their paper). I can find a 1982 technical report from Yale: FINDING A MAJORITY AMONG N VOTES Solution to Problem 81-5 (Journal of Algorithms, June 1981) by the same two authors. It describes what seems to me to be a different algorithm and for which the 3k - 2 bound is proved. I suspect the algorithm has been modified since the technical report but I can't find a good reference that matches the algorithm you describe. <snip> -- Ben.
From: mjc on 22 Nov 2009 16:52 I was also given this problem as part of an interview quiz. My answer was: Use one of the known O(n) median algorithms and use that value. This was accepted by the interviewer (but I didn't get the job).
From: Stuart Golodetz on 22 Nov 2009 19:45 Andrew Tomazos wrote: > I was posed the following question in a technical interview for a > Software Engineering position by a major multinational NASDAQ company: > > [Paraphrasing] "You are given an array of 1,000,000 32-bit integers. > One int value x occurs 500,001 times or more in the array. Specify an > algorithm to determine x." > > The assumption being extra points for efficiency. > > I initially came up with a linear space, linear time solution. With > some prompting and hints from the interviewer we worked our way to a > smaller constant space and linear time algorithm. That being > basically: > > int findmode(int* p, int n) > { > int count[32]; > for(int i = 0; i < 32; i++) > count[i] = 0; > > for (int i = 0; i < n; i++) > for (int j = 0; j < 32; j++) > if (p[i] & (1 << j)) // if bit j is on > count[j]++; > else > count[j]--; > > int x = 0; > for (int i = 0; i < 32; i++) > if (count[i] > 0) > x = x | (1 << i); > > return x; > } > > The idea here is to count the frequency of each of the 32 bits in the > array separately, knowing that these bit counts will be dominated by > the mode. > > The interviewer already knew the answer, so I can only assume the test > was based on how much help he had to give me to arrive at it. Well he certainly wasn't trying to use you as cheap labour to solve his latest coding problem, that's for sure :) It's a rare interviewer who asks questions to which he doesn't already know the answer... > My question about this is as follows. I, perhaps boldly, claim to > employers to have a "masters-equivilant" experience/knowledge of > computer science and math. Should I have been able to come up with > this solution without prompting or help? Not necessarily - the purpose of these things isn't always to see whether you can immediately get the right answer. Often they really want to see how you tackle the problem. On a separate note, though, it's not entirely clear to me what a "masters-equivalent experience/knowledge of computer science and math" entails - not because I question in any way your experience/knowledge, but because not all masters degrees are equal (and not all of them are trying to teach the same thing). Furthermore, people with equal degrees are not in general equally good at what they do - and there's not even necessarily a direct correlation between what marks people get in their degree exams and how competent they are in a work situation. So I understand what you're getting at, but I'm not sure that strictly speaking it's a meaningful claim (if it impresses a potential employer, perhaps you should worry about them). > Would you expect someone with a CompSci Masters or PhD from some major > ivy league university to be able to come up with this solution without > help? I don't think having a graduate degree per se makes any difference to your ability to solve coding puzzles like this. (I'm currently doing a DPhil at Oxford, and whilst it's made a big difference to a number of skills I have, I don't think it's particularly helped me with puzzles like this - some I can see through, some I can't, much as it always used to be.) On the other hand, if you've done such a degree then you probably have more of an interest in coding/maths than many people, so it's not entirely unreasonable to ask you puzzle questions like this to see how your mind works. (Being able to solve them isn't always the point - it's sometimes how you go about trying that they're interested in.) > If so in what course or from which book would you expect to learn the > required knowledge to come up with the above solution? No one course/book could teach you how to solve all the possible puzzles they might throw at you - they're trying to see how your mind works/evaluate your all-round knowledge and lateral thinking. It's actually good that you can't just get a book of puzzles and learn everything in it - that would defeat the point of the test. > Is the skill to be able to come up with such an algorithm something > that is learned by studying lots of algorithms and being able to mix > and match the "tricks"? If so, what is a similar commonly known > algorithm(s) on which the above solution could have been based? I think some people are naturally better at it than others (some of the guys I met at the British Informatics Olympiad some years ago were scarily good at algorithm design, for example), but it is something that's helped by seeing lots of algorithms, certainly. > Or, is the ability to invent such a solution simply a matter of IQ? > Some people have the talent to solve the puzzle, see the pattern and > come up with the solution - and some just don't? It's not IQ (whatever that measures in practice) per se - it's partly having a particular skill set, e.g. being a lateral thinker etc., and partly having seen lots of similar things before. Some people are better at it - and you want some of those people on your team (one of the guys I work with is good like this, always coming up with algorithms for things). On the other hand, not all of them will be as strong in other areas, so you probably wouldn't want the ability to solve algorithmic puzzles to be your only selection criterion; nor would you want to rule out people who were excellent in other areas because solving this sort of puzzle isn't really their thing. It's important to recruit a balanced team. Cheers, Stuart > Thanks, > Andrew.
From: James Kanze on 23 Nov 2009 05:27
On Nov 22, 4:51 pm, Chip Eastham <hardm...(a)gmail.com> wrote: > On Nov 21, 7:04 pm, Andrew Tomazos <and...(a)tomazos.com> wrote: > > On Nov 22, 1:01 am, Andrew Tomazos <and...(a)tomazos.com> wrote: > > > On Nov 21, 6:29 pm, gwo...(a)figipc78.tu-graz.ac.at (GJ Woeginger) > > > wrote: > > > > In comp.theory Andrew Tomazos <and...(a)tomazos.com> wrote: > > > > > I was posed the following question in a technical > > > > > interview for a Software Engineering position by a > > > > > major multinational NASDAQ company: > > > > > [Paraphrasing] "You are given an array of 1,000,000 > > > > > 32-bit integers. One int value x occurs 500,001 times > > > > > or more in the array. Specify an algorithm to > > > > > determine x." > > > > > The assumption being extra points for efficiency. > > > > There is an old analysis of this problem by Fischer and Salzberg. > > > > M.J. Fisher and S.L. Salzberg (1982) > > > > Finding a majority among n votes. > > > > Journal of Algorithms 3, pp 143-152. > > > > If 2k elements contain a majority element (= an element > > > > that occurs at least k+1 times), then you can find it > > > > with 3k-2 element comparisons (and some small overhead). > > > > The basic idea in their algorithm is that whenever you > > > > find two distinct elements, then you can destroy both > > > > without changing the majority element among the > > > > remaining elements. This yields: > > > > Run once through the array, and maintain a > > > > majority-candidate and a counter. The > > > > majority-candidate is initialized as the first element, > > > > and the counter (counting how often you have seen the > > > > candidate) is initialized at 1. > > > > If you hit the current candidate again, increment the counter. > > > > If you hit another element, decrement the counter. > > > > If the counter becomes 0, drop the current candidate and start from > > > > scratch with the remaining part of the array. > > > > Fischer and Salzberg also show that this bound 3k-2 is > > > > best possible in the worst case (and that's the main > > > > part of their paper). > > > If I understand your description than it would look like: > > > int findmode(int* p, int n) > > > { > > > int x = p[0]; > > > int c = 1; > > > > for (int i = 1; i < n; i++) > > > { > > > if (c == 0) > > > { > > > x = p[i]; > > > c = 1; > > > } > > > else if (p[i] == x) > > > c++; > > > else > > > c--; > > > } > > > return x; > > > } > > > It seems that this could only produce at maximum (2k-1) > > > comparisions in the worst case, not (3k-2) as you claim? > > Oh wait... the (c==0) counts as a comparison. I see it now. > > Ignore me. > I don't think the (c==0) counts as a comparison. Of course it does. Why wouldn't it? > I think the other k-1 comparisons are for the 2nd > part of the algorithm, when you make one more pass > through the array verifying the candidate found in > the 1st part. [You already know the index i for > one occurrence of the candidate, so only the other > k-1 indices have to be checked.] What second part? There is no second pass. -- James Kanze |