From: Gerbert Myburgh on
Congrats Hannes.

After seeing you get tweaked out in the last minutes before, I'm very glad you made it to the top this time.

(Whoops, accidentally posted as a new message thread)

Thank you very much to the contest team for a great contest. As always, it was fun to compete.

Just a comment.
It should be fairly obvious from the statistics page (number of participants per day) that there are in fact many, many more people interested in the algorithmic solving than in the Daylight tweaking.
Yet Darkness is one day, Twilight is one day, and Daylight stretches out for a week.

Not only that, both Darkness and Twilight fall in the middle of the week, when most of us have to work.

I don't suggest you change your unique competition format entirely, but I do think we will see a stronger start to Daylight if the algorithm guys can have a weekend day to work on the problem ahead of everyone else (particularly since far too often last-minute tweaking is what wins).

Care to share any details of what you did?
>
> I don't mind at all, but it'll take me a while. So expect a post in the next few hours; in the meantime I'd imagine you can have a lot of fun seeing who's the first to figure it out ;-). I'm afraid there were no fundamental solver changes. Last-minute tweaking did win, just not random tweaking.
>
> My overfitting series of entries was designed to look like random variation, with the only parameter changing between them being a 20-bit binary key stored in a lookup table. One of them got a very good result, and people are quick to accept that I got a lucky key. In fact there are only 8 keys in the entire keyspace that would have defeated Sergey's top entry. That's a 1 in 131 072 chance.
>
> A clue that something was afoot can be found in the comment on the first line of the code, which contains the result that it eventually achieved.
>
> A last few clues for anyone taking up the challenge to reverse engineer: the overfitting was based on information collected using the "Random Randomness" probe series, which was in turn based on "Random Change" by Magnus. Sorry for not crediting correctly. Heat of battle and all that.
>
> Cheers
> H
From: Oliver Woodford on
"Hannes Naudé" wrote:
> The reason why our simple block based solvers appear to outperform the cutting edge stuff from academia is the fact that the problem as stated here does not actually correspond exactly to the compressive sampling problem typically studied in the real world. In that case, one is not able to adjust your query pattern dynamically based on the results from earlier queries. Rather, the entire set of query masks need to be specified at the outset.

Firstly, congrats on winning the grand prize, Hannes. Secondly, I disagree with this analysis of why "block based solvers" or "fully constrained solvers" (that's my term) performed best in this competition.

I spent about a day skim-reading the literature (which is why I wasn't competitive during darkness). On the point of published compressive sensing algorithms providing the next query based on the current data, "Bayesian Compressive Sensing" (Ji, Xue, Carin, 2008) does exactly this. There are almost certainly more examples too.

During the course of the competition I implemented random, DCT and overlapping block queries (what I'll refer to as "under-constrained solvers"). The problem with this type of approach, as I've said before, is that the value of each pixel, or rather each region (regions being the intersections of the queries), is not fully constrained. This contrasts with the "fully constrained solvers", which give n regions from n queries, allowing the average value of each region to be computed in closed form. Because of the ambiguity in the "under-constrained solvers" you need to add extra information to solve the problem, in the form of prior information on the nature of images.

My PhD happened to be on the form of such priors and the techniques required to optimize them. Images are generally smooth, but the distribution of gradients is "heavy-tailed", meaning there are more large gradient discontinuities than a smooth (Gaussian) prior would predict. This makes the priors non-convex and difficult to optimize efficiently - certainly not possible in 180 s. I tried to use a naive smooth prior in a very ad hoc way in my Super Slim entry, but you can see from Ned's mid-contest analysis that it over-smoothed the results. I also tried this approach on the random and DCT queries, but the smoothing generated worse results for both.
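To make the distinction concrete, here is a minimal toy sketch in MATLAB (the 1-D signal x0, the masks and the weight lambda are all made up for illustration; this is not code from any actual entry). Disjoint block queries give the region averages in closed form, whereas an under-constrained set of masks needs a prior - here the simplest smooth (Tikhonov) one, which is precisely the kind that over-smooths:

% Toy 1-D version: x0 is the unknown "image"; each query measures the sum of a subset of pixels.
n = 64;
x0 = rand(n,1);

% Fully constrained: disjoint block queries -> one region per query, averages in closed form.
Afull = kron(eye(n/4), ones(1,4));           % 16 disjoint 4-pixel blocks
regionMeans = (Afull*x0) ./ sum(Afull,2);    % exact average of each region

% Under-constrained: overlapping/random masks -> ambiguity, resolved by a smoothness prior.
Aunder = double(rand(n/2, n) > 0.5);         % random 0/1 query masks
bunder = Aunder*x0;                          % measured sums
D = diff(eye(n));                            % first-difference operator (the "smooth prior")
lambda = 0.1;                                % prior weight, chosen arbitrarily
xsmooth = [Aunder; lambda*D] \ [bunder; zeros(n-1,1)];   % regularised least-squares reconstruction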

In short, "under-constrained solvers" require prior information. Fast priors over-smooth and produce poor results. Good priors are slow to optimize, hence not feasible. Some of the test-suite images were distinctly unnatural anyway, which also biased against "under-constrained solvers" built on natural image priors. This is my 2 cents on why the "fully constrained solvers" prevailed.

Oliver
From: Hannes Naudé on
Alan: Care to share any details of what you did?

Okay, here goes. If you are currently reverse engineering my probing entries, this is a SPOILER ALERT. Stop reading now.

Parameter tweakers typically operate by trial and error. That is, a parameter is adjusted; if the score improves, the adjustment is kept, and if the score gets worse, the entry is consigned to the nether regions of the scoreboard and forgotten. I wondered whether these "failed" attempts at tweaking could not somehow inform future tweaking endeavours. In most cases the answer appears to be no: knowing that changing x from 0.37 to 0.38 worsens the score does not in general imply anything about the effect that changing it to 0.36 (or any other value) will have, because score changes near the optimal parameter settings are typically not linear (or even continuous) in any of the parameters.

Also, the parameters are not usually independent. That is, if keeping x constant and increasing y yields a score improvement, and keeping y constant and increasing x yields a score improvement, it does not follow that increasing both x and y will yield the combined improvement.

However, the popularity of lines like:
if (e>=8 && e<=25) || (e>=100) || (e>=36 && e<=46) || (e>=60 && e<=66)
yields an important clue. This line is essentially a manually coded hash table whose entries are 0 or 1, with the parameter e functioning as the hash key for indexing into the table. In this case the entries determine which solver will be run for a specific problem. The core observation here is that the implicit ones and zeros in this hash table are INDEPENDENT parameters affecting the result.
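(Purely as an illustration of that point, and with an arbitrary upper bound of 128 on e since the original line has none, the same condition can be written as an explicit lookup table, which makes the hidden 0/1 parameters visible:)

useSolver = false(1,128);                        % illustrative table; the entries are the hidden parameters
useSolver([8:25, 36:46, 60:66, 100:128]) = true; % the 1-entries implied by the condition above
if useSolver(e)
    % run that particular solver
end

Each entry of useSolver can be flipped independently of all the others, which is what makes the approach described below work.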

So, if adding an (e==7) clause above improves the result by 10 points and adding an (e==9) clause improves the result by 7.5 points, then adding both is guaranteed to improve the result by 17.5 points.

Now note that we can transform any parameter in the code (in my case I chose blockSize) into a set of independent parameters with code similar to the following:

rand('seed',SomeParameterThatIdentifiesTheProblem);  % seed per problem, so the "hash key" below is repeatable
lookup = [0.5 1; 1 0];                               % row 1: bucket thresholds, row 2: delta on/off per bucket
i = find(rand(1) < lookup(1,:), 1, 'first');         % hash the per-problem random value into a bucket
blockSize = ExistingFormula + lookup(2,i)*DELTA;     % apply the delta only if that bucket is enabled

The pseudocode above implements a hash table with 2 equally spaced buckets (in practice I used 20). The hash key is a random value, which has the benefit of being uniformly distributed over a known range. It should be simple to see now that one could probe out the effect of each individual bucket and then, at a time of one's choosing, activate all beneficial buckets at the same time to get the aggregate score improvement.
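For what it's worth, a sketch of the 20-bucket version looks something like this (the key shown is illustrative, not one of the keys from my entries):

rand('seed',SomeParameterThatIdentifiesTheProblem);  % same per-problem seed as above
key = [1 0 1 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 1];     % a 20-bit key: bucket b is enabled iff key(b)==1
thresholds = (1:20)/20;                              % 20 equally spaced buckets on (0,1)
b = find(rand(1) <= thresholds, 1, 'first');         % hash the per-problem random value into a bucket
blockSize = ExistingFormula + key(b)*DELTA;          % apply the delta only if that bucket is enabled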

This is the basic concept, although I did not follow this route exactly, for social engineering reasons. Firstly, if I had probed out the leader (at that time) in this way, the first beneficial bucket that I hit would have placed me in top spot and attracted undue attention to what I was doing. Secondly, once someone cottoned on, it would be very simple for them to arrive at the same beneficial set of buckets that I had deduced, simply by observing the scores of my entries.

For these reasons I deduced the impact of each of my 20 buckets by submitting a set of 20 solvers, each with a different random selection of buckets enabled. The individual bucket contributions to the score change can then be solved for by pre-multiplying the vector of score differentials by the inverse of the sampling matrix (where the sampling matrix is a 20x20 matrix whose rows are the random 20-bit keys used in each of the solvers).
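In MATLAB terms the recovery is a single linear solve. A self-contained sketch with synthetic numbers (the real dScore vector would of course come from the scoreboard, not from a made-up trueContrib):

trueContrib = randn(20,1);              % made-up per-bucket contributions, for demonstration only
K = double(rand(20,20) > 0.5);          % sampling matrix: each row is one probe solver's 20-bit key
dScore = K*trueContrib;                 % score differentials of the 20 probes vs the baseline entry
contrib = K \ dScore;                   % recovered per-bucket contributions (equals trueContrib here)
k = double(rand(1,20) > 0.5);           % any hypothetical key...
predicted = k*contrib;                  % ...and its predicted score change, without submitting it

(In practice one should check that K is well conditioned - see the note further down about noisy runtime estimates.)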

This meant that anyone trying to piggyback would have to first reconstruct my sampling matrix by opening each of my 20 entries and copy-pasting appropriately.
I was quite chuffed with this aspect until I shut down thoughtlessly, losing my precious sampling matrix and having to recover it in the way just described. #@!%$. You can't get pills for stupid.

As a side note, since runtime is additive, just like the result is, it is possible to predict the runtime (and therefore the final score) in the same way. However, the problem of finding the entry with the best score (given all bucket contributions to both result and runtime) is not a nice linear one like the problem of finding the best result. Luckily MATLAB can still handle this by brute force for a 20-bit key, but it might get messy at 30 bits. As it happens, this was irrelevant, since changing the block size has minimal impact on runtime, and my runtime estimates were noisy in any case because large values in the inverse of my sampling matrix amplified random timing variations in my probe set.
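Although it turned out not to matter here, the brute-force step would look something like the sketch below. The names baseResult, baseRuntime, resultContrib, runtimeContrib and combineScore are all stand-ins; combineScore represents whatever formula the contest uses to combine result and runtime into a score.

nBits = 20;
results  = repmat(baseResult,  2^nBits, 1);          % predicted result for every possible key
runtimes = repmat(baseRuntime, 2^nBits, 1);          % predicted runtime for every possible key
for b = 1:nBits
    bit = bitget((0:2^nBits-1)', b);                 % b-th bit of every key, 0 or 1
    results  = results  + bit*resultContrib(b);
    runtimes = runtimes + bit*runtimeContrib(b);
end
scores = combineScore(results, runtimes);            % stand-in for the contest scoring formula
[bestScore, best] = min(scores);
bestKey = bitget(best-1, 1:nBits);                   % the key predicted to score best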

I initially came up with this scheme a few years ago after a contest (not sure which one, might have been Blackbox). I have not used it until now, partly due to limited opportunity (it would not have worked in Army Ants, and I was unable to participate in Color Bridge) and partly due to concern about the impact widespread use of this and related strategies could have on the game: it would exacerbate the overfitting problem significantly. Twenty probes gave me the ability to predict exactly the results that would have been achieved by over a million hypothetical submissions; 30 probes would push that to over a billion. A person with an auto-submitter and the inclination to use it in this way could stall essentially all further algorithmic progress within hours of the start of Daylight. Banning the approach by any means other than manual inspection is not viable, since it does not depend on any single function that could be banned. Having a bot win a mid-contest prize provided the push I needed to overcome my hesitation. After all, if bots are beating us, then how much more mindless can it get? (No disrespect intended to Alan's bot; I'm sure I'd like him if I got to know him ;-) )

For this reason, I think we should seriously discuss options for promoting better generality. Simply running all entries through a validation set afterwards will not work, since the usefulness of a validation set stems from the fact that it is seen far fewer times than the test set. In this context the validation set would see just as many entries as the test set did, and the validation winner is therefore also likely to be an overfitted solution. For validation to work, one must restrict the number of solvers that are run against it. Just running the top X entries against the validation suite doesn't work either, because the truly general solutions end up WAAAY down in the rankings. One way would be to allow each contestant to nominate one or two entries to be evaluated against the validation set, perhaps once off or perhaps daily. Unfortunately this might require non-trivial changes to the interface. It also requires that sock-puppet accounts be banned, not just frowned upon.

The only foolproof way I can think of in the current framework is to restrict the problems so that it is impossible to split the test suite into independent pieces; the Ants and Army Ants contests are examples of such problems. Even though these were my favourite contests of all time, I consider this quite restrictive. Another way would be to have a king-of-the-hill style contest where competitors' code is pitted directly against one another, a la Army Ants. A third option is to have daily test-suite switches. These have the downside of introducing discontinuities in score, which messes with the stats page. That may or may not be an acceptable price to pay.
From: Jan on
Just some more thoughts/feedback:

1. Generalization: why not use a different data set each day (or every couple of days) and then, at the end, run all algorithms against all data sets and take the average score? For example, if there are three data sets and an algorithm is learning against one data set, it will surely drift away from the other two, and since they are not available at the same time one can never learn for them all. This should drive the contest more towards generality of concepts.

2. Submission restriction: instead of 10 per 10 minutes, maybe have a limit that increases with time, like 2 per minute, 4 per 10 minutes, 8 per hour, so the queue is loaded more evenly and not flooded by the tweaking gurus. One gets results faster, and any new addition to an algorithm that does not consist of pure tweaking will certainly take at least 5 minutes anyway.

3. Categorization: since the leading 1000 or 2000 submissions are all the same algorithm with small changes, it's really difficult to get an overview of what genuinely different approaches there are. Here in the newsgroup people use categories in their texts (block solver, ...) - so it's only natural to have them. I would suggest that participants categorize other submissions (although this could be difficult, e.g. for hybrid algorithms) and maybe get a reward for it, like a place in a priority queue for each categorization of another submission. Then have winners in each category. Actually, what I want is some kind of clustering of algorithms, not by similar performance but by similar techniques - maybe some kind of genealogically grouped tree.

Example here: categories: random blocks, equal blocks, blocks + importance sampling, blocks + importance sampling + problem overfitting, ...

Then I would suggest that the submission table is initially displayed grouped by category, so spectators have a better overview.

4. Maybe it would make the contest more popular if there were a small prize, like the winner getting a free science book from Amazon (which also has its problems, since Amazon doesn't ship everywhere in the world). Actually, just taking part for the fun of it is absolutely sufficient, but the popularity and visibility might go up.

Regards,
Jan
From: Hannes Naudé on
Oliver: I spent about a day skim-reading the literature (which is why I wasn't competitive during darkness). On the point of published compressive sensing algorithms providing the next query based on the current data, "Bayesian Compressive Sensing" (Ji, Xue, Carin, 2008) does exactly this. There are almost certainly more examples too.

Interesting - I see they refer to this alternate construction as adaptive compressive sensing. I had not come across it before. I'd be very curious to see the best that can be achieved if the time constraint is neglected. If anyone investigates this, please publish your results on the File Exchange.

Hannes