Tuesday, July 24, 2007

ICFP: How We Reached the Top 15

The United Coding Team once again tackled the ICFP Contest this year. This time we managed to hit the top 15. How did we manage that? Well, read on I say!

The team this year consisted of the following members, all students at the University of Cape Town, South Africa:

and two team managers:
  • Hayley McIntosh
  • Ian Saunder

The problem of this year's ICFP Contest started off with us receiving Endo's DNA, which was 7MB. The DNA was interpreted as a sequence of patterns, which searched for some DNA, and templates, which described how to manipulate the DNA at that point. Some of the DNA caused RNA to spew out. The RNA was then interpreted as drawing commands to draw an image. Endo's original DNA sequence created this image:


Our task was to reverse engineer the DNA and poke away by coming up with a prefix to prepend to Endo's DNA to morph him to stand a better chance of survival on Earth, where he has crashed. We were given a target image to aim for (below) and we were scored by the number of pixels we got correct and the length of our prefix, with a shorter prefix being better.


Being in South Africa, we fitted nicely into the same timezone as the organisers so we started at noon on Friday. We took about an hour to read through the problem statement to decipher the meaning of the DNA. We then split up the tasks of writing a DNA->RNA and RNA->image converter.

Carl, Alex, Harry and I worked on the DNA converter while Bruce and James did the RNA->Image converter, both in C++. The RNA->image converter didn't take too long, but the DNA->RNA converter had efficiency issues. We got a working version by about 17:00, but it was horrendously slow compared for what we needed. Max joined us after work and quickly gave us the idea of ropes. None of us had used them before though so we had to read up on them to decide if they would be useful. We went for them in the end, but they caused us major headaches. The rope substr function defaults to taking only one character from the string instead of the usual running to the end.

After working on all these annoying bugs we finally got it working some time around 20:00 or so. We then used the prefix given to us in the problem text which ran some self checks. Unfortunately one of them failed and it was back to debugging. The bug appeared to be related to ropes, since our strings version passed all the self checks. It was also odd that it worked with compiler optimisations, but not without them enabled. While Carl worked on debugging, Bruce worked on some OpenGL code to display the image as it was being generated so we could see any hidden messages that were covered by later layers.

With the OpenGL code we discovered a hidden prefix which when run gave us a field repair guide which led us to a catalogue page and a prefix to rise the sun. Julien and Richard spent some time trying to figure out how to use the catalogue prefix to view other pages, but only ended up at invalid pages. When Bruce had a moment of free time we gave it to him and he solved it very quickly. We replaced the number in the prefix with the page number we needed. Page 1337 gave us the catalogue index page. This opened up a whole bunch of new pages which both helped us understand the DNA and gave us new problems to solve.

We did a brute force search on the page number to see if we could find any undocumented pages. Of note were pages 100 and 1024. We searched up to about 20000 pages, but didn't find much else. The stegonography hint gave us the number 9546 in the ET page, although we never figured out a use for it. The Virus Alert page we noticed was Wingdings, but we didn't find a use for the message. The Intergalactic Character Set pointed us to ebcdic. The Undocumented RNA helped us print a stack trace of function calls.

One of the pages extracted was a Gene Table. It said "Page 1 of 14" on the top. We got the remaining 13 pages by setting the value AAA_geneTablePageNr to the one we wanted. We also later discovered from the ImpDocs that printGeneTable took a boolean parameter that when set to true ran integrity checks. This proved valuable later on when we got to fixing functions in the various ways. We got some of our team, including team managers, to transcribe the gene table. From this we wrote scripts to call a function in a human readable format. This was the start of our scripting language that generated a prefix.

Some of the function calls yielded further help pages and more problems to solve. We noticed that the contest-xxxx pages had some yellow letters (ICFP's). When combined these gave us another prefix. The Encodings page helped us search for strings and polygons in the DNA. Some of these strings were helpful, while most of them we had already seen. The Fuun Security Features page was easily noticed to be encrypted with rot13.

We extracted the hitMeWithAClueStick function from the DNA. Removing the C's and trying different widths gave us the message "PORTABLE NETWORK GRAPHICS FOLLOWS". We then noticed the data was split into three sections by P's. The second chunk gave us a PNG image, which told us that an audible voice followed and we then interpreted the remaining segment as an MP3, which read out yet another prefix. This gave us the Beautiful Numbers page, which didn't get us anything extra since we had already found page 496 by brute force.

All the above was about sub-problems in the contest. However, none of this helps morph poor Endo into a cow to save him. At least, not directly. Most of the sub-problems yielded clues which in turn helped us find the right DNA to poke into Endo's DNA, call functions and do other interesting things to it. These all helped us morph the source image into the target image. Since we've been asked by the organisers not to reveal our score I think this is as far as I should go.

Hopefully what I have said should be sufficient for those that did not get very far to poke around and at least get some satisfaction from the problem. I can imagine that many of you spent the entire weekend and got nowhere.

Has Endo been saved? More importantly, have we saved Endo? We have to wait until the International Conference on Funtional Programming in October where the results will be announced.

Post mortems from other teams:
Other writeups I haven't been able to link to a team:

11 comments:

  1. Just by curiosity, how many iterations per second your DNA->RNA converter was able to do?

    ReplyDelete
  2. I get about 120,000 iterations per second on a 2GHz Core 2 Duo. Endo's original DNA takes 16 seconds to process.

    ReplyDelete
  3. Wow, that is blazing fast. I tried to write the DNA->RNA converter just for the fun of it, but mine is a slow turtle compared to yours (ie, ~600 iterations per second). I can only blame the naive (but simple) approach I used to represent the DNA sequence -- a memory-mapped file with a lot of caching. Other than ropes, what kind of data-structure have you used to store the DNA? I wonder how much I could speed up my implementation just by changing the data-structure I used.

    ReplyDelete
  4. The data structure used to store the DNA is the crucial factor wrt efficiency. We used on rope to store the whole DNA and nothing else. It gives a massive speedup - using strings instead it is unusable on DNA as long as Endo's.

    We did other small optimisations as a result of profiling and we converted all the recursive functions into iterative ones. However, nothing got us anywhere near as much of a speedup as changing the data structure.

    ReplyDelete
  5. So, you wrote all your code using simple string operations, then did a quick s/string/crope/g? Interesting. Perhaps C++ isn't a such awful language after all...

    I will try to implement my own ad-hoc rope implementation to see if my current language of choice, Python, could benefit from it. The ICFP contest certainly shown me some weak points in Python, which I would like to address in the near future.

    Thanks again for sharing your experience of the contest.

    ReplyDelete
  6. Yes, that's almost all we did. There were a couple other snags (such as with substr as mentioned in my post), but strings and ropes are almost fully compatible. I wouldn't say that makes C++ not an awful language. I still prefer Python for many things.

    One of my team mates was working on a Python implementation (after the contest of course). I'll see how far he got. Someone else said they got down to 12 minutes. One of our national competitions which I help organise has allowed Python for the past three years. It's "our USACO", so runtime plays a major role. From writing solutions in C++, Java and Python I can tell you Python is often between 10 and 100 times slower than C++ and Java.

    It's a pleasure sharing my experiences. After the results are announced in October I'll be sharing the interesting bits on how we morphed Endo. :)

    ReplyDelete
  7. To reach the top 5 you wil need some boost like these popular Generic Viagra pills

    ReplyDelete