Monday, May 31, 2010

Long Overdue Update

It's been a long 10 months since my last post, so an update is long overdue. So, what have I been up to all this time?

I've been actively growing our still relatively new UCT Algorithm Circle. We had our second Python course, with 75 kids at UCT and about 15 each in Stellenbosch and Johannesburg. We've put together a solid funding proposal to Google which, if successful, will allow us to teach 1,000 kids in Cape Town, Stellenbosch, Johannesburg and Durban, as well as motivate and assist students in these regions to form their own courses.

I've obviously been working towards my MSc in Computer Science. The plan to finish within a year didn't quite work out. It turns out the area I'm working in is ridiculously competitive, so we've been getting harsh paper reviews (not to mention two immediate rejections without review!). As a result things have taken longer than expected, and I've lost a lot of motivation along the way. I'm still touching up the final results and then need to churn through the thesis. I have about 75 pages already, but a lot of it needs to be reworked.

On Wednesday I leave the country for four months. I'm starting a second internship at Google Zurich, working with my previous mentor, this time on the Google Calendar backend. I have a vague idea of what I'll be working on there and I must say it excites me a lot! Once again it's going to involve some stats, which I really enjoy since it often means dealing with insane quantities of data.

Unfortunately I'll be missing the soccer, but hey, what would you rather be doing? :P I'll make an effort to head across to Italy to watch some of their games with family there. My roommate happens to be a Swiss rugby fan, which must be extremely rare, so I'll get to watch the Tri Nations with him.

While I'm up there, I'm going to try to finish off as much of my thesis as possible. It will be tricky not having physical meetings with my supervisor, but Skype will have to do. I'm also going to meet my potential PhD supervisors while I'm up there. It's either a PhD, or work followed by a PhD, for me -- I haven't quite decided yet.

So that's a very brief update on what's been happening the past 10 months. I plan to be a bit more active here while at Google, to at least keep my friends updated with what I'm doing over there.

Sunday, August 2, 2009

Introduction to Programming Using Python

Just under a month ago I tossed the idea around of running a Saturday course to teach high school kids some programming. The idea was to take the new batch from the UCT Maths Circle and give them some exposure to programming and introduce the keen ones to our UCT Algorithm Circle. Back then I had no idea it would turn into this:


When I realised the number of new kids coming from the UCT Maths Circle might be too small, I decided to open it up to applications from anyone with limited or no programming experience. All we asked for was a 100-word motivation of why they should be invited, plus an optional recommendation from a teacher to strengthen their application. I approached the Computer Olympiad office for the email and postal addresses of the schools that had entered the Computer Olympiad before. We emailed about 90 schools and posted to about 60, with no idea of what to expect.

The advertising went out during the school holidays, so nothing really happened for some time. When schools reopened on Tuesday 28 July, though, applications started trickling in, and some of them looked very promising. It was around this time that I started thinking about how many we could accept: I settled on 30 and asked 3 other students to help lecture/tutor. By the end of the first week of school, though, applications were flying in like mad. By the end of Monday we had received about 60 and they were still coming in. If we had stuck to accepting 30 we would have had a shit time turning away some strong applications. Rob came to the rescue and we got a fourth lecturer, which allowed me to increase capacity to 40. More kept rolling in, including one application that arrived on the morning of the course!

We eventually ended up inviting 46, expecting that at least 5 wouldn't pitch. We underestimated their enthusiasm: one fell ill and another had a last-minute commitment, but everyone else pitched! Even after the first day, only one kid fell ill and the rest all returned. Unfortunately a group of four girls had to leave early to catch a lift, and another left as she was falling behind. Everyone else stuck it out the whole way. Not something I could possibly have expected! But it means we did something right, right? :)

We chose to run the whole course in the computer lab so that they could run short examples as we taught them things. It really worked well as we could immediately see if they were struggling on something and we never once lost them badly. We could get them to run bits of code to see for themselves what they did and then expand on how it worked.

Taking a step back a bit: on Thursday morning I got a bit of a scare. One of our lecturers had caught a bad dose of the flu, so I ran around trying to find a replacement. Fortunately some students lurking around the Computer Science building at the time offered a hand, and I split the load across three of them. I thank them all mightily for helping out at such short notice. They did a fantastic job, as did everyone else who gave a helping hand. In the end, Michiel Baird and I did all the admin work; the two of us, Ben Steenhuisen and Julian Kenwood did the lecturing; Jason Brownbridge, Bertus Labuschagne, his brother Phil and Kosie van der Merwe helped out with tutoring, answering the kids' questions; and Brent Benade helped order pizza for everyone, a rather nasty job.

When the kids started arriving there was quite a lot of tension in the air, which was to be expected. We planned for this and tried to make sure that we had at least two from each school, so that there was a good chance each kid knew someone else. A number of our lecturers and tutors are great at throwing humour around, which helped ease the tension really well. By the end of the first day (3 hours) most of them had opened up and had no problem asking for help.

The topics we taught, in the order we taught them:

  1. What is programming?
  2. Using the Python interpreter
  3. Input and output
  4. Variables, operators and basic data types
  5. Boolean expressions and conditionals
  6. While loop
  7. Lists
  8. For loop
  9. Strings
  10. Writing functions
All of that was covered in 9 hours. Bear in mind that these are grade 7, 8 and 9 kids (and a couple of grade 10s) with mostly no prior programming experience at all! Now read the list above again. From what I can remember of my high school experience, what we taught them in 9 hours was equivalent to what is covered in the entire grade 10 and 11 IT syllabus, minus files and GUIs, and obviously all the theory crap.
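To give a feel for what those topics add up to, here is a short illustrative program of the kind a student could write by the end of the course. It is not taken from our course material and the function names are made up; `input()` is left out so it runs without anyone at the keyboard.

```python
def count_vowels(word):
    """Strings + for loop + conditional: count the vowels in a word."""
    count = 0
    for letter in word.lower():
        if letter in "aeiou":
            count = count + 1
    return count


def longest_word(words):
    """Lists + while loop: find the longest word (the first one wins ties)."""
    longest = ""
    i = 0
    while i < len(words):
        if len(words[i]) > len(longest):
            longest = words[i]
        i = i + 1
    return longest


words = ["Python", "Algorithm", "Circle"]
for w in words:
    print(w, "has", count_vowels(w), "vowel(s)")
print("Longest word:", longest_word(words))
```

Running it prints `Python has 1 vowel(s)`, `Algorithm has 3 vowel(s)`, `Circle has 2 vowel(s)` and finally `Longest word: Algorithm`.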

They all did exceptionally well. Now it's time for them to write the test we gave them, and we'll invite the best ones to attend weekly 90-minute classes to teach them more programming, with a particular focus on improving their problem-solving skills. What's also great for them is that they have the opportunity of being invited to the UCT Maths Circle, our partner in crime.

There is one thing we feel exceptionally guilty about. We received over 80 applications, but only had space for 46. Reading the motivations of the kids we had to turn away makes me very sad, so there is a possibility of us running another course like this in the near future. If you are interested in attending a future course, please contact me. We also run a class on data structures and algorithms for those who have a strong grasp of programming, are bored in class and want a challenge.

Sunday, February 1, 2009

UCT Algorithm Circle

After much grinding away, we had our first class of the UCT Algorithm Circle this past Thursday. We invited 32 of the most talented school kids we could find in the Cape Town area to come for training, and we're slowly inviting kids from outside Cape Town to train online. We're teaching them everything from the very basics of programming right through to the advanced algorithms and data structures required for the IOI.

For the first class, we introduced the basics of Python. We were amazed at how quickly the kids caught on: after a 20-minute lecture and a 60-minute practical session they understood operations, variables, stdio and more. The majority of these kids are in grades 9 and 10, and amazingly, half are girls.

If things continue at the rate they're going now, this could provide a serious boost to our IOI results in upcoming years. We've always welcomed the idea of training a wider audience, but finding talented kids and getting them interested has always been a brick wall we couldn't knock down. This time, though, collaborating with one of the people involved in training kids for the IMO has seriously helped change all that.

To see what the kids we have are capable of, just check out some of their introductions in this thread.

Prototype

After completing the meat of my background chapter in December, I spent most of January working on a prototype for my masters project. So now I get to start showing off all my pretty pictures. :)

First of all, I should mention that I am writing an extension for VMD, so I most certainly did not develop what you see below from the ground up. In an effort to simplify the process of porting my work to other molecular visualisation applications (e.g. PyMol), I decided to do all the core computation in an application-independent C++ module which communicates with an application-specific plugin via sockets. For VMD, this plugin is written in Tcl, which I have come to hate.
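The post doesn't describe what actually travels over the socket between the C++ core and the Tcl plugin. Purely as an illustration, here is a minimal sketch of the request/reply idea in Python, assuming a hypothetical newline-delimited text protocol; the `PING` and `SCORES` commands and all helper names are made up. `socket.socketpair()` stands in for the real connection so the example runs in a single process.

```python
import socket


def handle_request(line):
    """Core side: map one text command to one text reply (hypothetical protocol)."""
    parts = line.split()
    if not parts:
        return "ERROR empty command"
    command, args = parts[0], parts[1:]
    if command == "PING":
        return "PONG"
    if command == "SCORES" and len(args) == 1 and args[0].isdigit():
        # Dummy per-residue conservation scores (placeholder values only).
        return " ".join("0.50" for _ in range(int(args[0])))
    return "ERROR unknown command " + command


def send_line(sock, text):
    """Send one newline-terminated message."""
    sock.sendall((text + "\n").encode("utf-8"))


def recv_line(sock):
    """Read bytes until a newline; one byte at a time keeps the sketch simple."""
    chunks = []
    while True:
        byte = sock.recv(1)
        if not byte or byte == b"\n":
            break
        chunks.append(byte)
    return b"".join(chunks).decode("utf-8")


def serve_one(sock):
    """Core side: answer a single request arriving on sock."""
    send_line(sock, handle_request(recv_line(sock)))


if __name__ == "__main__":
    plugin, core = socket.socketpair()
    send_line(plugin, "SCORES 3")   # the plugin asks for three scores
    serve_one(core)                 # the core computes and replies
    print(recv_line(plugin))        # prints: 0.50 0.50 0.50
    plugin.close()
    core.close()
```

The design point is that only `handle_request` knows about the computation, so a different viewer needs only a new client speaking the same text lines. Reading one byte at a time is slow but never swallows data after the newline; a real implementation would buffer.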

When you first launch VMD, you get a simple protein. Launch my extension and it churns away, calculating conservation scores (dummy values for now) and the solvent accessible surface of the protein. The protein is then coloured based on the conservation scores, which you can see below for a sample protein.
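The post doesn't say which colour scale the prototype uses. As an assumption, here is a minimal sketch of one simple choice: scores are normalised to [0, 1] and mapped from blue (least conserved) through white to red (most conserved). The function names are hypothetical, and the plugin would still have to pass the resulting RGB values to VMD.

```python
def score_to_rgb(score, lo=0.0, hi=1.0):
    """Map a score onto a blue-white-red scale; returns an (r, g, b) tuple in 0..1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Normalise to [0, 1] and clamp out-of-range scores.
    t = (score - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    if t < 0.5:
        # Lower half: blue (0, 0, 1) fading to white (1, 1, 1).
        s = t / 0.5
        return (s, s, 1.0)
    # Upper half: white (1, 1, 1) fading to red (1, 0, 0).
    s = (t - 0.5) / 0.5
    return (1.0, 1.0 - s, 1.0 - s)


def colour_residues(scores):
    """Turn a residue-id -> score dict into a residue-id -> RGB dict."""
    lo, hi = min(scores.values()), max(scores.values())
    if lo == hi:
        # Every residue scored the same: nothing to contrast, so use white.
        return {res: (1.0, 1.0, 1.0) for res in scores}
    return {res: score_to_rgb(s, lo, hi) for res, s in scores.items()}
```

Normalising against the protein's own minimum and maximum keeps the full colour range in use even when the raw scores cluster in a narrow band.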


After visualising the conservation scores, the final application will visualise its prediction of binding sites. The user will then be given the option of doing further analysis of the binding sites. We're currently considering two forms of analysis: select a residue and predict what a binding site containing this residue would look like; and select some residues and predict what the binding sites would look like if these residues were excluded from them. Below is a sample of user-selected residues (in red).


Then the user can choose to visualise the solvent accessible surface. We calculate this surface using marching tetrahedra to extract an isosurface and kd-trees to calculate the isovalues. The surface is coloured by the conservation scores, just like in the previous shots. Currently I don't have residue selection working in this mode, although I plan on doing so. The meat of my computation will be using the conservation scores and the solvent accessible surface to predict the binding sites.
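The paragraph above only outlines the method. Here is a minimal sketch of the isovalue half, under loudly stated assumptions: every atom gets the same radius (1.8 Å, a placeholder) and the probe is the conventional 1.4 Å water probe. The isovalue at a point is then its distance to the nearest atom centre minus (atom radius + probe radius), so the solvent accessible surface is the zero level set, and with equal radii "nearest atom" is exactly a kd-tree nearest-neighbour query. All names are hypothetical. With per-atom radii the nearest centre no longer guarantees the minimum, so a range query would be needed instead. Marching tetrahedra would then extract the zero isosurface from the sampled grid; that step isn't shown.

```python
import math


class KDNode:
    """One node of a 3-D kd-tree: a stored point and the axis it splits on."""

    def __init__(self, point, axis, left, right):
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting at the median along x, y, z in turn."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))


def nearest_distance(node, query, best=math.inf):
    """Distance from query to the closest stored point under node."""
    if node is None:
        return best
    best = min(best, math.dist(node.point, query))
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest_distance(near, query, best)
    # The far side can only hold a closer point if the splitting plane
    # is closer than the best distance found so far.
    if abs(diff) < best:
        best = nearest_distance(far, query, best)
    return best


def sas_isovalue(tree, point, atom_radius=1.8, probe_radius=1.4):
    """Negative inside the solvent accessible surface, zero on it, positive outside."""
    return nearest_distance(tree, point) - (atom_radius + probe_radius)


def isovalue_grid(atoms, origin, spacing, n):
    """Sample isovalues on an n x n x n grid for a surface extractor to consume."""
    tree = build_kdtree(list(atoms))
    ox, oy, oz = origin
    return [[[sas_isovalue(tree, (ox + i * spacing, oy + j * spacing, oz + k * spacing))
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]
```

The kd-tree is built once, and each grid point is answered by a nearest-neighbour query that prunes subtrees whose splitting plane lies farther away than the best match found so far, rather than scanning every atom.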


Finally, VMD is a very full-featured tool, and the least you can do with it is rotate the protein to view it from every angle, as you can see below. There is much more you can do with it, but I'll leave interested readers to explore for themselves.


Next week I'm off to the Afrigraph Conference in Pretoria, after which I have to attend a 6-week bioinformatics course in Stellenbosch: lectures from 09:00 to 18:00 every day for 6 weeks. Not sure how I'm going to last.

Thursday, December 11, 2008

Planning for PhD

After the painful experience of seeing my masters supervisor resign a few months ago, things have been turning out rather nicely. A huge advantage of working with my new supervisor is that she has good international collaborations. Last month I met the lead developer of VMD, who is from Illinois; VMD is one of the visualisation programs I'll be writing my masters project for. This week it got even better!

This week, Dr Robert Best is visiting from Cambridge. It is looking increasingly likely that he will be my PhD supervisor. Yes, that is correct... I am this close to getting the amazing opportunity of doing my PhD at Cambridge! The only hurdle at the moment is funding. I've been very slowly releasing news of this, as I first got word of the possibility around the time I started my new masters project a couple of months ago. This is entirely thanks to my masters supervisor, Michelle Kuttel, who put me in contact with Dr Best.

Yesterday I met him for the first time, and we discussed potential research topics. The application deadline is very soon, so for the moment we're focussing on a particular topic that looks very promising. Summarising very crudely, it's about combining the speed of coarse-grained simulations with the accuracy of fine-grained simulations to produce fast but accurate simulations of biomolecules. The main question here is how to swap between different representations of the system. Another possibility we have been looking into is furthering his research on reaction coordinates, which he summarises on his website.

By the way, this is the reason I am trying to finish my masters in a year. The year starts in October at Cambridge, and starting when their year begins makes things much simpler.

Thursday, November 13, 2008

Proposal Presentation

My proposal presentation went really well today. I was concerned that the biology jargon would get in the way of the meat of the project, but we got past that. Edwin is my second reader and he provided some very useful criticism. The main thing I'm glad about is that he suggested we remove the user testing aspect, which was always my least favourite. He also picked up a few bits of jargon we need to explain better in the thesis, but nothing major. The last thing he picked on was a concern I had already raised with Alex (my co-supervisor), which means I'll be paying more attention to it.

So on we go from here. The next step is to continue with the background reading and start the background chapter. The plan is to wrap up the bulk of the background reading by the end of this year and then dig deep into the project. I have a tight schedule, as I plan to finish by October next year. I've had very contradictory reactions to this, some saying it's quite doable, others laughing it off. We'll see.

Tuesday, November 11, 2008

Masters Milestone #1: Proposal

If official milestones were all that counted, within the first six weeks of my new project I'd be about to exceed the progress of the project I previously spent six months on. My proposal is all written up and has survived the shredding of two wild supervisors, and I'm ready to present it to the department this Wednesday. Woohoo!!

It's insane how fast things can move when everything just works. Compared with the way things ran in my last project, wow, how things can change over such a short period of time. I have to thank my supervisors Michelle and Alex; they've really provided great support.

So anyway, Wednesday 13:00 in CS303. My title is "Development and Validation of a Visualization Tool for Predicting Protein-Protein Interfaces". If you're interested in scientific applications of Computer Science, this is a good example. It also involves quite a bit of computer graphics.