Analysis of Xapian GSoC 2014 Applications

As I said in my earlier post, we received 31 proposals from students (ignoring 2 duplicates withdrawn by students). On closer inspection, we spotted another duplicate, so discounting that, here is how the remaining 30 proposals break down by project idea:

  • 10 - Clustering of search results
  • 5 - Learning to Rank
  • 5 - Weighting Schemes
  • 2 - Postlist encodings
  • 2 - Improve Java bindings (one with PHP bindings too)
  • 1 - Gmane search improvements
  • 1 - Testsuite Improvements
  • 1 - Performance/Relevance testing and optimization of DFR
  • 1 - Social Media Product Analyzer
  • 1 - Web application for fast image search
  • 1 - Improving Arabic Support + Python Binding Improvements

In the above list, italics indicate ideas or parts of ideas which were suggested by the student, rather than coming from our ideas list.

As in 2012, the most popular ideas from our suggested ideas list are those with the closest connections to Information Retrieval theory. I think the clustering idea also seems very accessible, which is why it's been so popular (it was only added to the list shortly before student applications opened, as we'd already seen signs that "Learning to Rank" and "Weighting Schemes" were likely to be very popular).

There's also a wider spread in quality among the clustering proposals (perhaps also due to the accessibility of that idea), so don't despair if you're a student who applied for a clustering project.

And generally, if we have more than one great proposal based on the same project idea, we may accept more than one of them - we don't want to duplicate effort, but it's often possible to adjust the scopes to produce projects which don't overlap.

Posted in xapian by Olly Betts on 2014-03-27 16:00

