Olly

Xapian GSoC 2014 Projects

Accepted GSoC students were announced on 21st April, but I was away on holiday last week, and have only just had a chance to write up a blog post about this.

We received 30 student proposals for Xapian this year, and Google allocated us six slots (the same as we had in 2012).

We had four particularly strong proposals for the "Learning to Rank" project idea, so we decided to create a second project adding more algorithms, to complement the project sketched out in our ideas list.

Congratulations to the chosen six:

Sorry to those we weren't able to select this year - we had to make some difficult decisions during the selection process, and we really appreciate the time you spent writing your proposal, working on patches, and on the rest of the application process. We'd encourage you to remain involved with Xapian, and to apply to us again next year if you're still eligible for GSoC.

If any applicants would like more specific feedback on their applications, please just come and ask us.


Posted in xapian by Olly Betts on 2014-04-30 15:10 | Permalink

Analysis of Xapian GSoC 2014 Applications

As I said in my earlier post, we received 31 proposals from students (ignoring 2 duplicates withdrawn by students). On closer inspection, we spotted another duplicate, so discounting that, here is how the remaining 30 proposals break down by project idea:

  • 10 - Clustering of search results
  • 5 - Learning to Rank
  • 5 - Weighting Schemes
  • 2 - Postlist encodings
  • 2 - Improve Java bindings (one with PHP bindings too)
  • 1 - Gmane search improvements
  • 1 - Testsuite Improvements
  • 1 - Performance/Relevance testing and optimization of DFR
  • 1 - Social Media Product Analyzer
  • 1 - Web application for fast image search
  • 1 - Improving Arabic Support + Python Binding Improvements

In the above list, italics indicate ideas or parts of ideas which were suggested by the student, rather than coming from our ideas list.

As in 2012, the most popular ideas from our suggested ideas list are those with the closest connections to Information Retrieval theory. I think the clustering idea also seems very accessible, which is why it's been so popular (it was only added to the list shortly before student applications opened, as we'd already seen signs that "Learning to Rank" and "Weighting Schemes" were likely to be very popular).

There's also a wider spread in quality for the clustering proposals (perhaps also due to the accessibility of that idea), so don't despair if you're a student who applied for a clustering project.

And generally, if we have more than one great proposal based on the same project idea, we may accept more than one of them - we don't want to duplicate effort, but it's often possible to adjust the scopes to produce projects which don't overlap.


Posted in xapian by Olly Betts on 2014-03-27 16:00 | Permalink

Xapian GSoC Applications for 2014

Student applications for GSoC closed a few hours ago, and here are some initial stats on the proposals we received for Xapian (for comparison, see my blog posts for 2011 and 2012).

We received a total of 31 applications this year - here's a graph of total applications received against time:

Graph of student applications to Xapian in GSoC 2014

If you're an admin or a mentor, you can produce a similar graph for your own org(s) - just download this OpenDocument spreadsheet and follow the instructions inside.

Of the 31, 18 were submitted in the last 12 hours, with the latest submission a rather brave 99 seconds before the deadline.

The total number is lower than the 42 and 41 we received in previous years, but in a quick skim through I didn't see anything we'd immediately discount as a spam proposal and mark as invalid. So that 31 is more comparable with the numbers after removing spam from previous years (which were 33 and 30).

I suspect the improved quality and the even more marked spike as the deadline nears may be due to the new requirement that students upload proof that they are enrolled before they can submit a proposal.


Posted in xapian by Olly Betts on 2014-03-22 12:35 | Permalink

Xapian GSoC 2012 Projects

At the end of the previous episode, you may remember our gallant heroes had a pile of 30 proposals to review. We soon spotted one more to mark as invalid (just a paste of our ideas list plus some biographical details), and another got withdrawn by the student without explanation (but was low quality anyway), so that left us with 28.

We had six volunteers for mentoring, and in the initial allocation we received five student slots from Google, but we asked nicely if we could have an extra one, and were lucky enough to get it. Last year we had four students, so that's a 50% increase.

Here are those 28, broken down by project idea:

  • 8 - Weighting Schemes
  • 6 - Learning to Rank
  • 3 - Dynamic Snippets
  • 2 - Lucene Backend
  • 2 - QueryParser improvements
  • 1 - Erlang Bindings
  • 1 - Improve C# and Java bindings
  • 1 - Improve PHP Bindings
  • 1 - Improve Python Bindings
  • 1 - Improving Japanese Support
  • 1 - Node.js Bindings
  • 1 - Postlist encodings

I find it interesting that the three most popular ideas have closer connections to Information Retrieval theory than most - probably these appeal to students who have taken IR courses and already have an interest in, and some knowledge of, the project area. I think we should aim to get more ideas like these on the list in future years.

It's worth noting that in several cases students had taken an idea in sufficiently different directions that there wasn't much overlap, so we didn't just pick the best proposal for each project idea to narrow things down. Also, the proposal isn't the only factor - we like to see applicants work on patches, and interact with us on IRC and/or email. But in the end, as it happens, we ended up with proposals which were all from different ideas - here are those we selected:

My congratulations to the lucky six, and my commiserations to those we weren't able to select. It wasn't an easy selection to make, and we truly appreciate the time you spent writing your proposal, working on patches, and on the rest of the application process. We'd encourage you to remain involved with Xapian, and to apply to us again next year if you're still eligible for GSoC.


Posted in xapian by Olly Betts on 2012-04-26 19:06 | Permalink

Xapian GSoC Applications for 2012

Student applications for GSoC closed a day or so ago, and we've done an initial pass through Xapian's applications, so I thought I should post another overview, similar to last year's.

We received a total of 41 applications this year (very close to last year's total of 42). Here's a graph of applications against time:

Graph of student applications to Xapian in GSoC 2012

If you're an admin or a mentor, you can produce a similar graph for your own org(s) - just download this OpenDocument spreadsheet and follow the instructions inside.

That total of 41 includes one duplicate and one application withdrawn by the student (we had one of each last year too). I've also gone through and marked nine spam proposals as invalid (similar to the seven we had last year). Spam proposals are things like proposals with no connection at all to Xapian, and proposals which are just a title and/or paste from our ideas list with a generic biography.

So that leaves us with 30 proposals (compared to 33 last year). It's hard to really measure, but my feeling is that the average quality is higher than last year (and it was already pretty impressive last year).
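For anyone wanting to check the arithmetic, here is a minimal sketch of the tally, using only the counts quoted above. The function and variable names are just illustrative, not part of any tool we actually use.

```python
# Counts are taken from this post; names here are illustrative only.
def valid_proposals(total, duplicates, withdrawn, spam):
    """Return how many proposals remain after discounting invalid ones."""
    return total - duplicates - withdrawn - spam

# This year: 41 received, 1 duplicate, 1 withdrawn, 9 marked as spam.
remaining_2012 = valid_proposals(total=41, duplicates=1, withdrawn=1, spam=9)

# Last year: 42 received, 1 duplicate, 1 withdrawn, 7 marked as spam.
remaining_2011 = valid_proposals(total=42, duplicates=1, withdrawn=1, spam=7)

print(remaining_2012, remaining_2011)
```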


Posted in xapian by Olly Betts on 2012-04-09 01:04 | Permalink

Xapian 1.3 Branched

(Actually, we branched six weeks ago, but I've not got around to writing about it until now.)

The development branch approach we used for 1.1.x development releases leading to a stable 1.2.0 release seemed to work pretty well, so we're adopting that again.

The main problem last time was that it took a long time to actually stabilise 1.1.x because we kept slipping more changes in. For 1.3.x, we need to be more disciplined: changes should be developed on a branch and not merged prematurely. We now have solid git mirroring, so developing on a branch is a more pleasant experience than before. We also need to be brutal sooner. It's better for everyone to (say) achieve two release series in two years than have one release series take two years.

When I was in the UK back in May, Richard and I sat down and hashed out a list of goals for a 1.4 release series. This is what we came up with (the order is just how they came to mind, so isn't really significant):


read more…

Posted in xapian by Olly Betts on 2011-07-22 16:29 | Permalink

