Olly


Xapian GSoC 2012 Projects

At the end of the previous episode, you may remember our gallant heroes had a pile of 30 proposals to review. We soon spotted one more to mark as invalid (just a paste of our ideas list plus some biographical details), and another was withdrawn by the student without explanation (but was low quality anyway), so that left us with 28.

We had six volunteers for mentoring, and in the initial allocation we received five student slots from Google, but we asked nicely if we could have an extra one, and were lucky enough to get it. Last year we had four students, so that's a 50% increase.

Here are those 28, broken down by project idea:

  • 8 - Weighting Schemes
  • 6 - Learning to Rank
  • 3 - Dynamic Snippets
  • 2 - Lucene Backend
  • 2 - QueryParser improvements
  • 1 - Erlang Bindings
  • 1 - Improve C# and Java bindings
  • 1 - Improve PHP Bindings
  • 1 - Improve Python Bindings
  • 1 - Improving Japanese Support
  • 1 - Node.js Bindings
  • 1 - Postlist encodings

I find it interesting that the three most popular ideas have closer connections to Information Retrieval theory than most - probably these appeal to students who have taken IR courses and already have an interest in, and some knowledge of, the project area. I think we should aim to get more ideas like these on the list in future years.

It's worth noting that in several cases students had taken an idea in sufficiently different directions that there wasn't much overlap, so we didn't just pick the best proposal for each project idea to narrow things down. Also, the proposal isn't the only factor - we like to see applicants work on a patch, and to interact with us on IRC and/or email. But as it happens, we ended up with proposals which were all for different ideas - here are those we selected:

My congratulations to the lucky six, and my commiserations to those we weren't able to select. It wasn't an easy selection to make, and we truly appreciate the time you spent writing your proposal, working on patches, and on the rest of the application process. We'd encourage you to remain involved with Xapian, and to apply to us again next year if you're still eligible for GSoC.


Posted in xapian by Olly Betts on 2012-04-26 19:06 | Permalink

Xapian GSoC Applications for 2012

Student applications for GSoC closed a day or so ago, and we've done an initial pass through Xapian's applications, so I thought I should post another overview, similar to last year's.

We received a total of 41 applications this year (very close to last year's total of 42). Here's a graph of applications against time:

Graph of student applications to Xapian in GSoC 2012

If you're an admin or a mentor, you can produce a similar graph for your own org(s) - just download this OpenDocument spreadsheet and follow the instructions inside.

That total of 41 includes one duplicate and one application withdrawn by the student (we had one of each last year too). I've also gone through and marked nine spam proposals as invalid (similar to the seven we had last year). Spam proposals are things like proposals with no connection at all to Xapian, and proposals which are just a title and/or paste from our ideas list with a generic biography.

So that leaves us with 30 proposals (compared to 33 last year). It's hard to really measure, but my feeling is that the average quality is higher than last year (and it was already pretty impressive last year).


Posted in xapian by Olly Betts on 2012-04-09 01:04 | Permalink

Xapian 1.3 Branched

(Actually, we branched six weeks ago, but I've not got around to writing about it until now.)

The development branch approach we used for 1.1.x development releases leading to a stable 1.2.0 release seemed to work pretty well, so we're adopting that again.

The main problem last time was that it took a long time to actually stabilise 1.1.x because we kept slipping more changes in. For 1.3.x, we need to be more disciplined: changes should be developed on a branch and not merged prematurely. We now have solid git mirroring, so developing on a branch is a more pleasant experience than before. We also need to be brutal sooner about deferring changes. It's better for everyone to (say) achieve two release series in two years than to have one release series take two years.

When I was in the UK back in May, Richard and I sat down and hashed out a list of goals for a 1.4 release series. This is what we came up with (the order is just how they came to mind, so isn't really significant):


read more…

Posted in xapian by Olly Betts on 2011-07-22 16:29 | Permalink

Xapian GSoC Applications for 2011

Student applications for GSoC closed a few hours ago. This is Xapian's first year as a mentoring organisation (though I've been involved in previous years with SWIG and Debian) and we've been blown away by the response from students.

If you had asked me when we got accepted, I'd have guessed we might get 20 applications and feel we'd done well, but counting up now we have 42. Ignoring two which were withdrawn (one a duplicate, one a spam which surprisingly got withdrawn when I politely suggested such applications weren't useful), here is a graph of applications against time:

Graph of student applications to Xapian in GSoC 2011

If you're an admin or a mentor, you can produce a similar graph for your own org(s) - just download this OpenDocument spreadsheet and follow the instructions inside.

Now the task of selection starts in earnest. I've gone through and marked the seven spam proposals as ineligible (that's one-line proposals, proposals with no connection at all to Xapian, and proposals which are just a title and/or a paste from our ideas list with a generic biography).

That leaves 33, but before our student applicants start to despair, not all of those are really in the running! I don't have a good picture yet, but it looks like there are something like 10-15 we'll be seriously considering.


Posted in xapian by Olly Betts on 2011-04-09 14:36 | Permalink

Numeric Term ID Implementation

I've implemented much of the numeric term id support now. Currently only appending documents is fully supported, and I haven't changed the position table keys yet.

I made some minor additional changes too while I was working on the code.



Posted in xapian by Olly Betts on 2010-01-25 21:39 | Permalink

Numeric Term IDs

Nearly a decade ago, Open Muscat (the project Xapian has evolved out of) used integer term ids to represent terms internally. This turned out to be awkward to deal with when running searches over several databases together since any term will generally have a different term id in each database. It's especially problematic when generating relevance feedback terms since we can't run through the lists of terms for each document in the same order without sorting them.

So in late 2000, we changed to representing terms internally as strings.
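To make the relevance feedback problem a little more concrete, here is a minimal Python sketch. It is not Xapian's code, and the function names and data are hypothetical. With string terms, every database's termlist is already in the same (lexicographic) order, so lists from several databases can be merged in one pass. With per-database integer term ids, id order in one database says nothing about id order in another, so each list has to be translated back to strings and sorted first.

```python
import heapq
from collections import Counter


def merge_string_termlists(*termlists):
    """Count occurrences of each term across several termlists.

    Assumes each input is already sorted by term string, as it would be
    if terms are stored as strings in every database.
    """
    counts = Counter()
    # heapq.merge streams the already-sorted inputs in a single pass,
    # without re-sorting them.
    for term in heapq.merge(*termlists):
        counts[term] += 1
    return counts


def merge_id_termlists(*dbs):
    """Same count, but each db is (list of term ids, id -> term mapping).

    Hypothetical per-database ids can't be compared across databases, so
    each list is mapped back to strings and sorted before merging.
    """
    translated = [sorted(id_to_term[i] for i in ids) for ids, id_to_term in dbs]
    return merge_string_termlists(*translated)


# Hypothetical data: the same terms get different ids in each database.
db1 = ([1, 2, 3], {1: "apple", 2: "banana", 3: "cherry"})
db2 = ([2, 5, 7], {2: "cherry", 5: "date", 7: "banana"})

print(merge_string_termlists(["apple", "banana", "cherry"],
                             ["banana", "cherry", "date"]))
print(merge_id_termlists(db1, db2))
```

Both calls produce the same counts; the difference is the extra `sorted()` per database in the integer-id case, which is the cost described above.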

As part of the work GMX are sponsoring on reducing database size, I've been revisiting this decision.



Posted in xapian by Olly Betts on 2010-01-14 15:55 | Permalink

