Olly


Xapian 1.3 Branched

(Actually, we branched six weeks ago, but I've not got around to writing about it until now.)

The development branch approach we used for 1.1.x development releases leading to a stable 1.2.0 release seemed to work pretty well, so we're adopting that again.

The main problem last time was that it took a long time to actually stabilise 1.1.x because we kept slipping more changes in. For 1.3.x, we need to be more disciplined: changes should be developed on a branch and not merged prematurely. We now have solid git mirroring, so developing on a branch is a more pleasant experience than before. We also need to be brutal sooner about deferring changes that aren't ready. It's better for everyone to (say) achieve two release series in two years than to have one release series take two years.

When I was in the UK back in May, Richard and I sat down and hashed out a list of goals for a 1.4 release series. This is what we came up with (the order is just the order they came to mind, so it isn't significant):

  • New stable backend - it's probably too ambitious to finish brass in time, but we can combine the parts of brass which are already working well with the current chert backend to achieve some decent improvements in a sane timescale. The plan is to call this agate, and from now on to name the successive backends alphabetically.
  • Clean up the remote backend protocol. Within a release series we avoid incompatible changes to the remote backend protocol, so changes need to be additive only, or at least the server must keep supporting the old client messages. With a new release series, we can clean this up. Now done
  • When an exception is propagated across the remote connection, Xapian passes its name. Instead we should pass a number, which can then be turned back into an exception with a simple switch statement, and also results in a smaller message (ticket #471). Now done
  • It would be good to try to sort out some built-in support for multi-value facets (ticket #199).
  • Deprecate the hand-coded XS Perl bindings and promote the newer SWIG-generated Perl bindings in their place. The SWIG bindings are up to date with the C++ API and will be easier to keep that way. There are some differences between the wrapped APIs, which we are tracking in ticket #523. Ideally we should fix these, but ultimately we think it's better for users to have to make some updates in exchange for having up-to-date bindings in the future. But please report any further incompatibilities you find in that ticket. Partly done
  • We should deprecate the hand-coded JNI Java bindings in favour of the newer SWIG-generated Java bindings. The main blocker for this is sorting out how to get the latter into a namespace. There are some differences in the wrapped APIs, but the hand-coded bindings are lagging a lot at this point, so having up-to-date bindings is definitely worth the pain of transition for users. Now done
  • We should upgrade to a newer version of SWIG to generate the bindings, which includes various fixes, including one for C#. Now done
  • We should get subclassing in PHP working. Now done
  • We should resolve the remaining issues with Python 3 support (ticket #346).
  • Include any newer Snowball stemming algorithms. Now done
  • Upgrade the Unicode tables to the latest Unicode version (Unicode 6.0.0) (ticket #497). Now done
  • Make sure the documentation is up to date, and improve it where it is weakest. Partly done
  • Use the C version of the English stemmer, which is faster than the Snowball version. The C version currently needs fixing to work with UTF-8.
  • Sort out end iterator proxies, so that it != db.termlist_end("foo") compiles to a simple NULL pointer comparison (in 1.2.x, this is done for ValueIterator, but not for the other iterators).
  • We should try to merge at least some of the GSoC projects. Partly done (the Lua bindings project is merging its work regularly)
  • Merge the geospatial branch - see ticket #481.
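The exception item above (ticket #471) can be illustrated with a minimal sketch. The exception classes and the one-byte wire codes here are hypothetical stand-ins, not Xapian's real remote protocol or exception hierarchy; the point is only that the receiving side maps a small integer back to an exception type with a switch, instead of comparing class-name strings, and that the message sent is shorter.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception classes standing in for Xapian's real hierarchy.
struct DatabaseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct QueryParserError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical wire codes: one byte instead of the exception's class name.
enum class ErrorCode : std::uint8_t {
    DATABASE = 1,
    QUERY_PARSER = 2,
};

// Server side: encode the exception as a code byte followed by its message.
std::string encode_error(ErrorCode code, const std::string& msg) {
    std::string out(1, static_cast<char>(code));
    out += msg;
    return out;
}

// Client side: a simple switch turns the code back into an exception.
[[noreturn]] void throw_decoded(const std::string& wire) {
    if (wire.empty()) throw std::runtime_error("empty error message");
    auto code = static_cast<ErrorCode>(static_cast<unsigned char>(wire[0]));
    std::string msg = wire.substr(1);
    switch (code) {
        case ErrorCode::DATABASE:
            throw DatabaseError(msg);
        case ErrorCode::QUERY_PARSER:
            throw QueryParserError(msg);
    }
    // A code this client doesn't know (e.g. from a newer server).
    throw std::runtime_error("unknown error code: " + msg);
}
```

A real protocol would also need a policy for codes the client doesn't recognise; this sketch simply falls back to a generic error.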
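The end iterator proxy item above can be sketched under some assumptions. All names here (`TermList`, `TermIteratorEnd`, `ToyDatabase`) are hypothetical, not Xapian's actual classes. The idea shown is that the `_end()` method returns an empty proxy type rather than a full iterator, and comparing an iterator against that proxy only tests whether the iterator's internal pointer has become null, so there is no end object to build or compare field by field.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical backend object holding the terms being iterated.
struct TermList {
    std::vector<std::string> terms;
    std::size_t pos = 0;
};

// Empty proxy type returned by the hypothetical termlist_end().
struct TermIteratorEnd {};

class TermIterator {
    TermList* internal;  // set to nullptr once iteration is finished
public:
    explicit TermIterator(TermList* tl)
        : internal(tl && !tl->terms.empty() ? tl : nullptr) {}
    const std::string& operator*() const { return internal->terms[internal->pos]; }
    TermIterator& operator++() {
        if (++internal->pos == internal->terms.size()) internal = nullptr;
        return *this;
    }
    // Comparing against the proxy is just a null pointer test.
    friend bool operator!=(const TermIterator& it, TermIteratorEnd) {
        return it.internal != nullptr;
    }
};

// Hypothetical stand-in for a database's termlist_begin()/termlist_end().
struct ToyDatabase {
    TermList list;
    TermIterator termlist_begin() { list.pos = 0; return TermIterator(&list); }
    TermIteratorEnd termlist_end() const { return {}; }
};
```

Because `TermIteratorEnd` carries no state, after inlining `it != db.termlist_end()` reduces to `it.internal != nullptr`.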

There's no guarantee that everything above will actually get done, and equally this isn't meant to be an exhaustive list of features, just the ones we thought of that afternoon. If you have a patch, it's certainly up for consideration. For example, we're working on, or have already made, the following changes which weren't in our earlier list:

  • Database::reopen() now returns a boolean value to indicate if the database may have been reopened (ticket #548). Now done
  • Reimplement the internals of Xapian::Query to be saner, and also smaller and faster (ticket #280). Partly done
  • The interface files used by the SWIG bindings have been largely rewritten and we now direct SWIG to parse the C++ API headers in all cases except for dbfactory.h, which should simplify maintenance in the future.
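The `Database::reopen()` change above lets a caller skip work when nothing has changed. The sketch below uses a hypothetical `FakeDatabase` stand-in so that it is self-contained. With real Xapian, the same pattern would call `Xapian::Database::reopen()`, but the revision counting here is purely an assumption for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::Database: only reopen() matters here.
class FakeDatabase {
    unsigned open_revision = 1;  // revision this handle is currently reading
public:
    unsigned latest_revision = 1;  // hypothetical: bumped when a writer commits
    // true: the database may have been reopened; false: it definitely wasn't.
    bool reopen() {
        if (open_revision == latest_revision) return false;
        open_revision = latest_revision;
        return true;
    }
    unsigned revision() const { return open_revision; }
};

// Hypothetical cache of search results, rebuilt only when reopen() says so.
class CachedSearch {
    FakeDatabase& db;
    std::vector<std::string> results;
    int runs = 0;
    void run_query() {
        ++runs;
        results = {"result from revision " + std::to_string(db.revision())};
    }
public:
    explicit CachedSearch(FakeDatabase& d) : db(d) { run_query(); }
    const std::vector<std::string>& get() {
        // Without a return value, we'd have to rerun the query every time.
        if (db.reopen()) run_query();
        return results;
    }
    int query_runs() const { return runs; }
};
```

Since `true` only means "may have been reopened", rerunning the query on `true` is the conservative choice; `false` is the case that lets the cache be reused safely.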

Posted in xapian by Olly Betts on 2011-07-22 16:29
