The first step in reducing the database size is to look at what currently uses the space in databases.
I maintain the Xapian-based search for the gmane mailing list archive, so I've used that as my initial case study. It's a fairly large database (about 71 million documents, taking up 346GB).
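For a rough first look before any deeper analysis, the on-disk size of each table can be totalled straight from the file sizes in the database directory. The sketch below is not one of the scripts in the tarball; it assumes each table is stored as one or more files sharing a common name before the first dot (for example `postlist.DB` alongside `postlist.baseA`), and the function names `table_sizes` and `report` are made up for illustration.

```python
import os
from collections import defaultdict


def table_sizes(db_dir):
    """Return total bytes per table for the files directly inside db_dir.

    Assumption: files belonging to one table share the part of the
    filename before the first dot. Files with no dot are keyed by
    their full name.
    """
    totals = defaultdict(int)
    for entry in os.scandir(db_dir):
        if entry.is_file():
            table = entry.name.split(".", 1)[0]
            totals[table] += entry.stat().st_size
    return dict(totals)


def report(db_dir):
    """Print each table's size and share of the total, largest first."""
    sizes = table_sizes(db_dir)
    grand_total = sum(sizes.values())
    for table, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * size / grand_total if grand_total else 0.0
        print(f"{table:12} {size:>15,d} bytes {share:5.1f}%")
```

Calling `report()` with the path to a database directory prints one line per table. This only shows which table is largest, not what inside that table takes up the space.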
I've bundled up the scripts and code I used in case you want to analyse your own database. If you do, I'd love to see the results, especially if you have a large database and they differ from mine. There's a README file in the tarball which explains how to use them.
Note that this script will take several hours to process a large table. I did bear efficiency in mind while writing it, but only to a certain extent.
Posted in xapian by Olly Betts on 2009-12-14 13:22