Olly


Gmane Size Analysis

The first step in reducing the database size is to look at what currently uses the space in a database.

I maintain the Xapian-based search for the Gmane mailing list archive, so I've used that as my initial case study. It's a fairly large database (about 71 million documents taking up 346GB).
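
For a rough first look at where the space goes, before any deeper analysis, you could simply total the on-disk size of each file in the database directory. The sketch below is only an illustration and is not one of the scripts in the tarball. It assumes the database is a directory of table files, as Xapian's disk-based backends use, and the function names `file_sizes` and `format_report` are made up for this example.

```python
import os


def file_sizes(db_dir):
    """Return {filename: size_in_bytes} for regular files directly in db_dir.

    Assumption: the database is a flat directory of table files.
    Subdirectories are ignored.
    """
    sizes = {}
    for entry in os.scandir(db_dir):
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes


def format_report(sizes):
    """Return report lines, largest file first, with each file's share of the total."""
    total = sum(sizes.values())
    lines = []
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        # Guard against an empty directory (total of zero).
        share = 100.0 * size / total if total else 0.0
        lines.append(f"{name:<20} {size:>15} bytes {share:6.2f}%")
    return lines
```

Calling `print("\n".join(format_report(file_sizes(path))))`, where `path` is your own database directory, prints one line per file. This only tells you which table is largest, not what inside the table is using the space, which is the question the analysis in this post is after.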

I've bundled up the scripts and code I used in case you want to analyse your own database. If you do, I'd love to see the results, especially if you have a large database and they differ from mine. There's a README file in the tarball which explains how to use them.

Note that the analysis will take several hours to process a large table. I did bear efficiency in mind while writing the code, but only up to a point.


Posted in xapian by Olly Betts on 2009-12-14 13:22 | Permalink

