Olly

« Low-Hanging Key Size Reduction Results | Home | Kilmister Track »

Gmane Size Analysis Update

I have rerun the analysis scripts on the converted database, and will summarise the changes.

The total size of the database has dropped from 346GB to 338GB, a saving of 8GB (about 2.3%). Here's the breakdown of the key size statistics:

(key sizes in bytes)

Table     Original range  Original mean  New range  New mean  Reduction in mean
record    2-5             4.76           2-4        3.94      0.82
termlist  2-5             4.76           2-4        3.94      0.82
spelling  3-65            10.29          3-65       10.29     0
position  3-70            10.35          3-69       9.53      0.82
postlist  1-193           17.23          1-191      15.38     1.85
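
As a quick sanity check on the arithmetic, here is a minimal Python sketch that recomputes the reduction column and the overall size change from the figures quoted above. The variable and function names are mine, chosen for illustration; this is not part of the analysis scripts.

```python
# Mean key sizes in bytes, copied from the table: (original, new).
key_size_means = {
    "record": (4.76, 3.94),
    "termlist": (4.76, 3.94),
    "spelling": (10.29, 10.29),
    "position": (10.35, 9.53),
    "postlist": (17.23, 15.38),
}


def mean_reduction(original, new):
    """Reduction in mean key size, rounded to the table's two decimal places."""
    return round(original - new, 2)


reductions = {
    table: mean_reduction(original, new)
    for table, (original, new) in key_size_means.items()
}

for table, saved in reductions.items():
    original = key_size_means[table][0]
    print(f"{table:9} {saved:5.2f} bytes ({saved / original:.1%} of the original mean)")

# Total database size in GB, before and after the conversion.
total_before_gb, total_after_gb = 346, 338
total_saved_gb = total_before_gb - total_after_gb
total_saved_fraction = total_saved_gb / total_before_gb
print(f"total     {total_saved_gb} GB ({total_saved_fraction:.1%})")
```

The per-table percentages are relative to the mean key size only, so they say nothing directly about how much each table shrank on disk.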

The graphs are all essentially the same shape as before.

Looking at the "space breakdown" tables, the per-item overhead now accounts for a larger share of the space in every case. That should come as little surprise, since the keys have shrunk while the overhead has not. So reducing the per-item overhead is more important than ever.

More unexpectedly, there are now more continuation items from splitting tags in all the affected tables. Presumably the explanation is this: splitting an entry means its key has to be repeated, so with a shorter key it is more often worth forcing a split to make use of unused space at the end of a block. This effect must outweigh the reduction in the number of entries which have to be split simply because of their size.
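
To make that presumed trade-off concrete, here is a toy Python model. It is emphatically not Xapian's actual splitting logic: the break-even rule, the function names and the overhead figure are all assumptions made up for illustration. It treats a split as worthwhile when the unused space at the end of a block exceeds the cost of the extra continuation item (repeated key plus per-item overhead), and counts how many leftover sizes qualify for two key sizes close to the old and new postlist means.

```python
def split_saves_space(free_bytes, key_size, per_item_overhead):
    """Toy rule: a forced split is a net win if the space it reclaims
    exceeds the cost of the continuation item it creates."""
    # The continuation item repeats the key and pays the overhead again.
    cost_of_split = key_size + per_item_overhead
    return free_bytes > cost_of_split


def count_worthwhile_splits(free_space_samples, key_size, per_item_overhead):
    """Count the leftover sizes for which the toy rule favours splitting."""
    return sum(
        split_saves_space(free, key_size, per_item_overhead)
        for free in free_space_samples
    )


# Hypothetical values, not measured from the database.
PER_ITEM_OVERHEAD = 7            # bytes; an invented figure
free_space_samples = range(64)   # every leftover size from 0 to 63 bytes

splits_with_old_key = count_worthwhile_splits(free_space_samples, 17, PER_ITEM_OVERHEAD)
splits_with_new_key = count_worthwhile_splits(free_space_samples, 15, PER_ITEM_OVERHEAD)

print(f"17-byte key: {splits_with_old_key} of 64 leftover sizes favour a split")
print(f"15-byte key: {splits_with_new_key} of 64 leftover sizes favour a split")
```

Under these assumptions the shorter key lowers the break-even point, so more leftover sizes favour a split. The model only covers that one side of the argument; it does not capture the opposing effect of fewer entries being too large to fit.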


Posted in xapian by Olly Betts on 2009-12-18 14:14 | Permalink

