Olly's Bloghttp://survex.com/~olly/blog/index.atomOlly Bettshttp://survex.com/~olly/blog/index.atomolly@survex.comCopyright (C) 2009-2013 Olly Betts
PyBlosxom http://pyblosxom.sourceforge.net/ 1.4.3 01/10/2008
2013-10-18T05:01:18ZINT(PI)http://survex.com/~olly/blog/2013/10/18/int-pi2013-10-18T05:01:18Z2013-10-18T05:01:18Z
<div class="document" id="int-pi">
<p>Slide 5 from my nostalgia-fest <a class="reference external" href="http://survex.com/~olly/talks/small-programs/">"The Art of Writing Small Programs"</a>
from just under two years ago:</p>
<img alt="The Art of Writing Small Programs - Slide 5" src="http://survex.com/~olly/blog/img/small-programs-slide5.png" style="width: 566px; height: 391px;" />
<p><a class="reference external" href="http://xkcd.com/1275/">XKCD 1275</a> from last week:</p>
<img alt="xkcd 1275 - INT(PI)" src="http://survex.com/~olly/blog/img/xkcd_int_pi.png" style="width: 469px; height: 75px;" />
<p>(Of course that should be <cite>INT SQR EXP PI/INT PI * PI * R ** INT PI</cite>).</p>
</div>
Debian GSoC Applications for 2013http://survex.com/~olly/blog/2013/05/06/debian-gsoc-applications-20132013-05-06T01:36:43Z2013-05-06T01:36:43Z
<div class="document" id="debian-gsoc-applications-for-2013">
<p>I've produced a graph of the 61 student applications which Debian
received for <a class="reference external" href="http://code.google.com/soc">GSoC</a> this year:</p>
<img alt="Graph of student applications to Debian in GSoC 2013" src="http://survex.com/~olly/blog/img/debian-gsoc-2013.png" style="width: 409px; height: 207px;" />
<p>Ana <a class="reference external" href="http://ekaia.org/blog/2012/04/27/debian-in-the-google-summer-of-code-2012/">blogged a similar graph last year</a>
if you want to compare. It looks like the total is down a little (though
I'm not sure if the figure of 81 from the text, or ~68 read from the graph
is correct for last year) - this is likely at least partly due to the number of
proposals each student can send having been reduced from 20 last year to 5
this year, which should have reduced the number of low quality proposals.
The timeline this year is later, which may have also had an effect.</p>
<p>If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this <a class="reference external" href="http://survex.com/~olly/gsoc-proposal-graph-2013.ods">OpenDocument spreadsheet</a> and follow the
instructions inside.</p>
</div>
Empty Strings in C++http://survex.com/~olly/blog/2013/01/06/empty-strings-in-c++2013-01-05T12:26:51Z2013-01-05T12:26:51Z
<div class="document" id="empty-strings-in-c">
<p>Gaurav (one of Xapian's GSoC students this year) queried my advice that it's
better not to explicitly initialise a std::string in C++ like this:</p>
<pre class="literal-block">
std::string s = "";
</pre>
<p>but instead to write:</p>
<pre class="literal-block">
std::string s;
</pre>
<p>It could be argued that the former makes it clearer that the string is
initialised (since C++ does inherit C's behaviour of not implicitly
initialising variables of fundamental types, such as int and double). But
objects aren't left uninitialised - the default constructor gets called
(or if there isn't one, the code won't compile).</p>
<p>The downside is that you get quite a lot more code from the first version
than the second. Perhaps compilers will grow smarter and in future both
the above will compile to the same code, but that's not true today.</p>
<p>Here's a simple bit of test code:</p>
<pre class="literal-block">
#include <string>
std::string f() {
#ifdef EXPLICIT_INIT
std::string s = "";
#else
std::string s;
#endif
return s;
}
</pre>
<p>And I'll compile it with GCC (g++ (Debian 4.7.2-4) 4.7.2) on x86-64 to produce
assembler output:</p>
<pre class="literal-block">
$ g++ -O2 -S s.cc -o s1.s
$ g++ -DEXPLICIT_INIT -O2 -S s.cc -o s2.s
$ diff s1.s s2.s
1a2,4
> .section .rodata.str1.1,"aMS",@progbits,1
> .LC0:
> .string ""
9,10c12,25
< movq %rdi, %rax
< movq $_ZNSs4_Rep20_S_empty_rep_storageE+24, (%rdi)
---
> pushq %rbx
> .cfi_def_cfa_offset 16
> .cfi_offset 3, -16
> movl $.LC0, %esi
> movq %rdi, %rbx
> subq $16, %rsp
> .cfi_def_cfa_offset 32
> leaq 15(%rsp), %rdx
> call _ZNSsC1EPKcRKSaIcE
> addq $16, %rsp
> .cfi_def_cfa_offset 16
> movq %rbx, %rax
> popq %rbx
> .cfi_def_cfa_offset 8
</pre>
<p>(As an aside, I wouldn't recommend spending a lot of time digging into the
assembler your compiler produces, but if there's more than one equivalent way
to write code for a common case (as here) and you want to know which is most
efficient, it can be informative to see what code is produced for each).</p>
<p>You don't really need to know x86-64 assembler to grasp what's happening
above, which is lucky, since I don't really know x86-64 assembler. In the
first hunk, we get an empty literal string being added to the rodata (read-only
data) section; and in the second, instead of two instructions which copy a
standard empty string representation, we get much more elaborate code, including
a function call to _ZNSsC1EPKcRKSaIcE - the C++ mangled name for the
std::string from const char * constructor:</p>
<pre class="literal-block">
$ c++filt _ZNSsC1EPKcRKSaIcE
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
</pre>
<p>I also tested the example above with clang (Debian clang version 3.0-6
(tags/RELEASE_30/final) (based on LLVM 3.0)) and the resultant code is
fairly similar for each case.</p>
<p>The overhead for doing this once isn't going to matter, but if it happens
every time you declare a std::string variable, the cumulative effects may
be measurable. And as well as taking longer to execute, the larger code will
cause greater CPU cache pressure.</p>
</div>
Message of the Dayhttp://survex.com/~olly/blog/2012/10/26/motd2012-10-25T19:04:53Z2012-10-25T19:04:53Z
<div class="document" id="message-of-the-day">
<p>I was staying in room <a class="reference external" href="http://en.wikipedia.org/wiki/2112_(album)">2112</a> in
the hotel for the GSoC mentor summit at the weekend, which wetted my appetite
for some <a class="reference external" href="http://en.wikipedia.org/wiki/Rush_(band)">Rush</a>, so I stuck a CD in
my laptop today and got this dialog:</p>
<img alt="Presto Rush Presto Rush Presto Rush" src="http://survex.com/~olly/blog/img/presto-rush.png" style="width: 648px; height: 287px;" />
<p>Looks like there's already <a class="reference external" href="http://bugs.debian.org/631760">a report (#631760)</a> for this issue.</p>
</div>
Xapian GSoC 2012 Projectshttp://survex.com/~olly/blog/2012/04/26/xapian-gsoc-2012-projects2012-04-26T07:06:51Z2012-04-26T07:06:51Z
<div class="document" id="xapian-gsoc-2012-projects">
<p>At the end of the <a class="reference external" href="http://survex.com/~olly/blog/xapian/xapian-gsoc-applications-for-2012.html">previous episode</a>,
you may remember our gallant heroes had a pile of 30 proposals to review.
We soon spotted one more to mark as invalid (just a paste with our ideas list
plus a some biographical details), and another got withdrawn by the student
without explanation (but was low quality anyway), so that left us with 28.</p>
<p>We had six volunteers for mentoring, and in the initial allocation we received
five student slots from Google, but we asked nicely if we could have an extra
one, and were lucky enough to get it. Last year we had four students, so that's
a 50% increase.</p>
<p>Here's those 28, broken down by the project idea:</p>
<blockquote>
<ul class="simple">
<li>8 - Weighting Schemes</li>
<li>6 - Learning to Rank</li>
<li>3 - Dynamic Snippets</li>
<li>2 - Lucene Backend</li>
<li>2 - QueryParser improvements</li>
<li>1 - Erlang Bindings</li>
<li>1 - Improve C# and Java bindings</li>
<li>1 - Improve PHP Bindings</li>
<li>1 - Improve Python Bindings</li>
<li>1 - Improving Japanese Support</li>
<li>1 - Node.js Bindings</li>
<li>1 - Postlist encodings</li>
</ul>
</blockquote>
<p>I find it interesting that the most popular three ideas have closer connections
to Information Retrieval theory than most - probably these appeal to
students who have taken IR courses and already have an interest and some
knowledge of the project area. I think we should aim to get more ideas like
these on the list in future years.</p>
<p>It's worth noting that in several cases students had taken an idea in
sufficiently different directions that there wasn't much overlap, so we didn't
just pick the best proposal for each project idea to narrow things down. Also,
the proposal isn't the only factor - we like to see applicants work on patch,
and to interact with us on IRC and/or email. But in the end it happens we
ended up with proposals which were all from different ideas - here are those we
selected:</p>
<blockquote>
<ul class="simple">
<li>Gaurav Arora: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/samuelharden/21001">Bi-gram Language Modeling</a></li>
<li>Marius Tibeica: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/mtibeica/27001">Node.js Bindings</a></li>
<li>Mihai Bivol: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/mbivol/19003">Omega: Dynamic Snippets</a></li>
<li>Rishabh Mehrotra: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/rishabhmehrotra/29001">Ranking :: Learning to Rank</a></li>
<li>Sehaj Singh Kalra: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/sehaj_singh_kalra/22001">QueryParser Reimplementation</a></li>
<li>Uvarov Michael: <a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/freeakk/33001">Erlang Bindings</a></li>
</ul>
</blockquote>
<p>My congratulations to the lucky six, and my commiserations to those we weren't
able to select. It wasn't an easy selection to make, and we truly appreciate
the time you spent writing your proposal, working on patches, and on the rest
of the application process. We'd encourage you to remain involved with Xapian,
and to apply to us again next year if you're still eligible for GSoC.</p>
</div>
Xapian GSoC Applications for 2012http://survex.com/~olly/blog/2012/04/09/xapian-gsoc-applications-for-20122012-04-08T13:04:33Z2012-04-08T13:04:33Z
<div class="document" id="xapian-gsoc-applications-for-2012">
<p>Student applications for <a class="reference external" href="http://code.google.com/soc">GSoC</a> closed a day
or so ago, and we've done an initial pass through <a class="reference external" href="http://xapian.org/">Xapian's</a> applications, so I thought I should post another
overview, similar to <a class="reference external" href="http://survex.com/~olly/blog/xapian/xapian-gsoc-applications-for-2011.html">last year's</a>.</p>
<p>We received a total of 41 applications this year (very close to last year's
total of 42). Here's a graph of applications against time:</p>
<img alt="Graph of student applications to Xapian in GSoC 2012" src="http://survex.com/~olly/blog/img/xapian-gsoc2012-applications.png" style="width: 466px; height: 248px;" />
<p>If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this <a class="reference external" href="http://survex.com/~olly/gsoc-proposal-graph-2012.ods">OpenDocument spreadsheet</a> and follow the
instructions inside.</p>
<p>That total of 41 includes one duplicate and one application withdrawn by the
student (we had one of each last year too). I've also gone through and marked
nine spam proposals as invalid (similar to the seven we had last year). Spam
proposals are things like proposals with no connection at all to Xapian, and
proposals which are just a title and/or paste from our ideas list with a
generic biography.</p>
<p>So that leaves us with 30 proposals (compared to 33 last year). It's hard
to really measure, but my feeling is that the average quality is higher
than last year (and it was already pretty impressive last year).</p>
</div>