Searching in a Django site — part 2: how

One of the things still on my wish list for this site was a proper search. In two articles I explain how I added it. The previous article described why I picked Djapian. This article focuses on some of the technical aspects of my setup.

Buildout

If you’ve read other articles of mine, you may know that I’m a big fan of buildout. So the first step in adding search functionality is updating the buildout configuration.

[buildout]
parts =
    xapian
    xapian-bindings
eggs =
    djapian

[versions]
djapian = 2.3.1
zc.recipe.cmmi = 1.3.2

[django]
recipe = djangorecipe
extra-paths =
    ${xapian:location}/lib/python

[xapian]
recipe = zc.recipe.cmmi
url = http://oligarchy.co.uk/xapian/1.0.22/xapian-core-1.0.22.tar.gz

[xapian-bindings]
recipe = zc.recipe.cmmi
url = http://oligarchy.co.uk/xapian/1.0.22/xapian-bindings-1.0.22.tar.gz
extra_options =
    PYTHON_LIB=${xapian:location}/lib/python
    XAPIAN_CONFIG=${xapian:location}/bin/xapian-config
    --with-python
    --with-php=no
    --with-ruby=no
    --with-java=no
    --with-csharp=no

Should you want to use Djapian yourself, note that version 2.3.1 is not compatible with the Xapian 1.2 series. Building the index works fine, but when you start searching you’ll get the following error:

'MSetItem' object has no attribute 'get_document'

According to issue 113, this will be fixed in Djapian version 2.4. For now I have stuck with Xapian 1.0.22.

Django

Using Djapian in your Django project is simple, so I won’t waste too many words on it:

INSTALLED_APPS = (
    'djapian',
)

DJAPIAN_DATABASE_PATH = 'path/to/djapian_spaces'

Don’t forget to run the syncdb management command afterwards.
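
A relative path in DJAPIAN_DATABASE_PATH is resolved against whatever directory the process starts in, which may differ between the web server and a scheduled job. One way around that is to anchor the path to the settings file. The following is only a sketch: the helper name djapian_path is made up for illustration and is not part of Djapian, and the directory name is simply the placeholder used above.

```python
import os


def djapian_path(settings_file, dirname='djapian_spaces'):
    """Return an absolute index directory located next to settings_file.

    Hypothetical helper for illustration; Djapian only needs the
    resulting string in DJAPIAN_DATABASE_PATH.
    """
    project_dir = os.path.dirname(os.path.abspath(settings_file))
    return os.path.join(project_dir, dirname)


# In settings.py this could be used as:
# DJAPIAN_DATABASE_PATH = djapian_path(__file__)
```

This way the web process and the indexing command end up pointing at the same directory, no matter where they are started from.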

Application

In your (weblog) application you’ll need to add an index.py file where you’ll define how your model(s) will be indexed. I just have one model, BlogEntry, which I want to index. So my index.py can be simple:

from djapian import space, Indexer
from blog.models import BlogEntry

class BlogEntryIndexer(Indexer):
    fields = [('title', 2), 'intro', 'body', 'caption']

    def trigger(self, obj):
        # Only index entries that have been published
        return obj.is_published

space.add_index(BlogEntry, BlogEntryIndexer, attach_as='indexer')

As you can see, the title gets a weight of 2, so a match in the title counts for a bit more than a match in the other fields. The trigger makes sure that only published articles are indexed.

To make sure the index is loaded when the application starts, a small addition to urls.py does the trick:

from djapian import load_indexes
load_indexes()

Then there is the actual search view. I’ll spare you the boring part and only show the code that performs the search action:

from djapian.resultset import xapian
from blog.models import BlogEntry

def search(request):
    # Get the search string as "query"
    resultset = BlogEntry.indexer.search(query).flags(
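        xapian.QueryParser.FLAG_BOOLEAN |   # AND, OR, NOT and brackets
        xapian.QueryParser.FLAG_PHRASE |    # "quoted phrases"
        xapian.QueryParser.FLAG_LOVEHATE |  # +required and -excluded terms
        xapian.QueryParser.FLAG_WILDCARD |  # trailing wildcard, e.g. djang*
        xapian.QueryParser.FLAG_PARTIAL     # treat the last word as possibly incomplete
    ).prefetch()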
    ...

(See the Xapian API documentation for more flags.)
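
The view above leaves out how the search string ends up in query. As a rough sketch, assuming the search form submits a GET parameter named q (that name, the helper name get_query and the length cap are all assumptions, not something Djapian prescribes), the string could be cleaned up like this. The function accepts any dict-like object, so it works with request.GET as well as with a plain dict:

```python
def get_query(params, name='q', max_length=200):
    """Return a cleaned-up search string from a dict-like object.

    params is expected to behave like request.GET; name and
    max_length are illustrative defaults.
    """
    raw = params.get(name, '')
    # Collapse runs of whitespace and trim both ends
    query = ' '.join(raw.split())
    # Cap the length so a huge query string never reaches the parser
    return query[:max_length]
```

In the view this would be called as query = get_query(request.GET). When the result is an empty string you will probably want to skip the search and just render the empty form.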

The index

There are only two things left. The first is creating the index for the first time:

$ bin/django index --rebuild

The other thing is keeping the index up to date. In my case I run the following command once every hour. This means that a newly published article may not show up in the search results for up to an hour. Given my publishing rate, I think that is quite acceptable. Anyway, the command:

$ bin/django index
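
An hourly run like this is usually scheduled with cron or a similar tool. If you would rather trigger it from a Python script, a minimal sketch could look as follows. The function names are made up for illustration; the bin/django path and the --rebuild option are the ones from the commands above.

```python
import subprocess


def index_command(rebuild=False, django_bin='bin/django'):
    """Build the argument list for the index management command."""
    cmd = [django_bin, 'index']
    if rebuild:
        # Full rebuild, as used when creating the index for the first time
        cmd.append('--rebuild')
    return cmd


def update_index(rebuild=False):
    """Run the command from the buildout directory; return its exit code."""
    return subprocess.call(index_command(rebuild))
```

Calling update_index() from the buildout directory does the same as the shell command above, and a nonzero return value tells the caller that the update failed.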

And that’s all there is to it. If you want more information, read the Djapian Tutorial. It covers most of the above in greater detail.