Django, Sphinx and search by distance

Several projects I’ve worked on recently required search by zip code. One approach is to use GeoDjango or geopy. That solves the problem of looking up objects by distance, but most of the time you want a full-text search index on your models as well. So wouldn’t it be wonderful if the search engine itself supported queries by distance? It turns out that with Sphinx you can do precisely that.

Installing Sphinx and integrating it with Django are described in depth in the following tutorials:

One thing worth mentioning is that you can also build Sphinx with PostgreSQL support. To do so, install the PostgreSQL development package (postgresql-server-dev-8.3 on Debian) and configure Sphinx with the --with-pgsql flag.

Now, let’s say you have a site with jobs.

class Job(models.Model):
    title = models.CharField(max_length=255)
    description = models.TextField()
    tags = TagField()
    latitude = models.FloatField()
    longitude = models.FloatField()

    search = SphinxSearch(
        index='jobs_job',
        weights={
            'title': 100,
            'description': 50,
            'tags': 70,
        })

If you run ./manage.py generate_sphinx_config jobs, it will give you a basic index config. You can tweak it by converting latitude and longitude to radians in the SQL query and declaring them as float attributes.

source jobs_job
{
    type                = pgsql
    sql_host            = localhost
    sql_user            = youruser
    sql_pass            = yourpassword
    sql_db              = sphinx_example
    sql_port            = 
    sql_query_pre       =
    sql_query_post      =
    sql_query           = \
        SELECT id, title, description, tags, radians(latitude) AS latit, radians(longitude) AS longit \
        FROM jobs_job
    sql_query_info      = SELECT * FROM jobs_job WHERE id = $id

    sql_attr_float      = latit
    sql_attr_float      = longit
}

index jobs_job
{
    source          = jobs_job
    path            = /var/data/jobs_job
    docinfo         = extern
    morphology      = none
    stopwords       =
    min_word_len    = 2
    charset_type    = utf-8
    min_prefix_len  = 0
    min_infix_len   = 0
}

Now you can use the “geoanchor” method of the SphinxSearch object to search by distance. The one thing to remember is that Sphinx expects latitude and longitude in radians and returns distance in meters. One mile is exactly 1609.344 meters. For example, to find all jobs within a 10-mile radius of a given geographic point:

from django.core.management import setup_environ
import settings

setup_environ(settings)

from jobs.models import Job

import math

def to_radians(latit, longit):
    return (math.radians(latit),math.radians(longit))

latit, longit = to_radians(34.095259, -118.347997)

mi = 1609.344

job_list = (
     Job.search.geoanchor('latit', 'longit', latit, longit)
               .filter(**{'@geodist__lt':10*mi})
               .order_by('-@geodist')
)

print([x.sphinx.values() for x in job_list])
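
To sanity-check the distances Sphinx reports, you can recompute them in plain Python. The following is a minimal sketch using the haversine formula; the Earth radius constant and the second coordinate pair are assumptions made for illustration, and Sphinx may use a different radius or formula internally, so expect small differences rather than exact agreement.

```python
import math

EARTH_RADIUS_M = 6371008.8   # assumed mean Earth radius; Sphinx may use a different value
METERS_PER_MILE = 1609.344


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all four arguments are in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Anchor point from the example above, plus an illustrative second point
anchor = (math.radians(34.095259), math.radians(-118.347997))
other = (math.radians(34.0522), math.radians(-118.2437))

distance_m = haversine_m(anchor[0], anchor[1], other[0], other[1])
print("%.1f miles" % (distance_m / METERS_PER_MILE))
print("within 10 miles:", distance_m < 10 * METERS_PER_MILE)
```

Like the Sphinx attributes, the helper takes radians and returns meters, so its result can be compared directly with the @geodist value of a matching job.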

The geoanchor syntax requires the names of the Sphinx attributes that hold latitude and longitude; in the index config above they are ‘latit’ and ‘longit’. Sphinx exposes distance through the “magic” @geodist attribute. Because “@” is not allowed in a Python identifier, @geodist__lt cannot be written as an ordinary keyword argument, which is why the filter call passes it with the **kwargs syntax.
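
The keyword-argument restriction is easy to see without Sphinx at all. In this small sketch, collect_filters is a hypothetical stand-in for a filter() method, not part of django-sphinx; it simply returns whatever keyword arguments it receives.

```python
def collect_filters(**kwargs):
    """Hypothetical stand-in for filter(): returns the keyword arguments it got."""
    return kwargs


# Writing the keyword directly is rejected by the parser before anything runs
try:
    compile("collect_filters(@geodist__lt=1)", "<example>", "eval")
    direct_form_is_valid = True
except SyntaxError:
    direct_form_is_valid = False

# Unpacking a dict works, because the key only has to be a string
filters = collect_filters(**{'@geodist__lt': 10 * 1609.344})

print(direct_form_is_valid)
print(filters)
```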

You can find the full code of the Django project on GitHub:

http://github.com/zen4ever/djangosphinx_example

UPDATE: Renamed the attributes so they no longer resemble database keywords. Thanks to @imns81.


Solving "n+1 query problem" in Django using custom template tags

Here is a neat trick I saw in Pinax, in the bookmarks app.

The votes related to the bookmarks are retrieved in a single query inside a custom template tag, and then grouped into a dictionary keyed by bookmark id. Inside the “for” loop, “get_item_from_dict” then looks up the votes for a particular bookmark so they can be displayed in the template.
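
The grouping step can be sketched in plain Python, with no Django involved. The Vote tuple and the group_by_key helper below are hypothetical names used only for illustration; in the real tag the rows would come from a single queryset.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a vote row; only object_id matters for grouping
Vote = namedtuple('Vote', ['object_id', 'value'])


def group_by_key(items, key):
    """Group items into a dict of lists; input order does not matter."""
    grouped = defaultdict(list)
    for item in items:
        grouped[key(item)].append(item)
    return dict(grouped)


# Pretend these rows came back from one query for all bookmarks on the page
votes = [Vote(1, +1), Vote(2, -1), Vote(1, +1)]
votes_by_bookmark = group_by_key(votes, lambda v: v.object_id)

# The per-bookmark lookup is now a dictionary access, not a query
for bookmark_id in (1, 2, 3):
    print(bookmark_id, votes_by_bookmark.get(bookmark_id, []))
```

However many bookmarks are on the page, the database is hit once; bookmarks without votes simply have no key in the dictionary.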

Generally speaking, if you need to select related items and can’t use “select_related”, you should seriously consider using django-batch-select. It handles both ForeignKey and ManyToManyField relations, and allows you to filter related models by their fields as well.

But in some cases custom template tags feel like the more natural thing to do. In my case I also wanted to use the “autopaginate” tag, and I wasn’t sure how django-batch-select would play with it, since pagination happens in the template. An additional bonus is that there is no need to override the manager on your model.

The only problem is that you need to write a custom template tag, which means subclassing Node and parsing your tag arguments. Luckily, we can use the awesome django-templatetag-sugar written by Alex Gaynor. Writing custom Django tags has never been easier.

Assume we have the following models:

class Deal(models.Model):
    title = models.CharField(max_length=55)

class Coupon(models.Model):
    deal = models.ForeignKey(Deal)
    unique_id = models.CharField(max_length=55)

Our deals_tags.py would look something like the following.

from django import template
from deals.models import Deal, Coupon

from templatetag_sugar.register import tag
from templatetag_sugar.parser import Name, Variable, Constant
from itertools import groupby

register = template.Library()

@tag(register, [Variable(), Constant("as"), Name()])
def related_coupons(context, qs, asvar):
    ids = [deal.id for deal in qs]
    # groupby only merges adjacent items, so the coupons must be ordered
    # by the same key we group on
    coupons = Coupon.objects.filter(deal__in=ids).order_by('deal')

    # groupby yields (key, iterator) pairs, so each group is turned
    # into a list before it is stored in the dictionary
    context[asvar] = dict(
        (deal_id, list(group))
        for deal_id, group in groupby(coupons, lambda c: c.deal_id)
    )
    return ''

@tag(register, [Variable(), Variable(), Constant("as"), Name()])
def get_from_dict(context, dict_var, key_var, asvar):
    context[asvar] = dict_var.get(key_var, None)
    return ''

Now you can use related coupons in your template:

{% related_coupons deals as coupons_dict %}

{% for deal in deals %}
    {{ deal.title }}
    {% get_from_dict coupons_dict deal.id as coupons %}
    {# access your coupons here #}
{% endfor %}
