Andrew Kurinnyi's posterous (http://www.zen4ever.com)
Blog about Python, Django, or any other form of programming I happen to practice at the time

Django, Sphinx and search by distance
Thu, 10 Jun 2010 01:44:00 -0700
http://www.zen4ever.com/django-sphinx-and-search-by-distance

Some of the projects I’ve been working on recently required me to implement search by zip code. One approach would be to use GeoDjango or geopy. That solves the problem of looking up objects by distance, but most of the time you want a full-text search index on your models as well. So wouldn’t it be wonderful if our search engine supported queries by distance? It turns out that with Sphinx you can do precisely that.

Sphinx installation and integration with Django are described in depth in several existing tutorials.

The only thing I want to mention is that you can also build Sphinx with PostgreSQL support. To do so, install the PostgreSQL development package (postgresql-server-dev-8.3 on Debian) and configure Sphinx with the --with-pgsql flag.

Now, let’s say you have a site with jobs.

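from django.db import models
from djangosphinx.models import SphinxSearch
from tagging.fields import TagField


class Job(models.Model):
    title = models.CharField(max_length=255)
    description = models.TextField()
    tags = TagField()
    latitude = models.FloatField()
    longitude = models.FloatField()

    # Full-text index with per-field relevance weights
    search = SphinxSearch(
        index='jobs_job',
        weights={
            'title': 100,
            'description': 50,
            'tags': 70,
        },
    )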

Running ./manage.py generate_sphinx_config jobs gives you a basic index config. You can then tweak it by converting latitude and longitude to radians in the SQL query and declaring them as float attributes.

source jobs_job
{
    type                = pgsql
    sql_host            = localhost
    sql_user            = youruser
    sql_pass            = yourpassword
    sql_db              = sphinx_example
    sql_port            = 
    sql_query_pre       =
    sql_query_post      =
    sql_query           = \
        SELECT id, title, description, tags, radians(latitude) as latit, radians(longitude) as longit \
        FROM jobs_job
    sql_query_info      = SELECT * FROM jobs_job WHERE id = $id

    sql_attr_float      = latit
    sql_attr_float      = longit
}

index jobs_job
{
    source          = jobs_job
    path            = /var/data/jobs_job
    docinfo         = extern
    morphology      = none
    stopwords       =
    min_word_len    = 2
    charset_type    = utf-8
    min_prefix_len  = 0
    min_infix_len   = 0
}

Now you can use the geoanchor() method of the SphinxSearch object to search by distance. Just remember that Sphinx expects latitude and longitude in radians and returns distances in meters (1 mile is exactly 1609.344 meters). For example, to search for all jobs within a 10-mile radius of a given geographic point:

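import math

from django.core.management import setup_environ
import settings

# Configure Django before importing any models
setup_environ(settings)

from jobs.models import Job


def to_radians(latit, longit):
    """Convert a (latitude, longitude) pair from degrees to radians."""
    return (math.radians(latit), math.radians(longit))

# Sphinx expects the anchor point in radians
latit, longit = to_radians(34.095259, -118.347997)

# Sphinx reports @geodist in meters; 1 mile = 1609.344 meters
mi = 1609.344

job_list = (
    Job.search.geoanchor('latit', 'longit', latit, longit)
              .filter(**{'@geodist__lt': 10 * mi})
              .order_by('-@geodist')
)

print([x.sphinx.values() for x in job_list])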
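The query above leaves the distance math to Sphinx. To sanity-check results outside Sphinx, here is a minimal plain-Python sketch of the same filter: a great-circle (haversine) distance between points in radians, a strict 10-mile cutoff, and a farthest-first sort mirroring order_by('-@geodist'). The 6,371 km Earth radius is an assumption (a common mean value); Sphinx's own formula and radius may differ slightly, so treat these distances as approximate. The job tuples are hypothetical sample data.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters
MILE_M = 1609.344           # meters in one international mile


def geodist(lat1, lon1, lat2, lon2):
    """Haversine distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def jobs_within(jobs, lat_deg, lon_deg, miles):
    """Return (title, meters) pairs closer than `miles`, farthest first."""
    anchor_lat, anchor_lon = math.radians(lat_deg), math.radians(lon_deg)
    hits = []
    for title, lat, lon in jobs:
        d = geodist(anchor_lat, anchor_lon,
                    math.radians(lat), math.radians(lon))
        if d < miles * MILE_M:  # same strict cutoff as '@geodist__lt'
            hits.append((title, d))
    # Mirrors order_by('-@geodist'): largest distance first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data: (title, latitude, longitude) in degrees
jobs = [
    ("Hollywood", 34.0983, -118.3267),
    ("Downtown LA", 34.0522, -118.2437),
    ("San Diego", 32.7157, -117.1611),
]
for title, meters in jobs_within(jobs, 34.095259, -118.347997, 10):
    print("%s: %.1f miles" % (title, meters / MILE_M))
```

Swapping reverse=True for reverse=False gives nearest-first ordering, the equivalent of order_by('@geodist').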

The geoanchor() call needs the names of the Sphinx attributes that hold latitude and longitude; in the index config above they are ‘latit’ and ‘longit’. Sphinx exposes distance through the “magic” @geodist attribute. Because @geodist__lt is not a valid Python identifier, it can’t be passed to filter() as an ordinary keyword argument, which is why it goes through **kwargs dictionary unpacking instead.
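Here is a small stand-alone sketch of why the dictionary unpacking is needed. fake_filter is a hypothetical stand-in that simply returns its keyword arguments; it is not djangosphinx's implementation.

```python
def fake_filter(**kwargs):
    """Hypothetical stand-in for a queryset's filter(): returns its kwargs."""
    return kwargs


# Spelling the lookup as a normal keyword argument does not even parse,
# because '@geodist__lt' is not a valid Python identifier.
try:
    compile("fake_filter(@geodist__lt=16093.44)", "<example>", "eval")
except SyntaxError:
    print("SyntaxError: '@geodist__lt' cannot be written as a keyword argument")

# Unpacking a dict passes the key through unchanged.
lookups = fake_filter(**{'@geodist__lt': 10 * 1609.344})
print(lookups)
```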

You can find the full code of the Django project on GitHub:

http://github.com/zen4ever/djangosphinx_example

UPDATE: Renamed the attributes so they don’t collide with database keywords. Thanks to @imns81.

Posted by Andrew Kurinnyi (zen4ever)

Solving "n+1 query problem" in Django using custom template tags
Sun, 30 May 2010 04:44:00 -0700
http://www.zen4ever.com/solving-n1-query-problem-in-django-using-cust

Here is a neat trick I saw in Pinax, in the bookmarks app.

Related votes for bookmarks are retrieved in a single query inside a custom template tag and then grouped into a dictionary keyed by bookmark id. Inside the template’s for loop, a “get_item_from_dict” tag then retrieves the votes for each bookmark so they can be displayed.
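To make the “n+1” part concrete, here is a self-contained sketch that counts queries against a tiny in-memory stand-in for the database. FakeDB, its methods, and the sample rows are all hypothetical. The per-bookmark loop issues one query per bookmark (the “n”, on top of the one query that fetched the bookmarks), while the batched version issues a single query and then does cheap dictionary lookups.

```python
from collections import defaultdict


class FakeDB(object):
    """Hypothetical in-memory stand-in for a database; counts queries."""

    def __init__(self, votes):
        self.votes = votes  # list of (bookmark_id, vote) rows
        self.queries = 0

    def votes_for(self, bookmark_id):
        # One query per call: votes for a single bookmark
        self.queries += 1
        return [vote for b, vote in self.votes if b == bookmark_id]

    def votes_for_many(self, bookmark_ids):
        # One query per call: votes for many bookmarks (like id__in=...)
        self.queries += 1
        wanted = set(bookmark_ids)
        return [(b, vote) for b, vote in self.votes if b in wanted]


def votes_naive(db, bookmark_ids):
    # Hits the database once per bookmark
    return dict((b, db.votes_for(b)) for b in bookmark_ids)


def votes_batched(db, bookmark_ids):
    # Single query, then group rows into a dict keyed by bookmark id
    grouped = defaultdict(list)
    for b, vote in db.votes_for_many(bookmark_ids):
        grouped[b].append(vote)
    # Bookmarks with no votes map to an empty list
    return dict((b, grouped[b]) for b in bookmark_ids)


rows = [(1, "+1"), (2, "-1"), (1, "+1"), (3, "+1")]
ids = [1, 2, 3]
naive_db, batched_db = FakeDB(rows), FakeDB(rows)
same = votes_naive(naive_db, ids) == votes_batched(batched_db, ids)
print("same result: %s" % same)
print("naive: %d queries, batched: %d queries"
      % (naive_db.queries, batched_db.queries))
```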

Generally speaking, if you need to select related items and can’t use “select_related”, you should seriously consider using django-batch-select. It handles both ForeignKey and ManyToManyField relations, and allows you to filter related models by their fields as well.

But in some cases a custom template tag feels more natural. In my case I also wanted to use the “autopaginate” tag, and since pagination happens in the template, I wasn’t sure how django-batch-select would play with it. An additional bonus is that there is no need to override the manager on your model.

The only problem is that you need to write a custom template tag, which means subclassing Node and parsing the tag arguments yourself. Luckily, we can use the awesome django-templatetag-sugar by Alex Gaynor, which makes writing custom Django tags easier than ever.

Assume we have the following models:

from django.db import models


class Deal(models.Model):
    title = models.CharField(max_length=55)


class Coupon(models.Model):
    deal = models.ForeignKey(Deal)
    unique_id = models.CharField(max_length=55)

Our deals_tags.py (in the deals app’s templatetags package) would look something like the following:

from itertools import groupby

from django import template
from deals.models import Coupon

from templatetag_sugar.register import tag
from templatetag_sugar.parser import Name, Variable, Constant

register = template.Library()


@tag(register, [Variable(), Constant("as"), Name()])
def related_coupons(context, qs, asvar):
    # One query fetches the coupons for every deal in qs
    ids = [deal.id for deal in qs]
    # groupby() only merges *consecutive* items with the same key,
    # so the coupons must be ordered by deal before grouping
    coupons = Coupon.objects.filter(deal__in=ids).order_by('deal__id')

    # groupby yields lazy sub-iterators, so each group is turned into
    # a list before moving on; the result maps deal id -> coupon list
    context[asvar] = dict(
        (deal_id, list(group))
        for deal_id, group in groupby(coupons, lambda c: c.deal_id)
    )
    return ''


@tag(register, [Variable(), Variable(), Constant("as"), Name()])
def get_from_dict(context, dict_var, key_var, asvar):
    # Look up one deal's coupons; None if the deal has none
    context[asvar] = dict_var.get(key_var, None)
    return ''
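itertools.groupby only merges consecutive items that share a key, so the coupons need to arrive ordered by deal for the tag to build complete groups. This plain-Python sketch, using hypothetical (deal_id, unique_id) tuples in place of Coupon objects, shows what happens with unsorted input and a collections.defaultdict alternative that does not depend on ordering.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical coupon rows as (deal_id, unique_id), deliberately unsorted
coupons = [(1, "A"), (2, "B"), (1, "C")]


def group_with_groupby(rows):
    # dict() keeps only the last run for each key, so split runs lose data
    return dict((k, list(g)) for k, g in groupby(rows, lambda row: row[0]))


def group_with_defaultdict(rows):
    # Order-independent: each row is appended to its deal's list
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return dict(grouped)


print(group_with_groupby(coupons))          # deal 1 loses coupon "A"
print(group_with_groupby(sorted(coupons)))  # complete once sorted
print(group_with_defaultdict(coupons))      # complete without sorting
```

Inside the tag, the defaultdict approach would make the result independent of the queryset’s ordering; either way the coupons are still fetched in a single query.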

Now you can use the related coupons in your template:

{% load deals_tags %}

{% related_coupons deals as coupons_dict %}

{% for deal in deals %}
    {{ deal.title }}
    {% get_from_dict coupons_dict deal.id as coupons %}
    {% for coupon in coupons %}
        {{ coupon.unique_id }}
    {% endfor %}
{% endfor %}

Converting Word files to plain text with IronPython
Sat, 29 May 2010 04:02:00 -0700
http://www.zen4ever.com/converting-word-files-to-plain-text-with-iron

It’s been a while since I needed to do any Word scripting, because I work mostly with Linux and Mac.

My first attempt was to use pywin32 and COM.

The problem was that the documents I was converting were in Russian, so I wanted the output to be in UTF-8.

While Word’s SaveAs method has an Encoding parameter, it wasn’t clear to me how to specify it from pywin32.

So my next attempt was to use IronPython, since it has a native .NET interface to Office. The biggest advantage of this approach was that you can call dir() on any object or method in the IronPython shell.

After some googling on encodings and IronPython Word scripting, here is the script I came up with:

import sys
import clr
from System.IO import DirectoryInfo, Path

clr.AddReference("Microsoft.Office.Interop.Word")
import Microsoft.Office.Interop.Word as Word

def convert_files(doc_path):
    # Start one Word instance and reuse it for every document
    wa = Word.ApplicationClass()
    try:
        for file_info in DirectoryInfo(doc_path).GetFiles("*.doc"):
            doc_to_text(wa, doc_path, file_info.Name)
    finally:
        # Quit Word so no hidden WINWORD.EXE process is left running
        wa.Quit()

def doc_to_text(wa, folder, filename):
    # Unlike filename.split('.'), this copes with names containing extra dots
    base_name = Path.GetFileNameWithoutExtension(filename)
    document = wa.Documents.Open(Path.Combine(folder, filename),
                                 ReadOnly=True)
    try:
        # Save as plain text; encoding 65001 is UTF-8
        document.SaveAs(Path.Combine(folder, base_name + '.txt'),
                        FileFormat=Word.WdSaveFormat.wdFormatDOSText,
                        Encoding=65001)
    finally:
        document.Close()

if __name__ == "__main__":
    if len(sys.argv) == 2:
        convert_files(sys.argv[1])
    else:
        print("Requires folder name as an argument")

According to MSDN, encoding 65001 (msoEncodingUTF8) corresponds to UTF-8.

Update: Tim Golden (tjguk) pointed out that it is very easy to do with pywin32 as well:

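import io
import win32com.client

word = win32com.client.gencache.EnsureDispatch("Word.Application")
# EnsureDispatch generates the type library, making Word's constants available
const = win32com.client.constants
doc = word.Documents.Open(r"c:\temp\temp.doc")
doc.SaveAs(r"c:\temp\temp.txt",
           FileFormat=const.wdFormatDOSText,
           Encoding=65001)  # UTF-8
doc.Close()
word.Quit()

# Decode the output as UTF-8; "utf-8-sig" also drops a leading BOM if present
print(io.open(r"c:\temp\temp.txt", encoding="utf-8-sig").read())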

Drupal or Django? A guide for Decision Makers.
Sun, 15 Nov 2009 14:33:43 -0800
http://www.zen4ever.com/drupal-or-django-a-guide-for-decision-makers
Link: http://birdhouse.org/blog/2009/11/11/drupal-or-django/
An interesting article about the pros and cons of both systems. The author favors Django for his projects.
