Improve Django ORM performance on Foreign Keys

If you use foreign keys on a model in Django, you might not be aware of the performance issues until they hit you. Navigating through ForeignKey relationships in your code or templates is very easy, but each access triggers a separate database query.
Let's look at this problem with a simple model:

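from django.contrib.auth.models import User
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=255)


class Article(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    # on_delete is mandatory since Django 2.0; CASCADE matches the old default.
    category = models.ForeignKey(Category, on_delete=models.CASCADE)

    # related_name='+' disables the reverse relation from User to Article.
    created_by = models.ForeignKey(User, on_delete=models.CASCADE, related_name='+')
    modified_by = models.ForeignKey(User, on_delete=models.CASCADE, related_name='+', blank=True, null=True)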

For a list of articles you could write something like this:

for article in Article.objects.all():
    print("%s by %s in %s" % (article.title, article.modified_by or article.created_by, article.category.name))

Run the code and look at the generated queries, either in the Django development server output or, a little bit prettier, in the SQL tab of the Django Debug Toolbar. With 3 Article objects, this will create 7 SQL queries to your database:

  • One for the list of article objects
  • One per article for the user (modified_by if it is set, otherwise created_by)
  • One per article for the category field

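To work around this issue, you can call select_related() on the Manager object of the Article class. This fetches the referenced objects in the same query using SQL joins, which is usually a lot faster. Called without arguments, select_related() only follows non-null foreign keys, so a nullable field such as modified_by has to be named explicitly. Have a look at the documentation for the list of parameters.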
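
In Django itself the fix is a single call such as Article.objects.select_related('category', 'created_by', 'modified_by'). To make the difference in query counts visible without a Django project, here is a minimal sketch that uses only the standard sqlite3 module. The table names, the sample rows, and the helpers run(), list_lazily() and list_joined() are assumptions made up for this illustration; the SQL only approximates what the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title TEXT,
        category_id INTEGER,
        created_by_id INTEGER,
        modified_by_id INTEGER
    );
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO category VALUES (1, 'News');
    INSERT INTO article VALUES
        (1, 'First', 1, 1, NULL),
        (2, 'Second', 1, 1, 2),
        (3, 'Third', 1, 2, NULL);
""")

queries = []  # every statement sent through run() is logged here


def run(sql, params=()):
    """Execute one statement and record it, like a query log would."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def list_lazily():
    """One query for the list, then one per accessed relation per row."""
    lines = []
    articles = run("SELECT title, category_id, created_by_id, modified_by_id "
                   "FROM article ORDER BY id")
    for title, category_id, created_by_id, modified_by_id in articles:
        # A NULL modified_by needs no lookup, so one user is fetched per row.
        user_id = modified_by_id if modified_by_id is not None else created_by_id
        username = run("SELECT username FROM auth_user WHERE id = ?", (user_id,))[0][0]
        category = run("SELECT name FROM category WHERE id = ?", (category_id,))[0][0]
        lines.append("%s by %s in %s" % (title, username, category))
    return lines


def list_joined():
    """Everything in one query; the nullable relation needs a LEFT JOIN."""
    rows = run("""
        SELECT a.title, COALESCE(m.username, c.username), cat.name
        FROM article a
        JOIN category cat ON cat.id = a.category_id
        JOIN auth_user c ON c.id = a.created_by_id
        LEFT JOIN auth_user m ON m.id = a.modified_by_id
        ORDER BY a.id
    """)
    return ["%s by %s in %s" % row for row in rows]


if __name__ == "__main__":
    for label, func in (("lazy", list_lazily), ("joined", list_joined)):
        queries.clear()
        func()
        print("%s: %d queries" % (label, len(queries)))
```

With the three sample rows, the lazy variant issues 7 statements and the joined variant issues 1, and both produce the same lines.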

How to parse a syslog logfile in Python

Thanks to the incredible pyparsing module, it is really easy to parse arbitrary files without the hassle of regular expressions.

The following code parses a standard syslog-ng logfile:

import string

from pyparsing import Word, alphas, Suppress, Combine, nums, Optional, Regex

# Three-letter month abbreviation such as "Jan"
month = Word(string.ascii_uppercase, string.ascii_lowercase, exact=3)
integer = Word(nums)
# Timestamp such as "Jan 15 10:20:30", combined into a single token
serverDateTime = Combine(month + " " + integer + " " + integer + ":" + integer + ":" + integer)
hostname = Word(alphas + nums + "_" + "-")
# Daemon name with an optional PID in brackets, followed by a colon
daemon = Word(alphas + "/" + "-" + "_") + Optional(Suppress("[") + integer + Suppress("]")) + Suppress(":")
# The rest of the line is the message
message = Regex(".*")
bnf = serverDateTime + hostname + daemon + message

with open('/path/to/logfile') as syslogFile:
    for line in syslogFile:
        fields = bnf.parseString(line)
        print(fields)

It's not over yet

Inspired by the Python hacking and the possibilities that the Yahoo! Developer Network offers, I crossed the World-Heatmap script by Simon Willison, from his talk at the StackOverflow DevDays, with the Yahoo Developer Network. The result is this map of the world showing the countries of origin of the banned IPs of the current wave (red is bad):

[Image: world map of the countries of origin of the banned IPs]

And while I was at it, I also created one for the most frequently rejected mails on the mail server:

[Image: world map of the most frequently rejected mails on the mail server]

Eclipse SVN auto keywords

In ~/.subversion/config, set

[miscellany]
enable-auto-props = yes

and add the following entries:

[auto-props]
*.java = svn:eol-style=native;svn:keywords=Id
*.php = svn:eol-style=native;svn:keywords=Id

From then on, every svn add also sets these properties on the matching files (*.java and *.php here).