Skip to content

lilspikey/django-batch-select

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Django Batch Select

The idea of Django Batch Select is to provide an equivalent to Django's select_related functionality. As of such it's another handy tool for avoiding the "n+1 query problem".

select_related is handy for minimizing the number of queries that need to be made in certain situations. However it is only usual for pre-selecting ForeignKey relations.

batch_select is handy for pre-selecting ManyToManyField relations and reverse ForeignKey relations.

It works by performing a single extra SQL query after a QuerySet has been evaluated to stitch in the the extra fields asked for. This requires the addition of a custom Manager, which in turn returns a custom QuerySet with extra methods attached.

Example Usage

Assuming we have models defined as the following:

from batch_select.models import BatchManager

class Tag(models.Model):
    name = models.CharField(max_length=32)

class Section(models.Model):
    name = models.CharField(max_length=32)

    objects = BatchManager()

class Entry(models.Model):
    title = models.CharField(max_length=255)
    section = models.ForeignKey(Section, blank=True, null=True)
    tags = models.ManyToManyField(Tag)

    objects = BatchManager()

I'll also define a helper function to show the SQL queries generated:

from django import db

def show_queries():
    for query in db.connection.queries:
        print query["sql"]
    db.reset_queries()

Here are a few example (with generated sql queries):

>>> Entry.objects.batch_select('tags').all()
[]
>>> show_queries() # no results, so no 2nd query
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
>>> Entry.objects.create()
>>> Entry.objects.create()
>>> tag1 = Tag.objects.create(name='tag1')
>>> tag2 = Tag.objects.create(name='tag2')
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entry.tags.add(tag1)
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[<Tag: Tag object>]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entries = Entry.objects.batch_select('tags').all()
>>> for entry in entries:
....     print entry.tags_all
....
[<Tag: Tag object>]
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1, 2)

Re-running that same last for loop without using batch_select generate three queries instead of two (n+1 queries):

>>> entries = Entry.objects.all()
>>> for entry in entries:
....     print entry.tags.all()
....
[<Tag: Tag object>]
[]

>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 1
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 2

This also works with reverse foreign keys. So for example we can get this entries that belong to each section:

>>> section1 = Section.objects.create(name='section1')
>>> section2 = Section.objects.create(name='section2')
>>> Entry.objects.create(section=section1)
>>> Entry.objects.create(section=section1)
>>> Entry.objects.create(section=section2)
>>> db.reset_queries()
>>> Section.objects.batch_select('entry_set')
[<Section: Section object>, <Section: Section object>]
>>> show_queries()
SELECT "batch_select_section"."id", "batch_select_section"."name" FROM "batch_select_section" LIMIT 21
SELECT ("batch_select_entry"."section_id") AS "__section_id", "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id", "batch_select_entry"."location_id" FROM "batch_select_entry" WHERE "batch_select_entry"."section_id" IN (1, 2)

Each section object in that query will have an entry_set_all field containing the relevant entries.

You need to pass batch_select the "related name" of the foreign key, in this case "entry_set". NB by default the related name for a foreign key does not actually include the _set suffix, so you can use just "entry" in this case. I have made sure that the _set suffix version also works to try and keep the API simpler.

More Advanced Usage

By default the batch fields are inserted into fields named <name>_all, on each object. So:

Entry.objects.batch_select('tags').all()

results in the Entry instances having fields called 'tags_all' containing the Tag objects associated with that Entry.

If you want to give the field a different name just use a keyword argument - in the same way as using the Aggregation API:

Entry.objects.batch_select(selected_tags='tags').all()

Would means the Tag objects would be assigned to fields called 'selected_tags'.

If you want to perform filtering of the related objects you will need to use a Batch object. By doing this you can pass extra keyword arguments in the same way as when using the filter method of a QuerySet:

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags', name__contains='blue'))

Would return Entry objects with fields called 'tags_containing_name' with only those Tags whose name contains 'blue'.

In addition to filtering using keyword arguments, you can also call the following methods on a Batch object, with their effects being passed on to the underlying QuerySet object:

(Note that distinct(), values() etc are not included as they would have side-effects on how the extra query is associated with the original query) So for example to achieve the same effect as the filter above you could do the following:

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags').filter(name__contains='blue'))

Whereas the following would exclude tags containing "blue" and order by name:

from batch_select.models import Batch

batch = Batch('tags').exclude(name__contains='blue').order_by('name')
Entry.objects.batch_select(tags_not_containing_blue=batch)

Compatibility

Django batch select should work with Django 1.1-1.3 at least.

TODOs and BUGS

See: http://github.com/lilspikey/django-batch-select/issues

About

batch select many-to-many and one-to-many fields (to help avoid n+1 query problem)

Resources

License

Stars

Watchers

Forks

Packages

No packages published