workaboutcontactblogs
All articles

UnderstandingORMsinDjango

A comprehensive look at Django's ORM how QuerySets work, when to avoid them, raw SQL escape hatches, N+1 problems, select_related vs prefetch_related, and query optimisation patterns.

The Django ORM is one of the most beloved and, paradoxically, one of the most misunderstood parts of the framework. It can help you move fast, but it can also hide performance time-bombs behind beautifully expressive Python. Let's look at it honestly.

What an ORM Actually Does

An ORM (Object-Relational Mapper) maps database tables to Python classes and rows to object instances. Django's ORM goes further it provides a query API that lets you compose SQL queries in Python, with lazy evaluation, chaining, and annotation support.

QuerySet Fundamentals

QuerySets are lazy. They don't hit the database until evaluated. They're also cached once evaluated, they store their results. Understanding these two properties is critical.

python
# Chaining - still lazy, still one query
articles = (
    Article.objects
    .filter(published=True)
    .exclude(author__isnull=True)
    .order_by('-created_at')
    .select_related('author', 'category')
)

# Evaluated - SQL fires
for article in articles:    # First iteration caches the result
    print(article.title)

for article in articles:    # Uses the cache, no SQL
    print(article.author.name)  # No extra query - already joined

The N+1 Problem

N+1 is Django's most common performance trap. You query N objects, then for each object you make one more query to fetch a related object. The result: 1 + N database round trips instead of 1 or 2.

python
# BAD N+1
articles = Article.objects.filter(published=True)
for article in articles:
    print(article.author.name)  # One DB query per article!

# GOOD select_related (JOIN for FK / OneToOne)
articles = Article.objects.filter(published=True).select_related('author')
for article in articles:
    print(article.author.name)  # No extra queries

# GOOD prefetch_related (separate query + Python join, for M2M / reverse FK)
articles = Article.objects.filter(published=True).prefetch_related('tags')
for article in articles:
    print([tag.name for tag in article.tags.all()])  # No extra queries

When to Use Which

  • select_related for ForeignKey and OneToOneField. Does a SQL JOIN. Use when you always need the related object.
  • prefetch_related for ManyToManyField and reverse ForeignKey. Does a separate query and joins in Python. More flexible.
  • Prefetch objects use Prefetch(queryset=...) when you need to filter or annotate the prefetched relation.

Annotations and Aggregations

Instead of fetching objects and computing values in Python, push the computation to the database. Django's annotate() and aggregate() let you do this cleanly.

python
from django.db.models import Count, Avg, Q, F

# Annotate each author with their published article count
authors = Author.objects.annotate(
    published_count=Count(
        'articles',
        filter=Q(articles__published=True)
    )
).order_by('-published_count')

# Use F() for column references (avoids extra Python round trip)
Article.objects.filter(views__gte=F('likes') * 10)

# Aggregate across the whole queryset
stats = Article.objects.aggregate(
    total=Count('id'),
    avg_views=Avg('views'),
)

Raw SQL Escape Hatches

Sometimes the ORM can't express what you need, or the generated SQL is suboptimal. Django provides three escape hatches:

python
# 1. Raw QuerySet maps to model, respects deferred fields
articles = Article.objects.raw(
    'SELECT * FROM blog_article WHERE views > %s ORDER BY views DESC',
    [1000]
)

# 2. Connection cursor full control, no model mapping
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute('REFRESH MATERIALIZED VIEW article_stats')

# 3. RawSQL expression embed raw SQL inside an annotate/filter
from django.db.models.expressions import RawSQL
Article.objects.annotate(
    score=RawSQL('(views * 0.3 + likes * 0.7)', [])
).order_by('-score')

QuerySet Optimisation Checklist

  • Use only() or defer() to fetch only the columns you need avoids loading large text/blob fields.
  • Use values() or values_list() when you don't need model instances returns dicts/tuples, much lighter.
  • Add database indexes with db_index=True or Meta.indexes critical for filter/order_by fields.
  • Use iterator() for large QuerySets to avoid loading everything into memory at once.
  • Profile with django-debug-toolbar or django-silk in development never guess, always measure.
  • Use bulk_create() and bulk_update() instead of per-object save() loops.

Transactions and Atomicity

python
from django.db import transaction

# Wrap related writes in a transaction
@transaction.atomic
def transfer_credits(from_user, to_user, amount):
    from_user.credits = F('credits') - amount
    from_user.save(update_fields=['credits'])
    to_user.credits = F('credits') + amount
    to_user.save(update_fields=['credits'])
    # If any exception occurs, the entire block rolls back

# Savepoints for nested atomics
with transaction.atomic():
    do_outer_work()
    with transaction.atomic():  # Creates a savepoint
        do_risky_inner_work()   # Only this rolls back if it fails

The ORM is a tool that respects your intelligence it doesn't hide SQL from you, it gives you the option to not think about it most of the time. Master when to use it at its highest level and when to drop down to raw SQL, and you'll write Django that scales.