Failure-Tolerant Queue In Django And PostgreSQL

Moving expensive tasks to background processes may potentially unlock all new user experiences and optimizations for web applications, but has traditionally required complexity and a secondary source of truth. What if there was an easier way?

November 11, 2021

DjangoPostgres

Moving expensive tasks to background processes may potentially unlock all new user experiences for web applications, but has traditionally required complexity and a secondary source of truth. Potential benefits of pushing long-running tasks to a background queue include freeing up the user interface, horizontally scaling with background workers, and preventing web servers from being overloaded.

The typical way that people stream work to background processes seems to be adding a service external to the main database to handle it, such as Redis, RabbitMQ, or Kafka. Then special care needs to be taken to ensure data isn't lost during unexpected failures or accidentally duplicated. What if there was an easier way?

If you are working with Django, then chances are you already use PostgreSQL. The good news is that a queue can be implemented directly in PostgreSQL, which has some nice features like failure-tolerance and exactly-once processing.

Start by defining a model with an auto-incrementing primary key:

api/models.py

from django.db.models import Model

class BackgroundTask(Model):
    id = models.BigAutoField(primary_key=True)
    # Other fields...

As you can see this is a perfectly normal Django model. You may also insert rows into it normally. The secret sauce is how it is queried. PostgreSQL supports the ability to lock rows to a transaction, select their columns, and then delete when closing the transaction. Assuming the Django "app" where you created your model is called api, that query looks like this:

DELETE FROM api_backgroundtask  -- Regular DELETE so far...
WHERE id = (                    -- Subquery to SELECT row to be updated, and lock just that row.
    SELECT id
    FROM api_backgroundtask
    ORDER BY id                 -- First in, first out.
    FOR UPDATE SKIP LOCKED      -- Lock for update. Skip other locked rows.
    LIMIT 1                     -- Only one row.
)
RETURNING *;                    -- Return all columns.

Plucking tasks from our queue in Django looks like this:

from api.models import BackgroundTask
from django.db import transaction

# ...

with transaction.atomic():
    results = BackgroundTask.objects.raw(
        '''
        DELETE FROM api_backgroundtask
        WHERE id = (
            SELECT id
            FROM api_backgroundtask
            ORDER BY id
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        )
        RETURNING *;
        '''
    )
    for background_task in results:
        # There is only going to be at most one result due to the LIMIT above.
        # Now do something with background_task...

Now if the transaction fails (due to for example either the process exiting or something in the loop), then any locked row will be left untouched and unlocked in the table. Multiple PostgreSQL connections can also pluck rows from the queue without locking each other out.

In conclusion, this is going to be a very solid starting point for background processing in most web applications. Kafka is always a reliable alternative that is capable of handling more streaming data than PostgreSQL, but is also complex and expensive to run. As always, be sure to benchmark if you aren't sure the technology fits your performance needs.