Optimizer🔗

This section covers Undine's query optimizer, which optimizes queries made to your GraphQL schema in order to reduce the number of database queries needed to resolve a request.

The Problems🔗

Before we take a look at how the optimizer works, let's first understand why it exists by going over some common problems that arise when using GraphQL to fetch data from a relational database.

The N+1 Problem🔗

Let's say you have a collection of models like this:

from django.db import models


class Project(models.Model):
    name = models.CharField(max_length=255)


class Task(models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField()

    project = models.ForeignKey(Project, on_delete=models.CASCADE)


class Step(models.Model):
    name = models.CharField(max_length=255)
    done = models.BooleanField(default=False)

    task = models.ForeignKey(Task, on_delete=models.CASCADE, related_name="steps")

And a schema like this:

type ProjectType {
  pk: Int!
  name: String!
}

type StepType {
  pk: Int!
  name: String!
  done: Boolean!
}

type TaskType {
  pk: Int!
  name: String!
  description: String!
  project: ProjectType!
  steps: [StepType!]!
}

type Query {
  tasks: [TaskType!]!
}

Now, let's say you query tasks like this:

query {
  tasks {
    pk
    name
    description
    project {
      pk
      name
    }
    steps {
      pk
      name
      done
    }
  }
}

In GraphQL, each field in the query will be resolved separately, and most importantly, if a field returns a list of objects with subfields, the resolvers for those subfields will be called for each object in the list. Normally, a field's resolver knows nothing about the query it is in, and so it only fetches the data it needs.

In this case, the top-level resolver for tasks will fetch all Task objects, but won't join in any related models that its subfields might need. For example, when the subfield for the related Project resolves, its resolver will try to look up the Project from the root Task instance it received, but since the Project was not fetched along with the Task, it needs to make another query to the database.

This means that for the whole query we first fetch all Tasks, and then all Projects and Steps for each Task. If we had 100 Tasks, each linked to a Project and to 10 Steps, this would result in 201 queries to the database in total!
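
In plain Django ORM terms, naive resolution behaves roughly like the following sketch (using the models defined above):

from .models import Task

# 1 query: the top-level resolver fetches all tasks.
tasks = list(Task.objects.all())

for task in tasks:
    # 1 query per task: the Project was not joined in, so accessing it
    # hits the database again.
    project = task.project
    # 1 query per task: the related Steps were not prefetched either.
    steps = list(task.steps.all())

# With 100 tasks: 1 + 100 + 100 = 201 queries in total.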

It's important to notice that the number of queries is proportional to the number of Tasks in the database. You can imagine how this can get out of hand quickly, especially when you start nesting relations deeper and deeper, since each additional level of nesting multiplies the number of queries. That's why it's called the N+1 problem: 1 query for the root objects, and N queries for their subfields, where N is the number of root objects.

Over-fetching🔗

Another issue with resolving Django models using normal resolvers is that when a model instance is fetched from the database, all of its non-relational fields are fetched by default. This means that we'll fetch fields that are not needed in the query, which can be expensive if the model has many fields or fields that contain a lot of data. This is called over-fetching, in contrast to the N+1 problem, which is a problem of under-fetching.
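
As a plain Django illustration, if the query only asks for a Task's pk and name, a default fetch still selects every column, including the potentially large description text, while QuerySet.only() limits the SELECT to what is needed:

from .models import Task

# Default: every non-relational column of Task is selected,
# including the potentially large "description" text field.
Task.objects.all()

# .only() limits the SELECT to the listed fields (plus the primary key),
# avoiding over-fetching.
Task.objects.only("pk", "name")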

The Optimizer🔗

Undine includes an optimizer that fixes the above problems automatically by introspecting the incoming query and generating the appropriate QuerySet optimizations, such as select_related and prefetch_related calls. It plugs into the top-level resolvers in the schema (in the above example, the resolver for the tasks entrypoint) and makes the necessary optimizations to reduce the number of database queries made. This way all subfields can resolve normally, knowing that the data they need has already been fetched.
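
For the example query above, the result is roughly equivalent to the hand-written QuerySet below. This is only a sketch of what the optimizer effectively produces; its actual output may differ in details.

from django.db.models import Prefetch

from .models import Step, Task

Task.objects.select_related("project").prefetch_related(
    # Only the requested Step columns are fetched for the prefetch.
    Prefetch("steps", queryset=Step.objects.only("pk", "name", "done")),
).only("pk", "name", "description", "project__pk", "project__name")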

For the most part, this is all you need to know about the optimizer. However, there are a few things to keep in mind so that you don't break these optimizations.

Be careful when overriding Field resolvers. If you define a custom resolver for a model field that uses model data beyond the field itself, that data may not have been fetched if it is not also part of the query. More generally, you need to be careful when using model instances outside of the GraphQL context. A common place where this can happen is in permission checks, which often need to access model data for object permissions, etc.

To deal with this, Undine includes methods for specifying additional optimizations manually; see the Manual Optimizations section below.

Manual Optimizations🔗

Undine includes two hooks for adding additional manual optimizations to your schema.

from undine import GQLInfo, QueryType
from undine.optimizer import OptimizationData

from .models import Project, Step, Task


class ProjectType(QueryType[Project]): ...


class TaskType(QueryType[Task]):
    @classmethod
    def __optimizations__(cls, data: OptimizationData, info: GQLInfo) -> None:
        # Always fetch "name", even when it's not requested in the query.
        data.only_fields.add("name")
        # Always join the related Project using select_related.
        data.add_select_related("project", query_type=ProjectType)
        # Always prefetch the related Steps.
        data.add_prefetch_related("steps", query_type=StepType)


class StepType(QueryType[Step]): ...

The QueryType.__optimizations__ method is called by the optimizer when it encounters a new QueryType during optimizations. See order of optimizations for when this is called. See optimization data for how optimization itself works.

from undine import Field, GQLInfo, QueryType
from undine.optimizer import OptimizationData

from .models import Project, Step, Task


class ProjectType(QueryType[Project]): ...


class TaskType(QueryType[Task]):
    name = Field()

    @name.optimize
    def optimize_name(self, data: OptimizationData, info: GQLInfo) -> None:
        data.only_fields.add("name")


class StepType(QueryType[Step]): ...

The function decorated with <field_name>.optimize is called by the optimizer when it encounters the Field during optimization. See order of optimizations for when this is called. See optimization data for how the optimization itself works.

Optimization data🔗

The OptimizationData object holds the optimizations that the optimizer has gathered from the query. You can add new optimizations to the data to ensure that, e.g., required fields are fetched, even if they are otherwise not needed in the query. Let's go over the structure of the OptimizationData object.

model🔗

This is the model class which the optimizations in the data correspond to.

info🔗

The resolver info object for the request, as it applies to this OptimizationData. During field resolving, the field_name, field_nodes, return_type, and parent_type of the resolver info object differ depending on the ObjectType being resolved, so each OptimizationData needs to know how the resolver info will look when its optimizations are needed. Various methods in Undine get passed this info object so that users of the library can use it to do their own introspection.

related_field🔗

The model field being optimized. Can be None if the OptimizationData is for the root level.

parent🔗

If the OptimizationData is for a related model, this links to the optimization data of the parent model. Conversely, the parent OptimizationData has a link to this OptimizationData through either select_related, prefetch_related, or generic_prefetches.

only_fields🔗

Contains fields that will be applied to QuerySet.only(). This prevents the over-fetching issue by only fetching the required fields for the query.

aliases🔗

Contains the expressions that will be applied to QuerySet.alias(). Various methods in Undine can add to these aliases to enable cleaner use of annotations.

annotations🔗

Contains the expressions that will be applied to QuerySet.annotate(). Fields that resolve using an expression will store the expression here.
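
For example, a QueryType could use the __optimizations__ hook to always annotate a value. This sketch assumes that annotations behaves like a mapping from annotation name to expression; the step_count name is just an illustration:

from django.db.models import Count

from undine import GQLInfo, QueryType
from undine.optimizer import OptimizationData

from .models import Task


class TaskType(QueryType[Task]):
    @classmethod
    def __optimizations__(cls, data: OptimizationData, info: GQLInfo) -> None:
        # Assumes `annotations` is a name -> expression mapping; this would
        # result in QuerySet.annotate(step_count=Count("steps")).
        data.annotations["step_count"] = Count("steps")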

select_related🔗

Contains OptimizationData for related fields that should be fetched together using QuerySet.select_related(). New related fields should be added using add_select_related to ensure that the correct references are placed in both OptimizationData objects.

prefetch_related🔗

Contains OptimizationData for related fields that should be fetched together using QuerySet.prefetch_related(). New related fields should be added using add_prefetch_related to ensure that the correct references are placed in both OptimizationData objects.

Note that the key in the mapping can be either the name of the related field, or an alias that the data should be fetched with (using Prefetch(..., to_attr=<alias>)).
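
In plain Django terms, the alias case corresponds to a prefetch like the one below, where done_steps is a hypothetical alias:

from django.db.models import Prefetch

from .models import Step, Task

# Prefetch only the completed Steps to "task.done_steps", leaving the
# regular "steps" relation untouched. "done_steps" is just an example alias.
Task.objects.prefetch_related(
    Prefetch("steps", queryset=Step.objects.filter(done=True), to_attr="done_steps"),
)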

generic_prefetches🔗

Contains OptimizationData for generic foreign keys that should be fetched together using QuerySet.prefetch_related(). New generic prefetches should be added using add_generic_prefetch_related to ensure that the correct references are placed in both OptimizationData objects.
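
In plain Django terms, this corresponds to a GenericPrefetch (available since Django 5.0). The Comment model with a content_object generic foreign key is hypothetical and only illustrates the shape of the call:

from django.contrib.contenttypes.prefetch import GenericPrefetch

from .models import Comment, Project, Task  # Comment is a hypothetical model here

# Each possible target model of the generic foreign key gets its own queryset.
Comment.objects.prefetch_related(
    GenericPrefetch(
        "content_object",
        [Task.objects.only("pk", "name"), Project.objects.only("pk", "name")],
    ),
)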

filters🔗

Contains Q expressions that will be applied to QuerySet.filter(). Normally, these are compiled from a FilterSet.

order_by🔗

Contains OrderBy expressions that will be applied to QuerySet.order_by(). Normally, these are compiled from an OrderSet.

distinct🔗

Whether QuerySet.distinct() should be applied. Normally, the optimizer is able to determine this based on the FilterSet Filters used in the query.

none🔗

Whether QuerySet.none() should be applied. Note that using QuerySet.none() will result in an empty QuerySet regardless of other optimizations. Normally, this is only applied if a FilterSet Filter raises an EmptyFilterResult exception.

pagination🔗

Contains the pagination information for the QuerySet in the form of a PaginationHandler object. Normally, this is set by the optimizer automatically based on whether the field uses a Connection or not.

queryset_callback🔗

A callback function that initializes the QuerySet for the OptimizationData. By default, this is set to use the Manager.get_queryset() method of the OptimizationData's model's default manager, or the QueryType.__get_queryset__ method for related fields to other QueryTypes.
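
As an illustration, a QueryType can override __get_queryset__ to control the base QuerySet used for it. Note that the exact classmethod signature shown here is an assumption made for this sketch:

from django.db.models import QuerySet

from undine import GQLInfo, QueryType

from .models import Task


class TaskType(QueryType[Task]):
    @classmethod
    def __get_queryset__(cls, info: GQLInfo) -> QuerySet[Task]:
        # Only ever expose Tasks that have a name set.
        return Task.objects.exclude(name="")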

pre_filter_callback🔗

A callback function that will be called before order_by, distinct, filters, or field_calculations are applied to the QuerySet. Normally, this is populated using the QueryType.__filter_queryset__ method, if it has been overridden from the default.

post_filter_callback🔗

A callback function that will be called after order_by, distinct, filters, and field_calculations are applied to the QuerySet. Normally, this is populated using the FilterSet.__filter_queryset__ method, if it has been overridden from the default.

field_calculations🔗

A list of Calculation instances that should be run and annotated to the QuerySet. Normally, the optimizer will automatically add Fields using Calculation objects to this list.

add_select_related🔗

A method for adding a new select_related optimization. Must provide the field_name for the model relation, and optionally a QueryType that the relation should use.

Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. In either case, the method will make sure that the created select_related optimization has the correct references to its parent OptimizationData, which it needs so that it can be compiled correctly.

add_prefetch_related🔗

A method for adding a new prefetch_related optimization. Must provide the field_name for the model relation, and optionally a QueryType that the relation should use, as well as a to_attr for the prefetch alias.

Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. In either case, the method will make sure that the created prefetch_related optimization has the correct references to its parent OptimizationData, which it needs so that it can be compiled correctly.

The string passed in to_attr will be used in Prefetch(..., to_attr=<to_attr>), which will prefetch the related field to the given attribute name. Normally, the optimizer uses this to fetch "to-many" relations into aliases matching custom schema names.
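
For example, a to-many relation could always be prefetched to a separate attribute like this. The sketch follows the parameters described above; the task_steps alias is just an illustration:

from undine import GQLInfo, QueryType
from undine.optimizer import OptimizationData

from .models import Step, Task


class StepType(QueryType[Step]): ...


class TaskType(QueryType[Task]):
    @classmethod
    def __optimizations__(cls, data: OptimizationData, info: GQLInfo) -> None:
        # Prefetch the "steps" relation to the "task_steps" attribute
        # using Prefetch(..., to_attr="task_steps").
        data.add_prefetch_related("steps", query_type=StepType, to_attr="task_steps")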

add_generic_prefetch_related🔗

A method for adding a new generic_prefetch_related optimization. Must provide the field_name for the model relation, the related_model that the generic prefetch should be done for, and optionally a QueryType that the relation should use, as well as a to_attr for the prefetch alias.

Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. In either case, the method will make sure that the created generic_prefetch_related optimization has the correct references to its parent OptimizationData, which it needs so that it can be compiled correctly.

The string passed in to_attr will be used in GenericPrefetch(..., to_attr=<to_attr>), which will prefetch the related field to the given attribute name. Normally, the optimizer uses this to fetch "to-many" relations into aliases matching custom schema names.

Optimization results🔗

Once the whole query has been analyzed, the OptimizationData is processed into OptimizationResults, which can then be applied to the QuerySet. Processing simply copies over most of the data from the OptimizationData, but notably, it also converts any select_related, prefetch_related, or generic_prefetch_related optimizations to values that can be applied to a QuerySet.

For select_related, this means either promotion to a prefetch, or extending the related field's optimizations to the parent model (e.g. querying the name of a Project related to a Task will extend the lookup to the Task model as project__name and add it to the Task model's OptimizationResults.only_fields).

For prefetch_related, the prefetch OptimizationResults are processed and applied to the queryset taken from OptimizationResults.queryset_callback. The resulting Prefetch() object is added to the parent OptimizationResults.prefetch_related.

generic_prefetch_related is processed similarly to prefetch_related, except that a GenericPrefetch() object is created instead.

Promotion to prefetch🔗

In certain cases, a select_related optimization must be promoted to a prefetch_related. This can happen for one of the following reasons:

  1. Any annotations (or aliases) are requested from the relation. A prefetch must be made so that the annotation remains available in the related object.
  2. Any field_calculations are present. Calculations become annotations, so the reason is the same as above.
  3. A pre_filter_callback or post_filter_callback is needed. Since these callbacks might filter out the related object, a prefetch must be made so that the filtering can actually be applied. Note that this might result in a null value for a field that would otherwise not be null, as illustrated by the sketch below!
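
In plain Django terms, the promotion in case 3 turns the join into a prefetch with its own queryset, roughly like the sketch below. If the filtered queryset no longer contains the related Project, task.project resolves to None from the prefetch cache:

from django.db.models import Prefetch

from .models import Project, Task

# Instead of select_related("project"), the forward relation is prefetched
# with a filtered queryset. Tasks whose Project was filtered out end up
# with task.project being None, without an extra database query.
Task.objects.prefetch_related(
    Prefetch("project", queryset=Project.objects.exclude(name="")),
)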

Order of optimizations🔗

The order in which the optimizations are applied is as follows (a condensed code sketch follows the list):

  1. If none is True, return an empty QuerySet and exit early.
  2. If select_related is not empty, apply select_related optimizations using QuerySet.select_related().
  3. If prefetch_related is not empty, apply prefetch_related optimizations using QuerySet.prefetch_related().
  4. If only_fields is not empty, and the DISABLE_ONLY_FIELDS_OPTIMIZATION setting is not set to True, apply only_fields optimizations using QuerySet.only().
  5. If aliases is not empty, apply aliases optimizations using QuerySet.alias().
  6. If annotations is not empty, apply annotations optimizations using QuerySet.annotate().
  7. If pre_filter_callback exists, call it.
  8. If order_by is not empty, apply order_by optimizations using QuerySet.order_by().
  9. If distinct is True, apply QuerySet.distinct().
  10. If field_calculations is not empty, run the Calculation and annotate the result to the QuerySet using the Calculation's __field_name__.
  11. If filters is not empty, apply filters optimizations using QuerySet.filter().
  12. If post_filter_callback exists, call it.
  13. If pagination is not empty, run either pagination.paginate_queryset() or pagination.paginate_prefetch_queryset() depending on whether a related_field exists or not.
  14. Return the optimized QuerySet.
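
Condensed into code, the order looks roughly like the sketch below. This is not Undine's actual implementation: the attribute names follow the OptimizationResults fields described above, while the container types and the callback, Calculation, and pagination call signatures are assumptions made for illustration.

from functools import reduce
from operator import and_


def apply_optimizations(results, queryset, info):
    """Illustrative sketch of the application order; not Undine's real code."""
    if results.none:  # 1.
        return queryset.none()
    if results.select_related:  # 2.
        queryset = queryset.select_related(*results.select_related)
    if results.prefetch_related:  # 3.
        queryset = queryset.prefetch_related(*results.prefetch_related)
    if results.only_fields:  # 4. skipped if disabled by the setting mentioned above
        queryset = queryset.only(*results.only_fields)
    if results.aliases:  # 5. assumed to be a name -> expression mapping
        queryset = queryset.alias(**results.aliases)
    if results.annotations:  # 6. assumed to be a name -> expression mapping
        queryset = queryset.annotate(**results.annotations)
    if results.pre_filter_callback:  # 7. callback signature assumed
        queryset = results.pre_filter_callback(queryset, info)
    if results.order_by:  # 8.
        queryset = queryset.order_by(*results.order_by)
    if results.distinct:  # 9.
        queryset = queryset.distinct()
    # 10. field_calculations are run and their results annotated under each
    #     Calculation's __field_name__ (the Calculation API is omitted here).
    if results.filters:  # 11. combine the Q objects into one filter call
        queryset = queryset.filter(reduce(and_, results.filters))
    if results.post_filter_callback:  # 12. callback signature assumed
        queryset = results.post_filter_callback(queryset, info)
    if results.pagination:  # 13. or paginate_prefetch_queryset() when a related_field exists
        queryset = results.pagination.paginate_queryset(queryset)
    return queryset  # 14.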