Optimizer
This section covers Undine's query optimizer. The optimizer rewrites the database queries made for requests to your GraphQL schema so that fewer of them are needed when resolving a request.
The Problems
Before we take a look at how the optimizer works, let's first understand why it exists by going over some common problems that arise when using GraphQL to fetch data from a relational database.
The N+1 Problem
Let's say you have a collection of models like this:
And a schema like this:
Now, let's say you query tasks like this:
In GraphQL, each field in the query is resolved separately. Most importantly, if a field returns a list of objects with subfields, the resolvers for those subfields are called once for each object in the list. Normally, a field's resolver knows nothing about the rest of the query it is in, so it only fetches the data it needs itself.
In this case, the top-level resolver for tasks will fetch all Task objects, but it won't make any joins to related models that its subfields might need. For example, when the subfield for the related Project resolves, that resolver will try to look up the Project from the root Task instance it received. Because the Project was not fetched along with the Task, the resolver needs to make another query to the database.
This means that for the whole query we first fetch all Tasks. We then fetch the Project and the Steps separately for each Task. Suppose we had 100 Tasks, each linked to a Project and to 10 Steps. This would result in 201 queries to the database in total: 1 for the Tasks, 100 for the Projects, and 100 for the Steps!
It's important to notice that the number of queries is proportional to the number of Tasks in the database.
You can imagine how this can get out of hand quickly, especially when you start nesting relations deeper and deeper. Each additional level of nesting can multiply the number of queries, so the total grows exponentially with the nesting depth! That's why it's called the N+1 problem: 1 query for the root objects, and N queries for each related subfield, where N is the number of root objects.
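The sketch below reproduces the arithmetic above in a small, self-contained program. It uses Python's built-in sqlite3 module rather than Django or Undine. The project/task/step tables, CountingDB, resolve_naive and resolve_optimized are all hypothetical. resolve_naive mimics independent per-object resolvers. resolve_optimized mimics the effect of select_related (a JOIN) and prefetch_related (one batched IN query).

```python
import sqlite3


def make_db(n_tasks: int = 100, steps_per_task: int = 10) -> sqlite3.Connection:
    """Build an in-memory database where each task has one project and several steps."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER);
        CREATE TABLE step (id INTEGER PRIMARY KEY, name TEXT, task_id INTEGER);
        """
    )
    for i in range(n_tasks):
        conn.execute("INSERT INTO project VALUES (?, ?)", (i, f"project {i}"))
        conn.execute("INSERT INTO task VALUES (?, ?, ?)", (i, f"task {i}", i))
        for j in range(steps_per_task):
            conn.execute("INSERT INTO step (name, task_id) VALUES (?, ?)", (f"step {j}", i))
    return conn


class CountingDB:
    """Wraps a connection and counts every SELECT sent through fetch()."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def fetch(self, sql: str, params=()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def resolve_naive(db: CountingDB) -> list:
    """Each subfield resolver looks up its own data: 1 + 2N queries."""
    result = []
    for task_id, name, project_id in db.fetch("SELECT id, name, project_id FROM task ORDER BY id"):
        project_name = db.fetch("SELECT name FROM project WHERE id = ?", (project_id,))[0][0]
        steps = [s for (s,) in db.fetch("SELECT name FROM step WHERE task_id = ? ORDER BY id", (task_id,))]
        result.append({"name": name, "project": {"name": project_name}, "steps": steps})
    return result


def resolve_optimized(db: CountingDB) -> list:
    """JOIN the to-one relation and batch the to-many relation: 2 queries in total."""
    rows = db.fetch(
        "SELECT task.id, task.name, project.name FROM task "
        "JOIN project ON project.id = task.project_id ORDER BY task.id"
    )
    steps_by_task = {task_id: [] for task_id, _, _ in rows}
    placeholders = ", ".join("?" * len(steps_by_task))
    for task_id, step_name in db.fetch(
        f"SELECT task_id, name FROM step WHERE task_id IN ({placeholders}) ORDER BY id",
        list(steps_by_task),
    ):
        steps_by_task[task_id].append(step_name)
    return [
        {"name": name, "project": {"name": project_name}, "steps": steps_by_task[task_id]}
        for task_id, name, project_name in rows
    ]


db = CountingDB(make_db())
naive = resolve_naive(db)
print("naive queries:", db.queries)  # 1 + 100 + 100 = 201

db.queries = 0
optimized = resolve_optimized(db)
print("optimized queries:", db.queries)  # 2
assert naive == optimized  # same data, far fewer round trips
```

Both functions return the same data. Only the number of round trips differs, and in the naive version that number keeps growing with the number of Tasks.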
Over-fetching
Resolving Django models with normal resolvers has another issue. When a model is fetched from the database, all of its non-relational fields are also fetched by default, so we also fetch fields that the query doesn't need. This can be expensive if the model has many fields, or fields that contain a lot of data. This is called over-fetching. It is the opposite of the N+1 problem, which is a problem of under-fetching.
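Here is a minimal sqlite3 sketch of the difference. It assumes a hypothetical task table whose description column holds a large value that the query never asks for. Selecting every column is what a plain model fetch does. Selecting only the requested columns is roughly what QuerySet.only() does, since Django always keeps the primary key.

```python
import sqlite3

# Hypothetical table: "description" stands in for a large column the query never asks for.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.execute(
    "INSERT INTO task (name, description) VALUES (?, ?)",
    ("Write docs", "x" * 100_000),
)


def fetch_tasks(columns=None):
    """Select only the given columns, or every column when none are given."""
    column_sql = ", ".join(columns) if columns else "*"
    return conn.execute(f"SELECT {column_sql} FROM task").fetchall()


everything = fetch_tasks()  # like a plain model fetch: all columns, including the 100 kB description
requested = fetch_tasks(["id", "name"])  # roughly like QuerySet.only("name"): pk plus requested field

print(len(everything[0]), len(requested[0]))  # 3 2
```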
The Optimizer
Undine includes an optimizer that fixes the above problems automatically. It introspects the incoming query and generates the appropriate QuerySet optimizations, such as prefetch_related and select_related calls. It plugs into the top-level resolvers in the schema (in the above example, the resolver for the tasks entrypoint) and makes the necessary optimizations to reduce the number of database queries made. This way all subfields can resolve normally, knowing that the data they need has already been fetched.
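The planning step can be sketched as a toy function. It walks the query's selections once, before any resolver runs, and decides for each relation whether to join it or prefetch it. This is not Undine's implementation. RELATIONS, Plan and plan_query are hypothetical, and a real optimizer reads relation metadata from the Django models and the selections from the GraphQL resolve info.

```python
from dataclasses import dataclass, field

# Hypothetical relation metadata for the Task/Project/Step example:
# field name -> (related model, "one" for to-one or "many" for to-many).
RELATIONS = {
    "Task": {"project": ("Project", "one"), "steps": ("Step", "many")},
    "Project": {"tasks": ("Task", "many")},
    "Step": {"task": ("Task", "one")},
}


@dataclass
class Plan:
    only: list = field(default_factory=list)
    select_related: dict = field(default_factory=dict)
    prefetch_related: dict = field(default_factory=dict)


def plan_query(model: str, selections: dict) -> Plan:
    """Walk selections (field name -> sub-selections, or None for plain fields)."""
    plan = Plan()
    for name, sub_selections in selections.items():
        relation = RELATIONS[model].get(name)
        if relation is None:
            plan.only.append(name)  # plain column: fetch only what is asked for
            continue
        related_model, kind = relation
        nested = plan_query(related_model, sub_selections)
        if kind == "one":
            plan.select_related[name] = nested  # to-one: JOIN into the same query
        else:
            plan.prefetch_related[name] = nested  # to-many: one extra batched query
    return plan


# Selections for: tasks { name project { name } steps { name } }
plan = plan_query("Task", {"name": None, "project": {"name": None}, "steps": {"name": None}})
print(plan.only, list(plan.select_related), list(plan.prefetch_related))
# ['name'] ['project'] ['steps']
```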
For the most part, this is all you need to know about the optimizer. However, there are a few things to keep in mind so that you don't break these optimizations.
Be careful when overriding Field resolvers. A custom resolver for a model field might use model data outside of the field itself. That data may not have been fetched if it is not also part of the query.
More generally, you need to be careful when using models outside of the GraphQL context.
A common place where this may happen is in permission checks, which often need to access
model data for object permissions, etc.
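The failure mode can be sketched with a toy stand-in for a model instance loaded with QuerySet.only(). DeferredTask, resolve_title and the status field are hypothetical. The sketch only mimics how Django loads a deferred field with an extra query the first time that field is accessed.

```python
class DeferredTask:
    """Toy stand-in for a model instance loaded with QuerySet.only(...).

    Reading a field that was not loaded costs one extra query, similar to how
    Django loads deferred fields on first access.
    """

    def __init__(self, row: dict, loaded_fields: set, query_log: list) -> None:
        self._row = row  # the full database row
        self._values = {name: row[name] for name in loaded_fields}
        self._query_log = query_log

    def get(self, name: str):
        if name not in self._values:
            self._query_log.append(f"SELECT {name} FROM task WHERE id = {self._row['id']}")
            self._values[name] = self._row[name]
        return self._values[name]


def resolve_title(task: DeferredTask) -> str:
    # Custom resolver that also reads "status", which the GraphQL query never selected.
    return f"{task.get('name')} [{task.get('status')}]"


query_log = []
rows = [{"id": i, "name": f"task {i}", "status": "open"} for i in range(100)]
# Only "name" (plus the primary key) appeared in the query, so only those were fetched.
tasks = [DeferredTask(row, {"id", "name"}, query_log) for row in rows]

titles = [resolve_title(task) for task in tasks]
print(len(query_log))  # 100: one extra query per task, the N+1 problem is back
```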
To deal with this, Undine includes methods to specify additional optimizations manually; see the Manual Optimizations section below.
Manual Optimizations
Undine includes two hooks for adding manual optimizations to your schema.
The optimizer calls the QueryType.__optimizations__ method when it encounters a new QueryType during optimization. See order of optimizations for when this is called, and optimization data for how the optimization itself works.
The optimizer calls the <field_name>.optimize method when it encounters a new Field during optimization. See order of optimizations for when this is called, and optimization data for how the optimization itself works.
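The sketch below shows the general shape of what such a hook does, using toy stand-ins. ToyOptimizationData, the owner_id field and task_type_optimizations are hypothetical, and the real hooks' signatures are not reproduced here. The only point is that a hook adds to the gathered data. For example, a hook can make sure that a field read by a permission check is always fetched.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOptimizationData:
    """Heavily simplified stand-in for OptimizationData (only two attributes)."""

    model: str
    only_fields: set = field(default_factory=set)


def task_type_optimizations(data: ToyOptimizationData) -> None:
    """Hypothetical hook body: always fetch the field a permission check reads."""
    data.only_fields.add("owner_id")


data = ToyOptimizationData(model="Task", only_fields={"id", "name"})  # gathered from the query
task_type_optimizations(data)
print(sorted(data.only_fields))  # ['id', 'name', 'owner_id']
```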
Optimization data
The OptimizationData object holds the optimizations that the optimizer has gathered from the query. You can add new optimizations to the data to ensure, for example, that required fields are fetched even when the query doesn't otherwise need them. Let's go over the structure of the OptimizationData object.
model
This is the model class which the optimizations in the data correspond to.
info
The resolver info object for the request, as it applies to this OptimizationData.
During field resolving, the field_name, field_nodes, return_type and parent_type of the resolver info object differ depending on the ObjectType being resolved. Each OptimizationData therefore needs to know how the resolver info would look at the point where its optimizations are needed. Various methods in Undine receive this info object so that users of the library can use it to do their own introspection.
related_field
The model field being optimized. Can be None if the OptimizationData is for the root level.
parent
If the OptimizationData is for a related model, this links to the optimization data of the parent model. Conversely, the parent OptimizationData links to this OptimizationData through select_related, prefetch_related, or generic_prefetches.
only_fields
Contains fields that will be applied to QuerySet.only(). This prevents over-fetching by fetching only the fields the query requires.
aliases
Contains the expressions that will be applied to QuerySet.alias(). Various methods in Undine can add to these aliases to enable clearer use of annotations.
annotations
Contains the expressions that will be applied to QuerySet.annotate(). Fields that resolve using an expression will store the expression here.
select_related
Contains OptimizationData for related fields that should be fetched together using QuerySet.select_related(). New related fields should be added using add_select_related to ensure that the correct references are placed in both OptimizationData objects.
prefetch_related
Contains OptimizationData for related fields that should be fetched together using QuerySet.prefetch_related(). New related fields should be added using add_prefetch_related to ensure that the correct references are placed in both OptimizationData objects.
Note that the key in the mapping can be either the name of the related field, or an alias that the data should be fetched with (using Prefetch(..., to_attr=<alias>)).
generic_prefetches
Contains OptimizationData for generic foreign keys that should be fetched together using QuerySet.prefetch_related(). New generic prefetches should be added using add_generic_prefetch_related to ensure that the correct references are placed in both OptimizationData objects.
filters
Contains Q expressions that will be applied to QuerySet.filter(). Normally, these are compiled from a FilterSet.
order_by
Contains OrderBy expressions that will be applied to QuerySet.order_by(). Normally, these are compiled from an OrderSet.
distinct
Whether QuerySet.distinct() should be applied. Normally, the optimizer is able to determine this based on the FilterSet Filters used in the query.
none
Whether QuerySet.none() should be applied. Note that using QuerySet.none() will result in an empty QuerySet regardless of other optimizations. Normally, this is only applied if a FilterSet Filter raises an EmptyFilterResult exception.
pagination
Contains the pagination information for the QuerySet in the form of a PaginationHandler object. Normally, the optimizer sets this automatically based on whether the field uses a Connection.
queryset_callback
A callback function that initializes the QuerySet for the OptimizationData.
By default, this uses the Manager.get_queryset() method of the default manager of the OptimizationData's model. For related fields to other QueryTypes, it uses the QueryType.__get_queryset__ method instead.
pre_filter_callback
A callback function that will be called before order_by, distinct, filters, or field_calculations are applied to the QuerySet. Normally, this is populated using the QueryType.__filter_queryset__ method, if it has been overridden from the default.
post_filter_callback
A callback function that will be called after order_by, distinct, filters, and field_calculations are applied to the QuerySet. Normally, this is populated using the FilterSet.__filter_queryset__ method, if it has been overridden from the default.
field_calculations
A list of Calculation instances that should be run and annotated to the QuerySet. Normally, the optimizer will automatically add Fields using Calculation objects to this list.
add_select_related()
A method for adding a new select_related optimization. Must provide the field_name for the model relation, and optionally a QueryType that the relation should use.
Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. Apart from that, the method makes sure that the created select_related optimization has the correct references to its parent OptimizationData. It needs these references so that it can be compiled correctly.
add_prefetch_related()
A method for adding a new prefetch_related optimization. Must provide the field_name for the model relation, and optionally a QueryType that the relation should use, as well as a to_attr for the prefetch alias.
Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. Apart from that, the method makes sure that the created prefetch_related optimization has the correct references to its parent OptimizationData. It needs these references so that it can be compiled correctly.
The string passed in to_attr will be used in Prefetch(..., to_attr=<to_attr>), which will prefetch the related field to the given attribute name. Normally, the optimizer uses this to fetch "to-many" relations to aliases of custom schema names.
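The bookkeeping that these methods take care of can be sketched with a simplified stand-in. ToyOptimizationData and the open_steps alias are hypothetical. Unlike the real methods, this toy takes the related model name explicitly and ignores QueryTypes and callbacks. It only shows the two-way link between parent and child, and how to_attr changes the key in prefetch_related.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyOptimizationData:
    """Simplified stand-in for OptimizationData that keeps only the linking attributes."""

    model: str
    related_field: Optional[str] = None
    # Excluded from repr/eq so the parent <-> child cycle does not recurse forever.
    parent: Optional[ToyOptimizationData] = field(default=None, repr=False, compare=False)
    only_fields: set = field(default_factory=set)
    select_related: dict = field(default_factory=dict)
    prefetch_related: dict = field(default_factory=dict)

    def add_select_related(self, field_name: str, related_model: str) -> ToyOptimizationData:
        # Link both ways: parent -> child through select_related, child -> parent through parent.
        child = ToyOptimizationData(model=related_model, related_field=field_name, parent=self)
        self.select_related[field_name] = child
        return child

    def add_prefetch_related(
        self, field_name: str, related_model: str, to_attr: Optional[str] = None
    ) -> ToyOptimizationData:
        # The mapping key is the alias when to_attr is given, otherwise the field name.
        child = ToyOptimizationData(model=related_model, related_field=field_name, parent=self)
        self.prefetch_related[to_attr or field_name] = child
        return child


root = ToyOptimizationData(model="Task")
project = root.add_select_related("project", "Project")
project.only_fields.add("name")
steps = root.add_prefetch_related("steps", "Step", to_attr="open_steps")

assert project.parent is root and root.select_related["project"] is project
print(list(root.prefetch_related))  # ['open_steps']
```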
add_generic_prefetch_related()
A method for adding a new generic_prefetch_related optimization. Must provide the field_name for the model relation and the related_model that the generic prefetch should be done for. A QueryType that the relation should use and a to_attr for the prefetch alias are optional.
Passing the QueryType will fill the queryset_callback, pre_filter_callback, and post_filter_callback from the QueryType automatically. Apart from that, the method makes sure that the created generic_prefetch_related optimization has the correct references to its parent OptimizationData. It needs these references so that it can be compiled correctly.
The string passed in to_attr will be used in GenericPrefetch(..., to_attr=<to_attr>), which will prefetch the related field to the given attribute name. Normally, the optimizer uses this to fetch "to-many" relations to aliases of custom schema names.
Optimization results
Once the whole query has been analyzed, the OptimizationData is processed into OptimizationResults, which can then be applied to the QuerySet. This processing simply copies over most of the data from the OptimizationData. Notably, it also converts any select_related, prefetch_related, or generic_prefetch_related optimizations to values that can be applied to a QuerySet.
For select_related, this means either promotion to a prefetch, or extending the related field's optimizations to the parent model. For example, querying the name of a Project related to a Task will extend the name lookup to the Task model as project__name. That lookup is then added to the Task model's OptimizationResults.only_fields.
For prefetch_related, the prefetch OptimizationResults are processed and applied to the queryset taken from OptimizationResults.queryset_callback. The resulting Prefetch() object is added to the parent OptimizationResults.prefetch_related.
generic_prefetch_related is processed similarly to prefetch_related, except that a GenericPrefetch() object is created instead.
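The select_related flattening described above can be sketched as a toy recursive function. Data, Results and process are hypothetical stand-ins. The sketch leaves out promotion to prefetch, and it represents prefetches as plain (lookup, results) pairs rather than Prefetch() objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Data:
    """Toy optimization data: columns to fetch plus nested relation data."""

    only_fields: list = field(default_factory=list)
    select_related: dict = field(default_factory=dict)
    prefetch_related: dict = field(default_factory=dict)


@dataclass
class Results:
    """Toy results: flat lookups that could be passed to QuerySet methods."""

    only_fields: list = field(default_factory=list)
    select_related: list = field(default_factory=list)
    prefetch_related: list = field(default_factory=list)  # (lookup, nested Results) pairs


def process(data: Data, prefix: str = "", results: Optional[Results] = None) -> Results:
    """Extend select_related data onto the parent; give each prefetch its own Results."""
    results = Results() if results is None else results
    results.only_fields.extend(prefix + name for name in data.only_fields)
    for name, child in data.select_related.items():
        results.select_related.append(prefix + name)
        # e.g. a Project's "name" becomes "project__name" on the Task query
        process(child, prefix + name + "__", results)
    for name, child in data.prefetch_related.items():
        # stand-in for Prefetch(lookup, queryset=...): the child is compiled separately
        results.prefetch_related.append((prefix + name, process(child)))
    return results


task = Data(
    only_fields=["name"],
    select_related={"project": Data(only_fields=["name"])},
    prefetch_related={"steps": Data(only_fields=["name"])},
)
results = process(task)
print(results.only_fields)  # ['name', 'project__name']
print(results.select_related)  # ['project']
```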
Promotion to prefetch
In certain cases, a select_related optimization must be promoted to a prefetch_related. This can happen for one of the following reasons:
- Any annotations (or aliases) are requested from the relation. A prefetch must be made so that the annotation remains available in the related object.
- Any field_calculations are present. Calculations become annotations, so the reason is the same as above.
- A pre_filter_callback or post_filter_callback is needed. These callbacks might filter out the related object, so a prefetch must be done for the filtering to take effect. Note that this might result in a null value for a field that would otherwise not be null!
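The promotion rules can be condensed into a small predicate. RelatedData and must_promote_to_prefetch are hypothetical names over a toy subset of the OptimizationData attributes. They sketch the rules listed above rather than Undine's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RelatedData:
    """Toy subset of OptimizationData attributes that decide promotion."""

    annotations: dict = field(default_factory=dict)
    aliases: dict = field(default_factory=dict)
    field_calculations: list = field(default_factory=list)
    pre_filter_callback: Optional[Callable] = None
    post_filter_callback: Optional[Callable] = None


def must_promote_to_prefetch(data: RelatedData) -> bool:
    """Return True if a select_related for this relation must become a prefetch_related."""
    if data.annotations or data.aliases:
        return True  # annotations must remain available on the related object
    if data.field_calculations:
        return True  # calculations become annotations
    # callbacks might filter out the related object, so it needs its own queryset
    return data.pre_filter_callback is not None or data.post_filter_callback is not None


print(must_promote_to_prefetch(RelatedData()))  # False
print(must_promote_to_prefetch(RelatedData(post_filter_callback=lambda qs: qs)))  # True
```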
Order of optimizations
The order in which the optimizations are applied is as follows:
- If none is True, return an empty QuerySet and exit early.
- If select_related is not empty, apply select_related optimizations using QuerySet.select_related().
- If prefetch_related is not empty, apply prefetch_related optimizations using QuerySet.prefetch_related().
- If only_fields is not empty and the DISABLE_ONLY_FIELDS_OPTIMIZATION setting is not enabled, apply only_fields optimizations using QuerySet.only().
- If aliases is not empty, apply aliases optimizations using QuerySet.alias().
- If annotations is not empty, apply annotations optimizations using QuerySet.annotate().
- If pre_filter_callback exists, call it.
- If order_by is not empty, apply order_by optimizations using QuerySet.order_by().
- If distinct is True, apply QuerySet.distinct().
- If field_calculations is not empty, run each Calculation and annotate its result to the QuerySet using the Calculation's __field_name__.
- If filters is not empty, apply filters optimizations using QuerySet.filter().
- If post_filter_callback exists, call it.
- If pagination is not empty, run either pagination.paginate_queryset() or pagination.paginate_prefetch_queryset(), depending on whether a related_field exists.
- Return the optimized QuerySet.
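The ordering can be sketched with a fake QuerySet that only records method calls. RecordingQuerySet, Data and apply_optimizations are hypothetical, and the sketch makes several simplifications:
- Field calculations are modeled as (field_name, expression) pairs.
- The disable_only_fields flag stands in for the DISABLE_ONLY_FIELDS_OPTIMIZATION setting.
- Pagination is a single callable rather than a choice between paginate_queryset() and paginate_prefetch_queryset().

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class RecordingQuerySet:
    """Fake QuerySet: each method call returns a copy with the method name appended."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def __getattr__(self, name):
        # Only reached for unknown attributes such as filter, only or order_by.
        return lambda *args, **kwargs: RecordingQuerySet(self.calls + (name,))


@dataclass
class Data:
    none: bool = False
    select_related: list = field(default_factory=list)
    prefetch_related: list = field(default_factory=list)
    only_fields: list = field(default_factory=list)
    aliases: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)
    pre_filter_callback: Optional[Callable] = None
    order_by: list = field(default_factory=list)
    distinct: bool = False
    field_calculations: list = field(default_factory=list)  # (field_name, expression) pairs
    filters: list = field(default_factory=list)
    post_filter_callback: Optional[Callable] = None
    paginate: Optional[Callable] = None  # stand-in for the PaginationHandler


def apply_optimizations(qs, data: Data, disable_only_fields: bool = False):
    if data.none:
        return qs.none()  # empty result, nothing else matters
    if data.select_related:
        qs = qs.select_related(*data.select_related)
    if data.prefetch_related:
        qs = qs.prefetch_related(*data.prefetch_related)
    if data.only_fields and not disable_only_fields:
        qs = qs.only(*data.only_fields)
    if data.aliases:
        qs = qs.alias(**data.aliases)
    if data.annotations:
        qs = qs.annotate(**data.annotations)
    if data.pre_filter_callback is not None:
        qs = data.pre_filter_callback(qs)
    if data.order_by:
        qs = qs.order_by(*data.order_by)
    if data.distinct:
        qs = qs.distinct()
    for field_name, expression in data.field_calculations:
        qs = qs.annotate(**{field_name: expression})
    if data.filters:
        qs = qs.filter(*data.filters)
    if data.post_filter_callback is not None:
        qs = data.post_filter_callback(qs)
    if data.paginate is not None:
        qs = data.paginate(qs)
    return qs


data = Data(
    select_related=["project"],
    only_fields=["name", "project__name"],
    order_by=["name"],
    filters=["<Q object>"],
    post_filter_callback=lambda qs: qs.exclude(),
)
print(apply_optimizations(RecordingQuerySet(), data).calls)
# ('select_related', 'only', 'order_by', 'filter', 'exclude')
```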