Skip to content

Speed up JSONRenderer.extract_included #412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 18, 2018

Conversation

amw
Copy link
Contributor

@amw amw commented Feb 15, 2018

I've hit a nasty performance issue with one of my index-type API endpoints. It returns only about 600 elements each with 3 additional resources included (one of them nested under another). The included resources are widely shared (loaded with prefetch_related). All database queries for this request taken together execute for less than 10ms, but the entire request takes over 2 seconds to complete. That's not reasonable at all.

Turns out main view code was taking "only" about 500ms (still a lot for this amount of data) while JSONRenderer.render was taking almost 2 seconds.

In this PR I am introducing some optimizations to JSONRenderer.extract_included – instead of serializing all related resources and then deduplicating them near the end, I pass a cache of serialized resources and skip processing when encountering the same resource.

Here are some measurements I've put around the render method. Before my changes:

Successfully completed rendering in 1.807s
Successfully completed rendering in 1.790s
Successfully completed rendering in 1.788s
Successfully completed rendering in 1.809s

After my changes:

Successfully completed rendering in 0.465s
Successfully completed rendering in 0.460s
Successfully completed rendering in 0.463s
Successfully completed rendering in 0.463s

Just to give the idea of how much time extract_included takes for my request and what is the ceiling for our optimizations I've ran another test with extract_included short-circuited to return immediately without extracting anything:

Successfully completed rendering in 0.054s
Successfully completed rendering in 0.056s
Successfully completed rendering in 0.054s
Successfully completed rendering in 0.054s

Warning

Optimization in this PR is aggressive and there is a theoretical scenario where it could fail to extract some resources. It is related to nested included resources, e.g. ["contributors", "author.opinions"]. If author happened to be first extracted via a different path (like contributors) then their related opinions would never get extracted.

I do not run this risk, because I don't include the same resource type via multiple paths in a single endpoint. I think this risk is acceptable for this library overall. Including one resource type via multiple paths in a single endpoint is already unsafe operation today – if you were to use different Serializer classes for these different paths then only one serialization would be kept for included key at the end of rendering. The versions of the same serialized resource would be discarded as long as they have the same type and id values.

Caveat

I have focused on optimizing extraction via a simple ResourceRelatedField. There are more optimizations that can be made for ListSerializer and the included_cache dictionary that I have introduced paves the way for others to implement them.

@mblayman
Copy link
Collaborator

@amw I'm definitely open to this fix. I think the warning you described sounds like acceptable risk to me. Looks like there is an isort error in the CI build (I think it's an ASCII sorting issue thing). I'm happy to review once Travis is green.

@amw amw force-pushed the speed-up-extract-included branch from 6bbff46 to 2792b87 Compare February 15, 2018 22:03
@amw
Copy link
Contributor Author

amw commented Feb 15, 2018

I've sent a fix for isort, but Travis is still complaining – I don't see about what though – all the logs seem empty.

@mblayman
Copy link
Collaborator

Looks like you fixed isort. Now you have a flake8 error.

$ flake8
./rest_framework_json_api/renderers.py:318:101: E501 line too long (103 > 100 characters)
./rest_framework_json_api/renderers.py:401:33: E126 continuation line over-indented for hanging indent
./rest_framework_json_api/renderers.py:406:29: E121 continuation line under-indented for hanging indent
./rest_framework_json_api/renderers.py:424:29: E126 continuation line over-indented for hanging indent
The command "flake8" exited with 1.

@amw amw force-pushed the speed-up-extract-included branch from 2792b87 to 2b86226 Compare February 16, 2018 10:23
@codecov-io
Copy link

codecov-io commented Feb 16, 2018

Codecov Report

Merging #412 into master will decrease coverage by 0.03%.
The diff coverage is 95.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #412      +/-   ##
==========================================
- Coverage   91.79%   91.75%   -0.04%     
==========================================
  Files          55       55              
  Lines        2925     2923       -2     
==========================================
- Hits         2685     2682       -3     
- Misses        240      241       +1
Impacted Files Coverage Δ
rest_framework_json_api/exceptions.py 84.61% <100%> (+0.61%) ⬆️
rest_framework_json_api/renderers.py 85.32% <95.23%> (-0.49%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5d34e2...de31d66. Read the comment docs.

@amw
Copy link
Contributor Author

amw commented Feb 16, 2018

All green.

Copy link
Collaborator

@mblayman mblayman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. I've got a couple of stylistic changes as comments and one suggestion for the final include flattening.

I think the approach of "make a small dict cache" is solid and you've got the numbers to back it up. Nice work!

After a tiny bit of cleanup, I'm happy to merge this PR.



def rendered_with_json_api(view):
from .renderers import JSONRenderer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind switching the import style from relative to absolute for all the imports you added in this PR? I don't know that the code is 100% consistent (it probably isn't), but there are more absolute imports than relative ones. It would be nice to push toward a single style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -313,12 +315,12 @@ def extract_relation_instance(cls, field_name, field, resource_instance, seriali
return relation_instance

@classmethod
def extract_included(cls, fields, resource, resource_instance, included_resources):
def extract_included(cls, included_cache, fields, resource, resource_instance,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was included_cache set as the first parameter? That strikes me as an unusual choice. Generally, I see new parameters at the end. From an API perspective, having it first feels off since extract_included is making its logical decisions from fields and resource. Could you move the parameter to the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that included_cache is the target of the operation, almost a subject like self and I usually place these first. But I don't care – changed as requested.

)
if included_cache:
render_data['included'] = list()
for included_type in sorted(included_cache.keys()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling .keys() on this line and the next is not necessary. sorted applies to the keys of a dictionary by default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this could be compacted with a nested list comprehension:

render_data['included'] = [
    instance for instance_type, type_values in sorted(instance_cache.items())
             for id, instance in sorted(type_values.items())]

This works because sorted on items works on the keys. This version eliminates a bunch of append operations. There's probably a small perf gain here, but I bet it's tiny. Here's an example to illustrate:

>>> instance_cache = {'c': {'b': 6, 'a': 5}, 'a': {'b': 2, 'a': 1}, 'b': {'b': 4, 'a': 3}}
>>> [instance for instance_type, type_values in sorted(instance_cache.items()) for id, instance in sorted(type_values.items())]
[1, 2, 3, 4, 5, 6]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind I prefer explicitness in code. Most people know that iterating a dictionary in Python yields keys, but using keys() makes it obvious for everyone – seniors and juniors from whatever programming language background.

I feel similarly about that nested list comprehension. I didn't know that a thing like that existed in Python, but I have to say it is very difficult to read. Seems arcane.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can appreciate your point of view here. Thanks for at least considering it.

@amw amw force-pushed the speed-up-extract-included branch from 2b86226 to de31d66 Compare February 18, 2018 11:08
Copy link
Collaborator

@mblayman mblayman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

)
if included_cache:
render_data['included'] = list()
for included_type in sorted(included_cache.keys()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can appreciate your point of view here. Thanks for at least considering it.

@mblayman mblayman merged commit 4ef5bb4 into django-json-api:master Feb 18, 2018
@amw amw deleted the speed-up-extract-included branch August 6, 2019 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants