Speed up JSONRenderer.extract_included #412
Conversation
@amw I'm definitely open to this fix. The warning you described sounds like an acceptable risk to me. It looks like there is an isort error in the CI build (I think it's an ASCII sorting issue). I'm happy to review once Travis is green.
Force-pushed from 6bbff46 to 2792b87
I've sent a fix for the isort error.
Looks like you fixed isort. Now you have a flake8 error.
Force-pushed from 2792b87 to 2b86226
Codecov Report

@@            Coverage Diff             @@
##           master     #412      +/-   ##
==========================================
- Coverage   91.79%   91.75%   -0.04%
==========================================
  Files          55       55
  Lines        2925     2923       -2
==========================================
- Hits         2685     2682       -3
- Misses        240      241       +1

Continue to review full report at Codecov.
All green.
This looks pretty good to me. I've got a couple of stylistic changes as comments and one suggestion for the final include flattening.
I think the approach of "make a small dict cache" is solid and you've got the numbers to back it up. Nice work!
After a tiny bit of cleanup, I'm happy to merge this PR.
def rendered_with_json_api(view):
    from .renderers import JSONRenderer
Do you mind switching the import style from relative to absolute for all the imports you added in this PR? I don't know that the code is 100% consistent (it probably isn't), but there are more absolute imports than relative ones. It would be nice to push toward a single style.
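For example, for the excerpt above (illustrative only; the absolute form follows from the rest_framework_json_api package path shown later in this PR):

    # relative import, as currently written
    from .renderers import JSONRenderer

    # absolute import, matching the more common style in the codebase
    from rest_framework_json_api.renderers import JSONRenderer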
Done
rest_framework_json_api/renderers.py (Outdated)
@@ -313,12 +315,12 @@ def extract_relation_instance(cls, field_name, field, resource_instance, serializer)
        return relation_instance

    @classmethod
-   def extract_included(cls, fields, resource, resource_instance, included_resources):
+   def extract_included(cls, included_cache, fields, resource, resource_instance,
Why was included_cache set as the first parameter? That strikes me as an unusual choice. Generally, I see new parameters added at the end. From an API perspective, having it first feels off since extract_included is making its logical decisions from fields and resource. Could you move the parameter to the end?
My thinking was that included_cache is the target of the operation, almost a subject like self, and I usually place these first. But I don't feel strongly about it – changed as requested.
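For reference, the two orderings under discussion look roughly like this (a sketch with placeholder bodies; the exact tail of the new signature is assumed from the truncated diff above, not quoted from the merged code):

    # As first proposed in this PR (cache leading, like a "self"-style subject):
    def extract_included(cls, included_cache, fields, resource, resource_instance,
                         included_resources):
        ...

    # After the review: the new parameter moved to the end.
    def extract_included(cls, fields, resource, resource_instance, included_resources,
                         included_cache):
        ...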
)
if included_cache:
    render_data['included'] = list()
    for included_type in sorted(included_cache.keys()):
Calling .keys() on this line and the next is not necessary; sorted() applied directly to a dictionary already sorts its keys.
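For instance, with a throwaway dict:

>>> cache = {'b': {}, 'a': {}, 'c': {}}
>>> sorted(cache) == sorted(cache.keys())
True
>>> sorted(cache)
['a', 'b', 'c']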
Alternatively, this could be compacted with a nested list comprehension:

render_data['included'] = [
    instance for instance_type, type_values in sorted(included_cache.items())
    for id, instance in sorted(type_values.items())]

This works because sorting a dictionary's items() sorts by key first. This version also eliminates a bunch of append operations. There's probably a small perf gain here, but I bet it's tiny. Here's an example to illustrate:
>>> instance_cache = {'c': {'b': 6, 'a': 5}, 'a': {'b': 2, 'a': 1}, 'b': {'b': 4, 'a': 3}}
>>> [instance for instance_type, type_values in sorted(instance_cache.items()) for id, instance in sorted(type_values.items())]
[1, 2, 3, 4, 5, 6]
If you don't mind, I prefer explicitness in code. Most people know that iterating a dictionary in Python yields its keys, but using keys() makes it obvious to everyone – seniors and juniors from whatever programming language background.

I feel similarly about the nested list comprehension. I didn't know such a thing existed in Python, and I have to say it is very difficult to read. It seems arcane.
I can appreciate your point of view here. Thanks for at least considering it.
Force-pushed from 2b86226 to de31d66
👍
I've hit a nasty performance issue with one of my index-type API endpoints. It returns only about 600 elements, each with 3 additional resources included (one of them nested under another). The included resources are widely shared (loaded with prefetch_related). All database queries for this request taken together execute in less than 10ms, but the entire request takes over 2 seconds to complete. That's not reasonable at all.

It turns out the main view code was taking "only" about 500ms (still a lot for this amount of data) while JSONRenderer.render was taking almost 2 seconds.

In this PR I introduce some optimizations to JSONRenderer.extract_included – instead of serializing all related resources and then deduplicating them near the end, I pass around a cache of already serialized resources and skip processing when encountering the same resource again (a rough sketch of this caching idea is included at the end of this description).

Here are some measurements I've put around the render method.

Before my changes:

After my changes:
Just to give an idea of how much time extract_included takes for my request, and what the ceiling for our optimizations is, I ran another test with extract_included short-circuited to return immediately without extracting anything:

Warning
The optimization in this PR is aggressive, and there is a theoretical scenario in which it could fail to extract some resources. It is related to nested included resources, e.g. ["contributors", "author.opinions"]. If author happened to be extracted first via a different path (like contributors), then its related opinions would never get extracted.

I do not run this risk, because I don't include the same resource type via multiple paths in a single endpoint, and I think the risk is acceptable for this library overall. Including one resource type via multiple paths in a single endpoint is already an unsafe operation today – if you were to use different Serializer classes for those different paths, then only one serialization would be kept for the included key at the end of rendering. The extra versions of the same serialized resource would be discarded as long as they have the same type and id values.
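To make this scenario concrete, here is a small, hypothetical simulation of the skip-if-cached behaviour (toy data and a toy walk function, not the renderer's actual code):

    def extract(resource, path, cache):
        # Follow one dotted include path, caching each related resource by
        # (type, id) and skipping anything already cached - as the optimization does.
        for segment in path.split('.'):
            related = resource['relations'][segment]
            key = (related['type'], related['id'])
            if key in cache:
                return  # already extracted via another path; nested segments are never visited
            cache[key] = related
            resource = related

    article = {
        'type': 'articles', 'id': '1',
        'relations': {
            'contributors': {'type': 'people', 'id': '9', 'relations': {}},
            'author': {
                'type': 'people', 'id': '9',
                'relations': {'opinions': {'type': 'opinions', 'id': '42', 'relations': {}}},
            },
        },
    }

    cache = {}
    for path in ['contributors', 'author.opinions']:
        extract(article, path, cache)

    print(('opinions', '42') in cache)  # False

Because people/9 is cached while walking contributors, the author.opinions path stops at the author and opinions/42 is never extracted.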
Caveat

I have focused on optimizing extraction via a simple ResourceRelatedField. There are more optimizations that could be made for ListSerializer, and the included_cache dictionary that I have introduced paves the way for others to implement them.
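Here is that sketch – a minimal, self-contained illustration of the caching idea, assuming hypothetical helper names (collect_included, serialize_resource, flatten_included) rather than the library's actual API; the real change threads included_cache through JSONRenderer.extract_included and render, as the review diff above shows:

    from collections import defaultdict

    def collect_included(related_resources, serialize_resource, included_cache=None):
        # Serialize each related resource at most once, keyed by JSON:API type and id.
        if included_cache is None:
            included_cache = defaultdict(dict)
        for resource in related_resources:
            resource_type, resource_id = resource['type'], resource['id']
            if resource_id in included_cache[resource_type]:
                # Already serialized via another include path: skip the expensive
                # serializer work instead of deduplicating near the end of rendering.
                continue
            included_cache[resource_type][resource_id] = serialize_resource(resource)
        return included_cache

    def flatten_included(included_cache):
        # Flatten the cache into the final 'included' list, sorted by type then id,
        # mirroring the explicit loop discussed in the review thread above.
        included = []
        for included_type in sorted(included_cache.keys()):
            for included_id in sorted(included_cache[included_type].keys()):
                included.append(included_cache[included_type][included_id])
        return included

In the real renderer, render_data['included'] is only set when the cache is non-empty, as the excerpt in the review thread shows.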