fix(openai): add cache read tokens from returned usage block #2820

Merged
2 commits merged into traceloop:main from openai-caching-usage on Apr 18, 2025

Conversation

dinmukhamedm
Contributor

@dinmukhamedm commented on Apr 18, 2025

#2819

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Adds tracking of cache read tokens in OpenAI instrumentation and updates tests to verify this behavior.

  • Behavior:
    • Updates _set_response_attributes() in __init__.py to track cached_tokens from prompt_tokens_details in OpenAI responses (a hedged sketch follows this description).
  • Tests:
    • Adds test_openai_prompt_caching.py to test cache creation and reading for both sync and async OpenAI clients.
    • Includes new test data in 1024+tokens.txt and updates VCR cassettes in test_openai_prompt_caching.yaml and test_openai_prompt_caching_async.yaml.

This description was created by Ellipsis for f694c8c. It will automatically update as commits are pushed.

Contributor

@ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to f694c8c in 1 minute and 19 seconds

More details
  • Looked at 1050 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:188
  • Draft comment:
    New cache read tokens attribute is added correctly with a default fallback of 0, ensuring graceful handling when 'prompt_tokens_details' is missing.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it is simply stating that a new attribute is added correctly with a default fallback. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:54
  • Draft comment:
    Test assertions now verify new attribute 'gen_ai.usage.cache_read_input_tokens' for both creation and cache read spans. The expected values are asserted correctly. (A hedged sketch of this assertion pattern follows after this review comment.)
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it only states that test assertions now verify a new attribute and that the expected values are asserted correctly. It does not provide any actionable feedback or suggestions for improvement.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:188
  • Draft comment:
    Good addition of extracting the cache read token count from the usage block. Consider explicitly casting the value to an integer (if not already) or validating its type, to guard against unexpected non-numeric values.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Looking at the rest of the file, I see that similar usage metrics like total_tokens, completion_tokens, and prompt_tokens don't use explicit integer casting. The _set_span_attribute function seems to handle various value types. The default value of 0 already ensures we get an integer when the key doesn't exist. Without evidence that non-integer values are actually possible here, this seems like unnecessary defensive programming.
    I could be wrong about the OpenAI API contract - maybe it's possible for cached_tokens to be returned as a string or float in some cases. Also, the other token counts might be implicitly validated elsewhere.
    However, without concrete evidence that non-integer values are possible or that they would cause problems, adding type casting would be speculative defensive programming. The existing code follows the established pattern used for other token counts.
    The comment should be removed as it suggests a speculative change without evidence of an actual issue. The current code follows the established pattern used for other similar metrics.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:54
  • Draft comment:
    The tests now explicitly assert the new 'cache_read_input_tokens' attribute for both cached and non-cached responses. This effectively verifies the added functionality.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it praises the addition of tests without suggesting any changes or improvements. It doesn't provide any actionable feedback or raise any concerns about the code.
5. packages/opentelemetry-instrumentation-openai/tests/data/1024+tokens.txt:2
  • Draft comment:
    Consider changing 'YCombinator-backed' to 'Y Combinator-backed' for improved readability.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Since this is a test data file, the exact formatting of 'YCombinator' vs 'Y Combinator' is likely irrelevant to the actual functionality being tested. The file appears to be used for testing token counting or similar features, where such minor formatting differences wouldn't impact the test's purpose. Making purely stylistic suggestions on test fixture data is not valuable.
    Perhaps the consistent branding of Y Combinator is important even in test files for professionalism. Maybe the test file could be used as documentation or example that others will read.
    While proper branding is good practice, this is clearly just test data meant to provide sample text with a certain length/characteristics. The exact content and formatting is not important for its testing purpose.
    Delete this comment as it suggests a purely stylistic change to test fixture data that has no impact on functionality.

Workflow ID: wflow_uK0Wr9iRHFFkubqF


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Member

@nirga nirga left a comment

Nice work on the tests as well :)

@nirga merged commit d096c9a into traceloop:main on Apr 18, 2025
9 checks passed
@dinmukhamedm deleted the openai-caching-usage branch on Apr 18, 2025, 17:43