fix(openai): add cache read tokens from returned usage block #2820

Merged
2 commits merged into traceloop:main from openai-caching-usage on Apr 18, 2025

Conversation

dinmukhamedm
Contributor

@dinmukhamedm commented on Apr 18, 2025

#2819

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Adds tracking of cache read tokens in OpenAI instrumentation and updates tests to verify this behavior.

  • Behavior:
    • Updates _set_response_attributes() in __init__.py to track cached_tokens from prompt_tokens_details in OpenAI responses (a hedged sketch follows this description).
  • Tests:
    • Adds test_openai_prompt_caching.py to test cache creation and reading for both sync and async OpenAI clients.
    • Includes new test data in 1024+tokens.txt and updates VCR cassettes in test_openai_prompt_caching.yaml and test_openai_prompt_caching_async.yaml.

This description was created by Ellipsis for f694c8c. It will automatically update as commits are pushed.

Contributor

@ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to f694c8c in 1 minute and 19 seconds

More details
  • Looked at 1050 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:188
  • Draft comment:
    New cache read tokens attribute is added correctly with a default fallback of 0, ensuring graceful handling when 'prompt_tokens_details' is missing.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it is simply stating that a new attribute is added correctly with a default fallback. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:54
  • Draft comment:
    Test assertions now verify new attribute 'gen_ai.usage.cache_read_input_tokens' for both creation and cache read spans. The expected values are asserted correctly. (A hedged sketch of this assertion pattern follows after this review comment.)
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it only states that test assertions now verify a new attribute and that the expected values are asserted correctly. It does not provide any actionable feedback or suggestions for improvement.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:188
  • Draft comment:
    Good addition of extracting the cache read token count from the usage block. Consider explicitly casting the value to an integer (if not already) or validating its type, to guard against unexpected non-numeric values.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Looking at the rest of the file, I see that similar usage metrics like total_tokens, completion_tokens, and prompt_tokens don't use explicit integer casting. The _set_span_attribute function seems to handle various value types. The default value of 0 already ensures we get an integer when the key doesn't exist. Without evidence that non-integer values are actually possible here, this seems like unnecessary defensive programming.
    I could be wrong about the OpenAI API contract - maybe it's possible for cached_tokens to be returned as a string or float in some cases. Also, the other token counts might be implicitly validated elsewhere.
    However, without concrete evidence that non-integer values are possible or that they would cause problems, adding type casting would be speculative defensive programming. The existing code follows the established pattern used for other token counts.
    The comment should be removed as it suggests a speculative change without evidence of an actual issue. The current code follows the established pattern used for other similar metrics.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:54
  • Draft comment:
    The tests now explicitly assert the new 'cache_read_input_tokens' attribute for both cached and non-cached responses. This effectively verifies the added functionality.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, as it praises the addition of tests without suggesting any changes or improvements. It doesn't provide any actionable feedback or raise any concerns about the code.
5. packages/opentelemetry-instrumentation-openai/tests/data/1024+tokens.txt:2
  • Draft comment:
    Consider changing 'YCombinator-backed' to 'Y Combinator-backed' for improved readability.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Since this is a test data file, the exact formatting of 'YCombinator' vs 'Y Combinator' is likely irrelevant to the actual functionality being tested. The file appears to be used for testing token counting or similar features, where such minor formatting differences wouldn't impact the test's purpose. Making purely stylistic suggestions on test fixture data is not valuable.
    Perhaps the consistent branding of Y Combinator is important even in test files for professionalism. Maybe the test file could be used as documentation or example that others will read.
    While proper branding is good practice, this is clearly just test data meant to provide sample text with a certain length/characteristics. The exact content and formatting is not important for its testing purpose.
    Delete this comment as it suggests a purely stylistic change to test fixture data that has no impact on functionality.

Workflow ID: wflow_uK0Wr9iRHFFkubqF


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Member

@nirga nirga left a comment

Nice work on the tests as well :)

@nirga merged commit d096c9a into traceloop:main on Apr 18, 2025
9 checks passed
@dinmukhamedm deleted the openai-caching-usage branch on Apr 18, 2025, 17:43