[tailsamplingprocessor] Fix the decision timer metric to measure > 50ms #37722
Conversation
I've added a changelog entry:
# Use this changelog template to create an entry for release notes.
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix
On the fence, do we consider this breaking?
After thinking more, it does feel like a breaking change. Updated.
My only nit is related to the change type, bug_fix vs breaking.
Once the merge conflict is resolved, this is ready to go.
@jpkrohling Thanks, I think it's good to go!
Description
Previously the decision timer metric was in microseconds. Because the max histogram bucket was 50000, it capped the maximum measurable latency at 50ms. This commit changes the metric to be in milliseconds, which allows for measuring latencies of up to 50 seconds.
Since the default decision tick interval is 1s, it's important to be able to observe when the decision latency approaches 1s.
I'm not sure how big of a breaking change it is to update the unit, but given the previous value was not super useful, I thought it was better to change it.
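For context, here's a minimal standalone Go sketch of why the unit matters against a 50000 upper histogram bound; the `observe` helper and the constant name are hypothetical stand-ins for the processor's actual metric definition:

```go
package main

import (
	"fmt"
	"time"
)

// maxBucket mirrors the 50000 upper histogram bound mentioned above
// (hypothetical constant; the real bounds live in the processor's metrics code).
const maxBucket = 50000

// observe reports how a capped histogram sees a value: anything past the
// last finite bound is only known to be "greater than maxBucket".
func observe(unit string, value int64) {
	if value > maxBucket {
		fmt.Printf("%s: > %d (overflow bucket, exact value lost)\n", unit, maxBucket)
		return
	}
	fmt.Printf("%s: %d\n", unit, value)
}

func main() {
	// A decision tick that takes 800ms, i.e. close to the default 1s interval.
	elapsed := 800 * time.Millisecond

	// Old unit: microseconds. 800ms = 800000µs, far beyond the 50000 bound.
	observe("us", elapsed.Microseconds())

	// New unit: milliseconds. 800ms = 800, well within the 50000 bound,
	// so latencies up to 50s stay measurable.
	observe("ms", elapsed.Milliseconds())
}
```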
Here's an example of the current state:

Notice how the average latency runs much higher than the p99: the average is derived from the histogram's exact sum and count, while the p99 is interpolated from buckets and cannot exceed the highest finite bound, so we are missing important data on how slow the decision tick actually is.
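To make that avg-vs-p99 signal concrete, here's a small Go sketch with made-up sample latencies and the 50000µs bound from the description, showing why a bucket-derived p99 can sit below the true average once observations overflow the last bucket:

```go
package main

import "fmt"

func main() {
	// Hypothetical decision-tick latencies in microseconds, all around 800ms.
	samples := []float64{750000, 800000, 820000, 900000}

	// Illustrative highest finite bucket bound: 50000µs (50ms).
	maxBound := 50000.0

	// The average comes from the histogram's exact sum and count, so it is
	// unaffected by the bucket layout.
	var sum float64
	for _, s := range samples {
		sum += s
	}
	avg := sum / float64(len(samples))

	// A quantile interpolated from buckets can never exceed the highest finite
	// bound: with every sample in the overflow bucket, p99 is reported as ~50ms.
	p99 := maxBound

	fmt.Printf("avg ≈ %.0fµs (~%.0fms), reported p99 ≤ %.0fµs (50ms)\n", avg, avg/1000, p99)
}
```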