# New component: Metric Start Time Processor #37186
Comments
@quentinmit looks like you reviewed the normalizesums processor back in the day. Let me know if you have thoughts or suggestions.
This could even be expanded to support using a metric (process_start_time) to set the start time, or the collector's start time.
I presented this at the Collector SIG today, and most people on the call supported this functionality being added as a processor, and were supportive of this functionality being moved out of the Prometheus Receiver and GoogleCloud exporter into its own processor. There is concern that this may be a regression for some users if the Prometheus receiver stops inserting "true reset" points by default. If/when that happens, we will need to make the change with a feature gate.
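For illustration, such a gate might be registered with the collector's `featuregate` package roughly as below; the gate ID and description here are hypothetical, not final names:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/featuregate"
)

// Hypothetical gate ID and description; the real gate would live inside
// the prometheus receiver's package.
var removeStartTimeAdjustmentGate = featuregate.GlobalRegistry().MustRegister(
	"receiver.prometheusreceiver.RemoveStartTimeAdjustment",
	featuregate.StageAlpha, // Alpha: disabled by default, opt-in via --feature-gates
	featuregate.WithRegisterDescription("When enabled, the prometheus receiver stops inserting 'true reset' points, deferring start-time handling to a processor."),
	featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/37186"),
)

func main() {
	if removeStartTimeAdjustmentGate.IsEnabled() {
		fmt.Println("start time adjustment disabled in the receiver")
	} else {
		fmt.Println("legacy behavior: receiver adjusts start times")
	}
}
```

Because Alpha gates are disabled by default, existing users would keep the current behavior until they opt in.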
Having contributed to https://github.com/lightstep/opentelemetry-prometheus-sidecar, I would be glad to review this logic, which is documented in https://opentelemetry.io/docs/specs/otel/metrics/data-model/#cumulative-streams-handling-unknown-start-time and surrounding sections. Thanks @dashpole!
Also happy to help review/implement this. I've been working on several changes to the metric adjuster in the prometheus receiver and would love to see it simplified.
…a featuregate This change creates an Alpha feature gate for this functionality since this can be replaced when open-telemetry#37186 is implemented.
#### Description
Add initial skeleton for the metricstarttime processor.
#### Link to tracking issue
Part of #37186
cc @jmacd @ridwanmsharif
#### Description
Start time metric adjustment is still enabled by default, but can now be disabled using a feature gate. It is being replaced by the metricstarttime processor: #37186
#### Link to tracking issue
Part of #37186
#### Testing
I updated the unit test for counters.
#### Documentation
Updated the README.
…ceiver (#37855)
#### Description
Copy the "metric adjuster" from the prometheus receiver to a new "true_reset_point" strategy in the metricstarttimeprocessor.
#### Link to tracking issue
Part of #37186
#### Testing
Copied unit tests.
#### Documentation
Updated the README.
#### Notes for reviewers
* I would recommend reviewing commit-by-commit. The first commit is copied verbatim, with minimal changes other than splitting into multiple files.
* I changed the function signature of `AdjustMetrics` to match the processMetrics function so that it doesn't need an additional wrapper layer.
* I removed the MetricsAdjuster interface, since it isn't needed (we implement the processor function for metrics now).
* I had to remove the validation of job + instance being present because it didn't pass generated tests. Regardless, we should not rely on those in a generic processor.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping
This was accepted and exists here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstarttimeprocessor. It's at the very least not stale. But also, we may be able to close this issue once the feature parity is complete with the
/label -Stale
This is a particularly annoying downside that came as a bit of a surprise during our migration to the otel collector. For the majority of these we can work around it by changing counters to gauges and all is good. However, some metrics we collect as histograms, and it turned out that
Does this mean we can somehow modify the otel configuration to benefit from accurate counters? Or do we need to wait until this issue is closed and a new version of otel is released?
#### The purpose and use-cases of the new component
Some receivers (e.g. prometheus) can produce metrics without a start time. It would be helpful to have a way to properly set the start time for cumulative metrics where the start time is not known.
The way this is typically done (e.g. in the googlecloud exporter, or in the out-of-tree "normalizesums" processor) is by dropping the first point in a timeseries without a start timestamp and recording its timestamp and value. Subsequent points use the timestamp of the initial point as the start timestamp, and "subtract" the initial value from the value of the subsequent point. This results in the time range [start, end] properly accounting for the value of the cumulative within that range.
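As a rough illustration of that technique, here is a minimal, self-contained Go sketch. It uses simplified stand-in types rather than the collector's pdata API, identifies a series by a single string, and treats a reset (value decreasing) the same as a new series, which real implementations handle more carefully per the data model spec:

```go
package main

import "fmt"

// point is a simplified stand-in for a cumulative data point without a
// known start time; a real processor would operate on pmetric types.
type point struct {
	seriesID  string  // metric name + attributes, flattened for the sketch
	timestamp int64   // unix seconds
	value     float64 // cumulative value
}

// reference remembers the first observed point of each series.
type reference struct {
	timestamp int64
	value     float64
}

type adjuster struct {
	refs map[string]reference
}

// adjust returns the start time and adjusted value for a point, or
// ok=false if the point was consumed as the initial reference (dropped).
func (a *adjuster) adjust(p point) (startTime int64, adjValue float64, ok bool) {
	ref, seen := a.refs[p.seriesID]
	if !seen || p.value < ref.value { // new series, or a reset (simplified handling)
		a.refs[p.seriesID] = reference{timestamp: p.timestamp, value: p.value}
		return 0, 0, false
	}
	// Later points: start time is the reference timestamp, and the
	// reference value is subtracted so [start, end] covers only the
	// growth actually observed.
	return ref.timestamp, p.value - ref.value, true
}

func main() {
	a := &adjuster{refs: map[string]reference{}}
	for _, p := range []point{
		{"http_requests_total", 1000, 100},
		{"http_requests_total", 1060, 110},
		{"http_requests_total", 1120, 120},
	} {
		if start, v, ok := a.adjust(p); ok {
			fmt.Printf("start=%d end=%d value=%v\n", start, p.timestamp, v)
		}
	}
}
```

Run against a counter observed at 100, 110, 120, this drops the first point and emits 10 and then 20, both with the first point's timestamp as the start time, matching the [start, end] accounting described above.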
It comes with a downside: while rates are now accurate, the absolute value of a counter is lower than what it would be otherwise. Generally speaking, the cumulative value of a counter is not particularly important--the rate is what matters.
This component could be represented using a processor, receiver helper, or exporter helper.
A receiver helper could make sense, since only a limited number of receivers can produce metrics without a start time (PRW, OTLP, and Prometheus come to mind). An exporter helper could also make sense, since whether a start time is needed at all is largely determined by the destination. But this component involves a lot of caching: implemented as a helper, each receiver or exporter would keep its own cache, which is inefficient, and sharing the caches isn't necessarily safe.
Overall, I think a processor makes the most sense, as it can be inserted before exporters that require a start time, and it guarantees that all metrics passing through have a correct one set.
#### Example configuration for the component
The component will not require any configuration.
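A minimal sketch of what that might look like in a collector config; the `metricstarttime` name matches the component as later implemented, while the receiver and exporter shown are just illustrative:

```yaml
processors:
  metricstarttime: {}

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [metricstarttime]
      exporters: [googlecloud]
```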
#### Telemetry data types supported
Metrics
#### Code Owner(s)
@dashpole, @ridwanmsharif
#### Sponsor (optional)
@dashpole
#### Additional context
cc @ArthurSens @Aneurysm9
I would love for this to replace some of the complex start time handling logic that is currently embedded in the Prometheus receiver. It seems like a separate component would be more self-contained and would apply to PRW as well as the Prometheus receiver. Let me know if you are interested in collaborating.