Help filtering old metrics (>30m) before export — avoid high processing cost in incident scenarios #39294
fabriciodf asked this question in Q&A
Hello team,
I'm working with the OpenTelemetry Collector in a large-scale environment and would like help understanding the best way to filter out metrics older than 30 minutes before they are exported.
The reason behind this is related to operational resilience:
In incident scenarios (e.g., heavy traffic, network outages, collector overload), we occasionally experience buffered/stuck metric batches that are eventually flushed — but by then, the data is too old to be relevant and causes unnecessary load on downstream systems like Mimir.
We would like to drop those metrics at the collector level to reduce the load and allow faster recovery and processing of fresh data after an incident.
What we’ve tried so far:
The filterprocessor seems promising, and we saw that it supports OTTL expressions with Now() and time_unix_nano. But we're unsure whether this is fully stable for this use case, or whether it's the best practice for metric datapoints. The docs mention support for metrics.datapoint conditions, and we've considered using something along the lines of the sketch below.
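This is rough and untested; the UnixNano converter, integer math inside OTTL conditions, and the receiver/exporter names below are assumptions/placeholders on our side, not something we have validated:

```yaml
processors:
  filter/drop-old-datapoints:
    error_mode: ignore
    metrics:
      datapoint:
        # Drop any datapoint whose timestamp is more than 30 minutes behind
        # the collector's wall clock (1800000000000 ns = 30 minutes).
        - 'time_unix_nano < UnixNano(Now()) - 1800000000000'

service:
  pipelines:
    metrics:
      receivers: [otlp]                                  # placeholder
      processors: [filter/drop-old-datapoints, batch]
      exporters: [prometheusremotewrite]                 # placeholder (Mimir)
```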
Questions:
Is this the recommended and stable way to filter metrics older than 30 minutes?
Is there any caveat or limitation when applying this filter to metrics coming from the OpenTelemetry Operator?
For example, for metrics from the kubelet, Prometheus exporters, or opentelemetry-collector sidecars, does anything differ in how timestamps are handled? (A rough sketch of where we would place the filter in an Operator-managed setup is included after these questions.)
Would it be safer or more performant to use a custom processor for this?
Would adding a native timestamp-based filtering processor make sense as a core feature?
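For the Operator question above, this is roughly where we would place the filter in an Operator-managed collector. It assumes the v1beta1 OpenTelemetryCollector CRD, where spec.config is structured YAML (with v1alpha1 the same config would be embedded as a string); names are placeholders and receivers/exporters are trimmed for brevity:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: metrics-collector        # placeholder name
spec:
  mode: deployment               # placeholder; could also be daemonset or sidecar
  config:
    # receivers and exporters omitted; they would stay as in our current config
    processors:
      filter/drop-old-datapoints:
        error_mode: ignore
        metrics:
          datapoint:
            - 'time_unix_nano < UnixNano(Now()) - 1800000000000'
    service:
      pipelines:
        metrics:
          processors: [filter/drop-old-datapoints]
```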
Thanks in advance for your guidance — we’re happy to contribute or test things if needed.
Best regards,
Fabrício