New component: Metric Start Time Processor #37186

Open
dashpole opened this issue Jan 14, 2025 · 9 comments
Assignees: dashpole
Labels: Accepted Component (New component has been sponsored)

dashpole (Contributor) commented Jan 14, 2025

The purpose and use-cases of the new component

Some receivers (e.g. prometheus) can produce metrics without a start time. It would be helpful to have a way to properly set the start time for cumulative metrics where the start time is not known.

The way this is typically done (e.g. in the googlecloud exporter here, or in this out-of-tree "normalizesums" processor) is by dropping the first point in a timeseries that has no start timestamp and recording its timestamp and value. Subsequent points use the timestamp of that initial point as their start timestamp, and "subtract" the initial value from their own value. This results in the time range [start, end] properly accounting for the value accumulated by the cumulative metric within that range.
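
To make the mechanics concrete, here is a minimal sketch of this subtract-initial-point logic using plain Go types rather than the collector's pdata API. The `point` and `normalizer` names, the string series key, and the simple reset check are illustrative assumptions, not the actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// point is a simplified cumulative data point. The real component operates on
// pdata/pmetric types; this struct is illustrative only.
type point struct {
	start time.Time // start timestamp (zero value means unknown)
	ts    time.Time // observation timestamp
	value float64
}

// normalizer caches the first observed point of each series and subtracts it
// from later points, as described above.
type normalizer struct {
	reference map[string]point // keyed by series identity (metric name + attributes)
}

// normalize returns the adjusted point and true if the point should be kept.
func (n *normalizer) normalize(series string, p point) (point, bool) {
	if !p.start.IsZero() {
		return p, true // start time already known: pass through unchanged
	}
	ref, ok := n.reference[series]
	if !ok || p.value < ref.value {
		// First point for this series (or an apparent reset): record it and drop it.
		n.reference[series] = p
		return point{}, false
	}
	// Later points: use the recorded timestamp as the start time and
	// subtract the recorded value.
	return point{start: ref.ts, ts: p.ts, value: p.value - ref.value}, true
}

func main() {
	n := &normalizer{reference: make(map[string]point)}
	t0 := time.Now()

	// First point: dropped, but its timestamp and value are recorded.
	_, kept := n.normalize(`http_requests_total{code="200"}`, point{ts: t0, value: 100})
	fmt.Println(kept) // false

	// Second point: start = t0, value = 130 - 100 = 30.
	adj, kept := n.normalize(`http_requests_total{code="200"}`, point{ts: t0.Add(time.Minute), value: 130})
	fmt.Println(kept, adj.value, adj.start.Equal(t0)) // true 30 true
}
```

A real processor would also need more robust counter-reset detection and eviction of stale series from the cache; the sketch only shows the core bookkeeping.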

It comes with a downside: while rates are now accurate, the absolute value of a counter is lower than what it would be otherwise. Generally speaking, the cumulative value of a counter is not particularly important--the rate is what matters.

This component could be represented using a processor, receiver helper, or exporter helper.

A receiver helper could make sense, since only a limited number of receivers can produce metrics without a start time (PRW, OTLP, Prometheus come to mind). An exporter helper could also make sense, as whether or not a start time is necessary at all is probably determined by the destination. But since this logic involves a lot of caching, it would be inefficient to duplicate those caches across components, and it isn't necessarily safe to share them.

Overall, I think a processor makes the most sense, as it can be inserted before exporters that require start time, and guarantees that all metrics passing through have a correct start time set.

Example configuration for the component

The component will not require any configuration. E.g.

processors:
    metricstarttime:

Telemetry data types supported

Metrics

Code Owner(s)

@dashpole, @ridwanmsharif

Sponsor (optional)

@dashpole

Additional context

cc @ArthurSens @Aneurysm9

I would love for this to replace some of the complex start time handling logic that is currently embedded in the Prometheus receiver. It seems like a separate component would be more self-contained and would be able to apply to PRW as well as the Prometheus receiver. Let me know if you are interested in collaborating.

dashpole added the Sponsor Needed, needs triage, and Accepted Component labels and removed the Sponsor Needed label on Jan 14, 2025
dashpole (Contributor, Author) commented:

@quentinmit looks like you reviewed the normalizesums processor back in the day. Let me know if you have thoughts or suggestions.

dashpole (Contributor, Author) commented:

This could even be expanded to support using a metric (process_start_time) to set the start time, or to support using the collector's start time.

cc @ridwanmsharif

dashpole self-assigned this on Jan 15, 2025
dashpole removed the needs triage label on Jan 15, 2025
dashpole (Contributor, Author) commented:

I presented this at the Collector SIG today. Most people on the call supported adding this functionality as a processor, and were supportive of moving it out of the Prometheus receiver and GoogleCloud exporter into its own processor.

There is concern that this may be a regression for some users if the Prometheus receiver stops inserting "true reset" points by default. If/when that happens, we will need to make the change with a feature gate.
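
For reference, here is a minimal sketch of how such a gate could be registered with the collector's go.opentelemetry.io/collector/featuregate package. The gate ID and description below are hypothetical placeholders, not the gate the Prometheus receiver actually defines:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/featuregate"
)

// Hypothetical gate ID and description for illustration only; the real gate
// would be defined (and named) inside the Prometheus receiver.
var disableStartTimeAdjustmentGate = featuregate.GlobalRegistry().MustRegister(
	"receiver.prometheusreceiver.ExampleDisableStartTimeAdjustment",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("When enabled, the receiver no longer adjusts start times itself."),
)

func main() {
	// Alpha gates are disabled by default and can be enabled with
	// --feature-gates=<gate id> on the collector command line.
	fmt.Println(disableStartTimeAdjustmentGate.IsEnabled())
}
```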

jmacd (Contributor) commented Jan 15, 2025

Having contributed to https://github.com/lightstep/opentelemetry-prometheus-sidecar, I would be glad to review this logic, which is documented in https://opentelemetry.io/docs/specs/otel/metrics/data-model/#cumulative-streams-handling-unknown-start-time and surrounding sections.

Thanks @dashpole!

ridwanmsharif (Member) commented:

Also happy to help review/implement this. I've been working on several changes to the metric adjuster in the prometheus receiver and would love to see it simplified.

ridwanmsharif added a commit to ridwanmsharif/opentelemetry-collector-contrib that referenced this issue Jan 20, 2025
…a featuregate

This change creates an Alpha feature gate for this functionality since this can be replaced when open-telemetry#37186 is implemented.
ridwanmsharif added a commit to ridwanmsharif/opentelemetry-collector-contrib that referenced this issue Jan 23, 2025
…a featuregate

This change creates an Alpha feature gate for this functionality since this can be replaced when open-telemetry#37186 is implemented.
songy23 pushed a commit that referenced this issue Jan 27, 2025
#### Description

Add initial skeleton for the metricstarttime processor.

#### Link to tracking issue
Part of #37186

cc @jmacd @ridwanmsharif
songy23 pushed a commit that referenced this issue Feb 18, 2025
#### Description

Start time metric adjustment is still enabled by default, but can now be
disabled using a feature gate.

It is being replaced by the metricstarttime processor:
#37186

#### Link to tracking issue

Part of
#37186

#### Testing

I updated the unit test for counters

#### Documentation

Updated the README
andrzej-stencel pushed a commit that referenced this issue Mar 3, 2025
…ceiver (#37855)

#### Description

Copy the "metric adjuster" from the prometheus receiver to a new
"true_reset_point" strategy in the metricstarttimeprocessor

#### Link to tracking issue
Part of
#37186

#### Testing

Copied unit tests

#### Documentation

Updated the README.

#### Notes for reviewers

* I would recommend reviewing commit-by-commit. The first commit is
copied verbatim, with minimal changes other than splitting into multiple
files.
* I changed the function signature of `AdjustMetrics` to match the
processMetrics function so that it doesn't need an additional wrapper
layer.
* I removed the MetricsAdjuster interface, since it isn't needed (we
implement the processor function for metrics now).
* I had to remove the validation of job + instance being present because
it didn't pass generated tests. Regardless, we should not rely on those
in a generic processor.
atoulme pushed a commit that referenced this issue Mar 7, 2025
#### Description

Add a skeleton for the `subtract_initial_point` strategy. The
implementation will follow in subsequent PRs.

#### Link to tracking issue

Part of
#37186,
and #38379

#### Testing

Added config_test.go

#### Documentation

Updated the README.
github-actions bot commented:

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label on Mar 18, 2025
ridwanmsharif (Member) commented:

This was accepted and exists here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstarttimeprocessor

It's at the very least not stale. But also, we may be able to close this issue once feature parity with the prometheusreceiver is complete.

pintohutch (Contributor) commented Mar 18, 2025

/label -Stale

github-actions bot removed the Stale label on Mar 18, 2025
RafalSkolasinski commented:

> It comes with a downside: while rates are now accurate, the absolute value of a counter is lower than what it would be otherwise. Generally speaking, the cumulative value of a counter is not particularly important--the rate is what matters.

This is a particularly annoying downside that came as a bit of a surprise during our migration to the otel collector.
Though in the majority of cases I totally agree one should only care about the rates, we have cases where we do care about absolute values.

For the majority of these we can work around it by changing counters to gauges and all is good. However, some metrics we collect as histograms, and it turned out that the _bucket metric is a counter, so it suffers from inaccurate / offset values...

> This was accepted and exists here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstarttimeprocessor

Does this mean we can somehow modify the otel configuration to benefit from accurate counters? Or do we need to wait until this issue is closed and a new version of otel is released?
