[exporter/kafka] Replace "topic" setting by "traces_topic", "logs_topic" and "metrics_topic" #35432

Closed
aklemp opened this issue Sep 26, 2024 · 16 comments · Fixed by #39204
Labels
enhancement · exporter/kafka

Comments

aklemp commented Sep 26, 2024

Component(s)

exporter/kafka

Is your feature request related to a problem? Please describe.

Inspired by #32735 because it is a related problem:

When the "topic" setting is not specified, the same Kafka exporter config can be used in all three pipelines because each signal falls back to its default topic name:

exporters:
  kafka:

service:
  pipelines:
    metrics:
      exporters: [kafka] # publishes to topic otlp_metrics
    logs:
      exporters: [kafka] # publishes to topic otlp_logs
    traces:
      exporters: [kafka] # publishes to topic otlp_spans

If the topic is set to any custom value, this structure still works from the exporter's perspective:

exporters:
  kafka:
    topic: custom_traces_topic

service:
  pipelines:
    metrics:
      exporters: [kafka] # publishes to topic custom_traces_topic
    logs:
      exporters: [kafka] # publishes to topic custom_traces_topic
    traces:
      exporters: [kafka] # publishes to topic custom_traces_topic

What happens in this case is that all three signals are sent to the same topic. This is a race condition that will only succeed in one out of three scenarios at the receiving end (see #32735).

To avoid this problem, the user must create a separate exporter per pipeline just to set custom topic names. This is error prone and inconsistent with the default behavior, which allows a single exporter to serve all three pipelines with the default topic names.

Describe the solution you'd like

Having three different topic names by default but only being able to override them with a single value is a strange feature.
Instead, add three topic properties to the kafka exporter:

exporters:
  kafka:
    traces_topic: custom_traces_topic # default otlp_spans
    metrics_topic: custom_metrics_topic # default otlp_metrics
    logs_topic: custom_logs_topic # default otlp_logs

service:
  pipelines:
    metrics:
      exporters: [kafka] # publishes to topic custom_metrics_topic
    logs:
      exporters: [kafka] # publishes to topic custom_logs_topic
    traces:
      exporters: [kafka] # publishes to topic custom_traces_topic

Alternative definition (the chosen solution should be consistent between exporter and receiver):

exporters:
  kafka:
    topic:
      traces: custom_traces_topic # default otlp_spans
      metrics: custom_metrics_topic # default otlp_metrics
      logs: custom_logs_topic # default otlp_logs

Describe alternatives you've considered

The documentation gives some hints about how the actual topic is determined:

  1. The client application sending telemetry data to OpenTelemetry should not be concerned with setting topic names in attributes that are used internally to transport OpenTelemetry information using Kafka.
  2. The context could be configured with topic names.
    • no example found of how to configure this
    • requires logic to derive the topic to use from the telemetry data
    • evaluated for every message
    • more complex setup than simply defining three static properties
  3. The feature enhancement proposed in this ticket.

Current workaround: define three exporters that are redundant in all properties except the topic, and use them individually in the three pipelines.
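
For illustration, a minimal sketch of that workaround (the broker address is a placeholder; any other shared settings would have to be duplicated the same way):

exporters:
  kafka/traces:
    brokers: ["kafka:9092"]
    topic: custom_traces_topic
  kafka/metrics:
    brokers: ["kafka:9092"]
    topic: custom_metrics_topic
  kafka/logs:
    brokers: ["kafka:9092"]
    topic: custom_logs_topic

service:
  pipelines:
    traces:
      exporters: [kafka/traces]
    metrics:
      exporters: [kafka/metrics]
    logs:
      exporters: [kafka/logs]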

Additional context

No response

aklemp added the enhancement and needs triage labels on Sep 26, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme removed the needs triage label on Oct 2, 2024

github-actions bot commented Dec 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 2, 2024

aklemp commented Dec 2, 2024

This is still relevant, and I would actually consider it a bug rather than a feature request, because setting the provided configuration value leads to runtime errors.

github-actions bot removed the Stale label on Dec 3, 2024

github-actions bot commented Feb 3, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Feb 3, 2025

aklemp commented Feb 3, 2025

Anyone to respond except a bot?

github-actions bot removed the Stale label on Feb 4, 2025

axw commented Mar 18, 2025

@aklemp this seems reasonable to me on the surface. Before pressing ahead, I'd like to discuss the alternatives a bit.

The client application sending telemetry data to OpenTelemetry should not be concerned with setting topic names in attributes that are used internally to transport OpenTelemetry information using Kafka.

Agreed.

On a more general note, it may be useful for the exporter to dynamically choose the topic based on some information it has about the client, such as a tenant ID. This could be done at the request/batch level, so you wouldn't need to evaluate it for every single message.

For example you might set a topic name template to something like ${tenant}-otlp_logs. In theory the solution to that could also work for signals, e.g. set topic to ${tenant}-otlp_${signal}, but that is a little bit inflexible.

There's another alternative that I have been thinking about on and off: what if we sent everything to the same topic, and included the signal type and encoding as message headers? Then we could support producing to and receiving from a single topic: the receiver would use the signal type and encoding headers to figure out how to decode the data, and use the signal type header to route to the correct pipeline.

Have you considered this option? Would it suit your needs, or do you prefer separating signals into different topics?
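
Very roughly, a record on such a shared topic might look like this (the header names here are purely hypothetical, just to illustrate the idea):

# hypothetical layout of a record on a single shared topic
headers:
  otlp_signal: logs          # logs | metrics | traces; used to route to the right pipeline
  otlp_encoding: otlp_proto  # tells the receiver how to decode the payload
value: <serialized telemetry payload>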


aklemp commented Mar 19, 2025

@axw Thank you for your response.

I'm not sure about choosing topics dynamically based on client information. We currently don't have a use case for that, and to me it would probably be a decision per telemetry type, like logs for this client go here and logs for that client go there. The telemetry type itself is already differentiated by the collector, and I could apply different pipelines and configuration for that, so the exporter doesn't have to worry about it.

Sending all telemetry types via one topic is possible and is basically an internal contract between exporter and receiver; the OpenTelemetry user isn't aware of that detail. But after thinking about it, I found several arguments against it from an architecture, security, and IT operations perspective.

  • Schemas could be associated per telemetry type (and I assume this is already the case, as the receiver detects mismatching telemetry data on the same topic).
  • Depending on requirements, different policies could be applied per topic, such as validation, virus scanning, or encryption.
  • Messages of the different telemetry types typically vary a lot in number and size. Separate topics can be tuned accordingly (e.g. partitioning, retention times, replication factors).
  • Separate topics are easier to monitor, because one can easily spot a problem affecting a single telemetry type, and alert rules can be defined based on the expected message volume.
  • Other people such as Kafka admins can easily see what kind of data is transported (e.g. only metrics and traces go via Kafka while logs are delivered to a different system directly).

From an exporter/receiver development perspective, there are also a few arguments against joining all telemetry data on a single topic.

  • It already works with separate topics as long as we don't change the topic configuration of exporter/receiver.
  • Only the handling of configuration has to be changed, not the actual implementation.

So overall, I would still prefer a solution with separate but easily configurable topics (without duplicating the Kafka cluster configuration), together with #32735 (which was just closed due to inactivity).


axw commented Mar 20, 2025

@aklemp thank you! That all makes sense to me, except:

  • Virus scanning: is that relevant for logs, metrics, traces, or profiles? It sounds like a concern for other types of data.
  • Encryption: why would you encrypt one and not the other?

I ask because if these are per-signal concerns, then I would be worried about a "slippery slope" of logs_encryption, metrics_encryption, etc. I think of these as cross-cutting concerns that should apply to all the data; and in the (I expect unusual) event that they do not, then you could always create a separate exporter.

I'm not sure about choosing topics dynamically based on client information. We currently don't have a use case for that, and to me it would probably be a decision per telemetry type, like logs for this client go here and logs for that client go there. The telemetry type itself is already differentiated by the collector, and I could apply different pipelines and configuration for that, so the exporter doesn't have to worry about it.

OK. For what it's worth, this is a concern for my team, that's why I brought it up. It's a way of enabling multi-tenancy: https://kafka.apache.org/documentation/#multitenancy-topic-naming

One major benefit of having a combined topic for all the signals is that it may lead to fewer partitions. This can matter at large scale (e.g. when combined with per-tenant topics), as each cluster can only handle so many partitions, and managed Kafka services (e.g. AWS MSK) typically have a per-partition cost.

I am going to investigate the template approach a bit more. Essentially what I have in mind is to enable setting a config like this:

receivers:
  kafka:
    topic: "custom_${signal}_topic" # defaults to "otlp_${signal}"

exporters:
  kafka:
   topic: "custom_${signal}_topic" # defaults to "otlp_${signal}"

(Not necessarily with that syntax.)

I think we need to make that work to enable namespacing, but if for whatever reason it can't be done then I think we could consider the <signal>_topic config option.


aklemp commented Mar 20, 2025

I agree that it sounds strange to do encryption or virus scanning per signal type. Apart from the fact that it would simply give the user the freedom to do so, one might argue that metrics and traces have a fairly well-defined format and transport technical information. Logs, however, could contain anything, including confidential data (not a great idea to log such things, but sometimes it is what it is...). Processes like those mentioned increase latency and resource consumption (e.g. CPU, memory), so one might want to apply them only to the risky signals.

A template like custom_${signal}_topic would be fine, as the result would be the same: the signal types end up in separate topics.


axw commented Mar 20, 2025

For the template, I can think of a few options:

  1. Use OTTL for evaluating the topic name

Pros:

  • Doesn't introduce a new template/expression language

Cons:

  • We may need to introduce a new config field like topic_ottl or something similar; otherwise we would need to have a way to reliably differentiate a static topic name from a dynamic template.
  • A little more verbose for simple cases, e.g. Format("%s_otlp_%s", request["tenant-id"], signal)
  • A lot more verbose for more complex cases, e.g. where different signals follow different formats such as {"logs": "logs_topic", "traces": Format("%s.otlp_traces", request["tenant-id"]), "metrics": Format("%s.otlp_metrics", request["tenant-id"])}[signal] (and there would be no short-circuiting of expressions, i.e. we would have to evaluate the entire map in that example before we know which one to choose)
  2. Use Go's text/template

Pros:

  • This could trivially be supported in the existing topic config field: we could parse the config value as a template because a static topic name cannot have "{{" in it (that would be invalid for a Kafka topic name)

Cons:

  • Introduces a template language not commonly used in OTel Collector configurations
  • A bit more verbose than above for simple cases, e.g. {{printf "%s_otlp_%s" .Signal (index .Request "tenant-id") }}
  • Possible but verbose to implement the more complex cases using conditionals, e.g. {{if eq .Signal "logs"}}logs_topic{{else}}{{ index .Request "tenant-id" }}.otlp_{{ .Signal }}{{end}}
  3. Use https://github.com/expr-lang/expr

Basically the same pros/cons as OTTL, but a little less verbose and a little more expressive (e.g. it has conditionals built into the language). Expr-lang is used in the collector already (namely in receivercreator), but is less widely used than OTTL.

Simple case: request["tenant-id"] + "_otlp_" + signal
Complex case: signal == "logs" ? "logs_topic" : request["tenant-id"] + "_otlp_" + signal


I'm currently looking into whether we could combine either OTTL expressions or expr-lang with confmap's variable resolution. That way we could do something like ${request["tenant-id"]}_otlp_${signal} where the expression inside the ${...} is interpreted as either OTTL or expr-lang. I think this will give us the best of both worlds.


axw commented Mar 20, 2025

Of course it now occurs to me that if we use confmap variable syntax, we'll need to escape it in configurations, which may become a footgun. I can't think of a better alternative at the moment...


aklemp commented Mar 21, 2025

I like the approach of listing alternatives with pros and cons. Unfortunately, I don't have deep experience with either Go or OTel Collector best practices, so I cannot give a useful opinion on the options. The only thing I can say from a user perspective is that simple and less verbose is probably better.


axw commented Mar 21, 2025

Thanks @aklemp. I'll bring this to the attention of other code owners and go from there.


axw commented Mar 24, 2025

I've opened a proposal here: #38888

axw added a commit to axw/opentelemetry-collector-contrib that referenced this issue Apr 7, 2025
Deprecate `topic` and `encoding`, and introduce
signal-specific equivalents:
- `logs::topic`, `metrics::topic`, and `traces::topic`
- `logs::encoding`, `metrics::encoding`, and `traces::encoding`

This enables users to explicitly define a configuration
equivalent to the default configuration, or some variation
thereof. It also enables specifying different encodings for
each signal type, which may be important due to the fact that
some encodings only support a subset of signals.

Closes
open-telemetry#35432
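
Read as nested YAML (the `::` notation above denotes nesting), the replacement configuration would look roughly like this; the topic names are placeholders, and the encoding line is shown only for illustration:

exporters:
  kafka:
    logs:
      topic: custom_logs_topic
      encoding: otlp_proto
    metrics:
      topic: custom_metrics_topic
    traces:
      topic: custom_traces_topic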
LucianoGiannotti pushed a commit to LucianoGiannotti/opentelemetry-collector-contrib that referenced this issue Apr 9, 2025

#### Description

Deprecate `topic` and `encoding`, and introduce signal-specific
equivalents:

- `logs::topic`, `metrics::topic`, and `traces::topic`
- `logs::encoding`, `metrics::encoding`, and `traces::encoding`

This enables users to explicitly define a configuration equivalent to
the default configuration, or some variation thereof. It also enables
specifying different encodings for each signal type, which may be
important due to the fact that some encodings only support a subset of
signals.

#### Link to tracking issue

Fixes
open-telemetry#35432

#### Testing

Unit tests added.

#### Documentation

Updated README.

---------

Co-authored-by: Antoine Toulme <[email protected]>

aklemp commented Apr 23, 2025

@axw I was wondering when and how version v0.124.0 containing the changes will be published on Docker Hub. The latest version there is 0.123.0 from 22 days ago...


axw commented Apr 23, 2025

@aklemp please see open-telemetry/opentelemetry-collector-releases#926

Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025