
[collector] [receiver/k8s_observer] filelog/regex_parser configuration do not work as expected #39163

Closed
kuisathaverat opened this issue Apr 4, 2025 · 4 comments · Fixed by #39258
Assignees
Labels
bug, receiver/receivercreator

Comments

@kuisathaverat

Component(s)

No response

What happened?

Description

We are testing the k8s_observer as a replacement for the filelog receiver. We have found that a configuration that works with the filelog receiver behaves differently with the k8s_observer: the regexp used does not match.

Steps to Reproduce

I have prepared two configurations that show both cases. Each deploys an Apache pod and an OpenTelemetry Collector configured to print the log document with the debug exporter. You can see that parsing of the same log succeeds with the filelog receiver and does not match with the k8s_observer receiver.

Filelog receiver configuration in the OpenTelemetry Collector

    filelog:
      include:
        - /var/log/pods/*_apache*/*/*.log
      start_at: end
      operators:
        - type: container
          id: container-parser
        - id: apache-logs
          type: regex_parser
          regex: ^(?P<source_ip>\d+\.\d+.\d+\.\d+)\s+-\s+-\s+\[(?P<timestamp_log>\d+/\w+/\d+:\d+:\d+:\d+\s+\+\d+)\]\s"(?P<http_method>\w+)\s+(?P<http_path>.*)\s+(?P<http_version>.*)"\s+(?P<http_code>\d+)\s+(?P<http_size>\d+)$

k8s_observer Configuration in the pod annotations

    io.opentelemetry.discovery.logs.apache/enabled: "true"
    io.opentelemetry.discovery.logs.apache/config: |
      include:
        - /var/log/pods/*_apache*/*/*.log
      start_at: end
      operators:
        - type: container
          id: container-parser
        - id: apache-logs
          type: regex_parser
          regex: ^(?P<source_ip>\d+\.\d+.\d+\.\d+)\s+-\s+-\s+\[(?P<timestamp_log>\d+/\w+/\d+:\d+:\d+:\d+\s+\+\d+)\]\s"(?P<http_method>\w+)\s+(?P<http_path>.*)\s+(?P<http_version>.*)"\s+(?P<http_code>\d+)\s+(?P<http_size>\d+)$
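
As a quick sanity check (a minimal standalone sketch, not part of either deployment), the same regex compiles as Go RE2 and matches a sample Apache access-log line like the one in the expected output below, which suggests the pattern itself is fine:

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Same regex as in both configurations above.
        re := regexp.MustCompile(`^(?P<source_ip>\d+\.\d+.\d+\.\d+)\s+-\s+-\s+\[(?P<timestamp_log>\d+/\w+/\d+:\d+:\d+:\d+\s+\+\d+)\]\s"(?P<http_method>\w+)\s+(?P<http_path>.*)\s+(?P<http_version>.*)"\s+(?P<http_code>\d+)\s+(?P<http_size>\d+)$`)
        line := `10.1.46.1 - - [04/Apr/2025:15:00:54 +0000] "GET / HTTP/1.1" 200 45`
        if m := re.FindStringSubmatch(line); m != nil {
            for i, name := range re.SubexpNames() {
                if i > 0 && name != "" {
                    fmt.Printf("%s = %s\n", name, m[i])
                }
            }
        }
    }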

Expected Result

The same parse result in both cases: all the attributes are parsed from the log and added to the document.

2025-04-04T15:00:54.799Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.container.name: Str(apache)
     -> k8s.namespace.name: Str(oteldemo-oanrf-default)
     -> k8s.pod.name: Str(apache-7b8dc8dc56-wfwtf)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(30639f85-44b3-4438-80ae-99a1297e742b)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-04-04 15:00:54.52243288 +0000 UTC
Timestamp: 2025-04-04 15:00:54.514243451 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(10.1.46.1 - - [04/Apr/2025:15:00:54 +0000] "GET / HTTP/1.1" 200 45)
Attributes:
     -> log.file.path: Str(/var/log/pods/oteldemo-oanrf-default_apache-7b8dc8dc56-wfwtf_30639f85-44b3-4438-80ae-99a1297e742b/apache/0.log)
     -> log.iostream: Str(stdout)
     -> logtag: Str(F)
     -> http_code: Str(200)
     -> timestamp_log: Str(04/Apr/2025:15:00:54 +0000)
     -> http_method: Str(GET)
     -> http_path: Str(/)
     -> http_version: Str(HTTP/1.1)
     -> http_size: Str(45)
     -> source_ip: Str(10.1.46.1)

Actual Result

The regexp does not match in the k8s_observer use case.

2025-04-04T15:17:23.861Z        error   helper/transformer.go:122       Failed to process entry {"name": "filelog/308d875f-50e6-49ae-8a54-731e425f2e0e_apache/receiver_creator/logs{endpoint=\"10.1.46.13\"}/k8s_observer/308d875f-50e6-49ae-8a54-731e425f2e0e/apache", "operator_id": "apache-logs", "operator_type": "regex_parser", "error": "regex pattern does not match", "action": "send", "entry.timestamp": "2025-04-04T15:17:23.856Z", "log.file.path": "/var/log/pods/oteldemo-oanrf-default_apache-6d8c49ff46-9mftt_308d875f-50e6-49ae-8a54-731e425f2e0e/apache/0.log", "log.iostream": "stdout", "logtag": "F"}

Collector version

0.123.0

Environment information

Environment

OS: k8s container docker.io/otel/opentelemetry-collector-contrib:latest

OpenTelemetry Collector configuration

  exporters:
    # Debug exporter to see the logs in the console
    debug:
      verbosity: detailed
  extensions:
    # Kubernetes observer to discover the pods, nodes, services and ingresses
    k8s_observer:
      observe_pods: true
      observe_nodes: true
      observe_services: true
      observe_ingresses: true
    # Health check extension to check the health of the collector
    health_check:
      endpoint: ${env:MY_POD_IP}:13133
  receivers:
    # Receiver to watch the pods and discover the logs
    receiver_creator/logs:
      watch_observers: [k8s_observer]
      discovery:
        enabled: true
      receivers:
    # Disable the default receivers
    prometheus: null
    zipkin: null
  service:
    # Disable the default telemetry
    telemetry: {}
    # Enable the health check extension
    # and the k8s observer extension
    extensions: [k8s_observer, health_check]
    # Define the pipelines for the logs
    pipelines:
      # Disable the default pipelines
      metrics: null
      traces: null
      # Define the logs pipeline
      logs:
        receivers:
          - receiver_creator/logs
        processors:
          - batch
        exporters:
          - debug

Log output

2025-04-04T15:17:23.861Z        error   helper/transformer.go:122       Failed to process entry {"name": "filelog/308d875f-50e6-49ae-8a54-731e425f2e0e_apache/receiver_creator/logs{endpoint=\"10.1.46.13\"}/k8s_observer/308d875f-50e6-49ae-8a54-731e425f2e0e/apache", "operator_id": "apache-logs", "operator_type": "regex_parser", "error": "regex pattern does not match", "action": "send", "entry.timestamp": "2025-04-04T15:17:23.856Z", "log.file.path": "/var/log/pods/oteldemo-oanrf-default_apache-6d8c49ff46-9mftt_308d875f-50e6-49ae-8a54-731e425f2e0e/apache/0.log", "log.iostream": "stdout", "logtag": "F"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/transformer.go:122
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/parser.go:142
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/parser.go:111
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/parser/regex/parser.go:35
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/writer.go:73
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/container.(*Parser).consumeEntries
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/parser/container/parser.go:298
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*BatchingLogEmitter).flusher
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/emitter.go:171
2025-04-04T15:17:23.862Z        error   container/parser.go:300 failed to write entry   {"name": "filelog/308d875f-50e6-49ae-8a54-731e425f2e0e_apache/receiver_creator/logs{endpoint=\"10.1.46.13\"}/k8s_observer/308d875f-50e6-49ae-8a54-731e425f2e0e/apache", "operator_id": "container-parser", "operator_type": "container", "error": "regex pattern does not match"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/container.(*Parser).consumeEntries
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/parser/container/parser.go:300
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*BatchingLogEmitter).flusher
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/helper/emitter.go:171
2025-04-04T15:17:23.980Z        info    Logs    {"resource logs": 2, "log records": 2}
2025-04-04T15:17:23.980Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(308d875f-50e6-49ae-8a54-731e425f2e0e)
     -> k8s.container.name: Str(apache)
     -> k8s.namespace.name: Str(oteldemo-oanrf-default)
     -> k8s.pod.name: Str(apache-6d8c49ff46-9mftt)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-04-04 15:17:23.863873858 +0000 UTC
Timestamp: 2025-04-04 15:17:23.856663589 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(10.1.46.1 - - [04/Apr/2025:15:17:23 +0000] "GET / HTTP/1.1" 200 45)
Attributes:
     -> log.iostream: Str(stdout)
     -> logtag: Str(F)
     -> log.file.path: Str(/var/log/pods/oteldemo-oanrf-default_apache-6d8c49ff46-9mftt_308d875f-50e6-49ae-8a54-731e425f2e0e/apache/0.log)
Trace ID:
Span ID:
Flags: 0
ResourceLog #1
Resource SchemaURL:
Resource attributes:
     -> k8s.container.name: Str(apache)
     -> k8s.namespace.name: Str(oteldemo-oanrf-default)
     -> k8s.pod.name: Str(apache-6d8c49ff46-9mftt)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(308d875f-50e6-49ae-8a54-731e425f2e0e)
     -> container.id: Str(ed1cd5fb4ae33458cfa34d984d84000d65661442c5144aeb07c4e0372ce27848)
     -> container.image.name: Str(docker.io/bitnami/apache:2.4.63-debian-12-r7)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-04-04 15:17:23.861723778 +0000 UTC
Timestamp: 2025-04-04 15:17:23.856663589 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(10.1.46.1 - - [04/Apr/2025:15:17:23 +0000] "GET / HTTP/1.1" 200 45)
Attributes:
     -> log.file.path: Str(/var/log/pods/oteldemo-oanrf-default_apache-6d8c49ff46-9mftt_308d875f-50e6-49ae-8a54-731e425f2e0e/apache/0.log)
     -> log.iostream: Str(stdout)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0

Additional context

No response

kuisathaverat added the bug and needs triage labels on Apr 4, 2025
ChrsMark removed the needs triage label on Apr 7, 2025
@ChrsMark
Member

ChrsMark commented Apr 7, 2025

Thanks for reporting this @kuisathaverat. I was able to reproduce the issue but I'm still not sure where the problem comes from.

FWIW, the configuration produced by the receiver creator looks correct:

2025-04-07T12:38:33.553Z	info	[email protected]/observerhandler.go:201	starting receiver	{"name": "filelog/4142839f-4ca2-462b-9e62-e0d72eed67f2_apache", "endpoint": "10.244.0.6", "endpoint_id": "k8s_observer/4142839f-4ca2-462b-9e62-e0d72eed67f2/apache", "config": {"include":["/var/log/pods/default_apache-84d6d9fcbc-kmb7b_4142839f-4ca2-462b-9e62-e0d72eed67f2/apache/*.log"],"include_file_name":false,"include_file_path":true,"operators":[{"id":"container-parser","type":"container"},{"field":"attributes.tag","id":"some","type":"add","value":"hints"},{"id":"apache-logs","regex":"^(?P<source_ip>\\d+\\.\\d+.\\d+\\.\\d+)\\s+-\\s+-\\s+\\[(?P<timestamp_log>\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\s+\\+\\d+)\\]\\s\"(?P<http_method>\\w+)\\s+(?P<http_path>.*)\\s+(?P<http_version>.*)\"\\s+(?P<http_code>\\d+)\\s+(?P<http_size>\\d+)$","type":"regex_parser"}],"start_at":"end"}}
2025-04-07T12:38:33.553Z	info	adapter/receiver.go:41	Starting stanza receiver	{"name": "filelog/4142839f-4ca2-462b-9e62-e0d72eed67f2_apache/receiver_creator/logs{endpoint=\"10.244.0.6\"}/k8s_observer/4142839f-4ca2-462b-9e62-e0d72eed67f2/apache"}
2025-04-07T12:38:33.758Z	info	fileconsumer/file.go:265	Started watching file	{"name": "filelog/4142839f-4ca2-462b-9e62-e0d72eed67f2_apache/receiver_creator/logs{endpoint=\"10.244.0.6\"}/k8s_observer/4142839f-4ca2-462b-9e62-e0d72eed67f2/apache", "component": "fileconsumer", "path": "/var/log/pods/default_apache-84d6d9fcbc-kmb7b_4142839f-4ca2-462b-9e62-e0d72eed67f2/apache/0.log"}

The config part:

{
  "name": "filelog/4142839f-4ca2-462b-9e62-e0d72eed67f2_apache",
  "endpoint": "10.244.0.6",
  "endpoint_id": "k8s_observer/4142839f-4ca2-462b-9e62-e0d72eed67f2/apache",
  "config": {
    "include": [
      "/var/log/pods/default_apache-84d6d9fcbc-kmb7b_4142839f-4ca2-462b-9e62-e0d72eed67f2/apache/*.log"
    ],
    "include_file_name": false,
    "include_file_path": true,
    "operators": [
      {
        "id": "container-parser",
        "type": "container"
      },
      {
        "field": "attributes.tag",
        "id": "some",
        "type": "add",
        "value": "hints"
      },
      {
        "id": "apache-logs",
        "regex": "^(?P<source_ip>\\d+\\.\\d+.\\d+\\.\\d+)\\s+-\\s+-\\s+\\[(?P<timestamp_log>\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\s+\\+\\d+)\\]\\s\"(?P<http_method>\\w+)\\s+(?P<http_path>.*)\\s+(?P<http_version>.*)\"\\s+(?P<http_code>\\d+)\\s+(?P<http_size>\\d+)$",
        "type": "regex_parser"
      }
    ],
    "start_at": "end"
  }
}

I will try to find more time to investigate this soon.

@ChrsMark
Member

ChrsMark commented Apr 7, 2025

Looking more carefully at the resulting regex produced by the receiver creator, I can validate my original assumption that the issue lies somewhere in the YAML unmarshaling. Indeed, it seems to come with extra escapes compared to the original one:

< ^(?P<source_ip>\d+\.\d+.\d+\.\d+)\s+-\s+-\s+\[(?P<timestamp_log>\d+/\w+/\d+:\d+:\d+:\d+\s+\+\d+)\]\s"(?P<http_method>\w+)\s+(?P<http_path>.*)\s+(?P<http_version>.*)"\s+(?P<http_code>\d+)\s+(?P<http_size>\d+)$
---
> ^(?P<source_ip>\\d+\\.\\d+.\\d+\\.\\d+)\\s+-\\s+-\\s+\\[(?P<timestamp_log>\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\s+\\+\\d+)\\]\\s\"(?P<http_method>\\w+)\\s+(?P<http_path>.*)\\s+(?P<http_version>.*)\"\\s+(?P<http_code>\\d+)\\s+(?P<http_size>\\d+)$

That might be something coming from the unmarshaling that takes place at

    if err := yaml.Unmarshal([]byte(configStr), &conf); err != nil {

but I will need to investigate this further.

Contributor

github-actions bot commented Apr 8, 2025

Pinging code owners for receiver/receivercreator: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

ChrsMark self-assigned this on Apr 8, 2025
@ChrsMark
Member

ChrsMark commented Apr 8, 2025

It seems to be a generic issue with the receiver creator (not only the annotation-based discovery) and how escapes are handled in order to support escaped backticks, like the one in this test case:

    {"escaped backticks", args{nil, "\\`foo bar\\`"}, "`foo bar`", false},

I could reproduce it with the following static receiver creator configuration:

receiver_creator/logsstatic:
  watch_observers: [ k8s_observer ]
  receivers:
    filelog/apache:
      rule: type == "pod.container" && container_name == "apache"
      config:
        include:
          - /var/log/pods/`pod.namespace`_`pod.name`_`pod.uid`/`container_name`/*.log
        include_file_name: false
        include_file_path: true
        operators:
          - id: container-parser
            type: container
          - type: add
            field: attributes.log.template
            value: apache
          - id: apache-logs
            type: regex_parser
            regex: ^(?P<source_ip>\d+\.\d+.\d+\.\d+)\s+-\s+-\s+\[(?P<timestamp_log>\d+/\w+/\d+:\d+:\d+:\d+\s+\+\d+)\]\s"(?P<http_method>\w+)\s+(?P<http_path>.*)\s+(?P<http_version>.*)"\s+(?P<http_code>\d+)\s+(?P<http_size>\d+)$

The docs mention: "Dynamic values are surrounded by backticks (`). If a literal backtick is needed, use \` to escape it. Dynamic values can be used with static values, in which case they are concatenated."
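
To illustrate with the pod from the earlier receiver-creator log, the backtick-delimited parts of the include path above are dynamic values that are expanded per discovered endpoint, roughly:

    /var/log/pods/`pod.namespace`_`pod.name`_`pod.uid`/`container_name`/*.log
    -> /var/log/pods/default_apache-84d6d9fcbc-kmb7b_4142839f-4ca2-462b-9e62-e0d72eed67f2/apache/*.log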

Hence we need to explicitly check for the backtick at

    case '\\':
        if i+1 == len(configValue) {
            return nil, errors.New(`encountered escape (\) without value at end of expression`)
        }
        output.WriteByte(configValue[i+1])
        i++

and only handle its escape.
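
For illustration only, here is a minimal standalone sketch of that behavior (a hypothetical helper, not the actual receiver creator code): collapse \` into a literal backtick and leave any other backslash, such as the \d and \s in the regex above, untouched:

    package main

    import (
        "errors"
        "fmt"
        "strings"
    )

    // expandEscapes is a hypothetical standalone sketch of the behavior
    // described above: only \` is collapsed into a literal backtick; any
    // other backslash is passed through unchanged so regex escapes survive.
    func expandEscapes(configValue string) (string, error) {
        var output strings.Builder
        for i := 0; i < len(configValue); i++ {
            if configValue[i] != '\\' {
                output.WriteByte(configValue[i])
                continue
            }
            if i+1 == len(configValue) {
                return "", errors.New(`encountered escape (\) without value at end of expression`)
            }
            if configValue[i+1] == '`' {
                output.WriteByte('`') // \` -> `
                i++
                continue
            }
            output.WriteByte('\\') // keep the backslash for anything else, e.g. \d, \s
        }
        return output.String(), nil
    }

    func main() {
        regex, _ := expandEscapes(`^(?P<http_code>\d+)\s+(?P<http_size>\d+)$`)
        fmt.Println(regex) // backslashes preserved

        lit, _ := expandEscapes("say \\`hi\\`")
        fmt.Println(lit) // escaped backticks collapsed: say `hi`
    }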

I'll send a PR to fix this.
