Loss of data when search engine is unavailable using Opensearch exporter #38846

Open
charan906 opened this issue Mar 21, 2025 · 0 comments
Labels: bug (Something isn't working), exporter/opensearch, needs triage (New item requiring triage)

charan906 commented Mar 21, 2025

Component(s)

exporter/opensearch

What happened?

## Description
Data is lost when the search engine becomes unavailable and is later brought back up, even though the sending queue and retry mechanisms are configured.

## Test Strategy

  1. Scaled the search engine data pod replicas to 0.
  2. Produced 1 log record per second for 10 seconds (a sketch of the producer is shown after this list).
  3. Brought the data pods back up after 10 seconds.
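
For context, a minimal sketch of the log producer described above, assuming the OpenTelemetry Go SDK with the OTLP/gRPC log exporter and a collector endpoint of localhost:4317 (the exact producer and endpoint used in the test are not shown in the issue):

package main

import (
    "context"
    "time"

    "go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc"
    "go.opentelemetry.io/otel/log"
    sdklog "go.opentelemetry.io/otel/sdk/log"
)

func main() {
    ctx := context.Background()

    // Export log records over OTLP/gRPC to the collector (endpoint is an assumption).
    exp, err := otlploggrpc.New(ctx,
        otlploggrpc.WithEndpoint("localhost:4317"),
        otlploggrpc.WithInsecure())
    if err != nil {
        panic(err)
    }

    provider := sdklog.NewLoggerProvider(
        sdklog.WithProcessor(sdklog.NewBatchProcessor(exp)))
    defer provider.Shutdown(ctx)

    logger := provider.Logger("test-app")

    // Emit 1 log record per second for 10 seconds while the data pods are scaled to 0.
    for i := 0; i < 10; i++ {
        var rec log.Record
        rec.SetTimestamp(time.Now())
        rec.SetSeverity(log.SeverityDebug) // severity number 5, as in the captured records
        rec.SetBody(log.StringValue("Hello! This is a Testing log"))
        logger.Emit(ctx, rec)
        time.Sleep(time.Second)
    }
}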

## Observations

  1. The collector received all 10 log records and exported them.
  2. Only 9 of the 10 log records were stored once the backend became available again (one way to check the count is sketched after this list).
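
For reference, one way the stored-record count can be checked, using the index name and host from the bulk traffic captured below (the verification method actually used in the test is not stated):

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // _count against the index the exporter writes to.
    resp, err := http.Get("http://search-engine:9200/ss4o_logs-default-namespace/_count")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var out struct {
        Count int `json:"count"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        panic(err)
    }
    fmt.Println("stored log records:", out.Count) // observed 9, expected 10
}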

Expected Result

All 10 log records are present in the search engine.

Actual Result

The search engine document count is 9 (expected 10).

Collector version

v0.121.0

OpenTelemetry Collector configuration

opensearch:
    sending_queue:
      enabled: true
      storage: file_storage/opensearch
      num_consumers: 2
      queue_size: 40000
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 0
    http:
      endpoint: <SE host>:9000
extensions:
  file_storage/opensearch:
    directory: /tmp/otel/queue/opensearch
    create_directory: true
    timeout: 10s
processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 90
    spike_limit_percentage: 15
  batch:
    send_batch_size: 50
    timeout: 2s

Log output

Additional context

POST /_bulk HTTP/1.1
Host: search-engine:9200
User-Agent: opensearch-go/2.3.0 (linux amd64; Go 1.23.3)
Content-Length: 1110
Content-Type: application/json
Accept-Encoding: gzip

{"create":{"_index":"ss4o_logs-default-namespace"}}
{"attributes":{"data_stream":{"dataset":"default","namespace":"namespace","type":"record"}},"body":"Hello! This is a Testing log","instrumentationScope":{"name":"test-app"},"observedTimestamp":"2025-03-19T12:22:44.663646777Z","resource":{"service.name":"test-app","telemetry.sdk.language":"go","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.33.0"},"severity":{"number":5},"@timestamp":"2025-03-19T12:22:43.633743348Z"}
{"create":{"_index":"ss4o_logs-default-namespace"}}
{"attributes":{"data_stream":{"dataset":"default","namespace":"namespace","type":"record"}},"body":"Hello! This is a Testing log","instrumentationScope":{"name":"test-app"},"observedTimestamp":"2025-03-19T12:22:44.664584006Z","resource":{"service.name":"test-app","telemetry.sdk.language":"go","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.33.0"},"severity":{"number":5},"@timestamp":"2025-03-19T12:22:44.648203511Z"}
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-encoding: gzip
content-length: 580

{"took":59937,"errors":true,"items":[{"create":{"_index":"ss4o_logs-default-namespace","_id":"WaBbrpUBfWGueJ6HKMa7","status":503,"error":{"type":"unavailable_shards_exception","reason":"[ss4o_logs-default-namespace][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[ss4o_logs-default-namespace][1]] containing [index {[ss4o_logs-default-namespace][WaBbrpUBfWGueJ6HKMa7], source[{"attributes":{"data_stream":{"dataset":"default","namespace":"namespace","type":"record"}},"body":"Hello! This is a Testing log","instrumentationScope":{"name":"test-app"},"observedTimestamp":"2025-03-19T12:22:44.663646777Z","resource":{"service.name":"test-app","telemetry.sdk.language":"go","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.33.0"},"severity":{"number":5},"@timestamp":"2025-03-19T12:22:43.633743348Z"}]}]]"}}},{"create":{"_index":"ss4o_logs-default-namespace","_id":"WqBbrpUBfWGueJ6HKMa7","status":503,"error":{"type":"unavailable_shards_exception","reason":"[ss4o_logs-default-namespace][4] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[ss4o_logs-default-namespace][4]] containing [index {[ss4o_logs-default-namespace][WqBbrpUBfWGueJ6HKMa7], source[{"attributes":{"data_stream":{"dataset":"default","namespace":"namespace","type":"record"}},"body":"Hello! This is a Testing log","instrumentationScope":{"name":"test-app"},"observedTimestamp":"2025-03-19T12:22:44.664584006Z","resource":{"service.name":"test-app","telemetry.sdk.language":"go","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.33.0"},"severity":{"number":5},"@timestamp":"2025-03-19T12:22:44.648203511Z"}]}]]"}}}]}

POST /_bulk HTTP/1.1
Host: search-engine:9200
User-Agent: opensearch-go/2.3.0 (linux amd64; Go 1.23.3)
Content-Length: 555
Content-Type: application/json
Accept-Encoding: gzip

{"create":{"_index":"ss4o_logs-default-namespace"}}
{"attributes":{"data_stream":{"dataset":"default","namespace":"namespace","type":"record"}},"body":"Hello! This is a Testing log","instrumentationScope":{"name":"test-app"},"observedTimestamp":"2025-03-19T12:23:45.536550316Z","resource":{"service.name":"test-app","telemetry.sdk.language":"go","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.33.0"},"severity":{"number":5},"@timestamp":"2025-03-19T12:22:43.633743348Z"}
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-encoding: gzip
content-length: 208

{"took":1465,"errors":false,"items":[{"create":{"_index":"ss4o_logs-default-namespace","_id":"XaBcrpUBfWGueJ6HFsaB","_version":1,"result":"created","_shards":

{"total":2,"successful":1,"failed":0}
,"_seq_no":0,"_primary_term":1,"status":201}}]}

In the first bulk request, both items failed with a retriable 503 (unavailable_shards_exception), yet the follow-up bulk request resubmitted only one of the two items, so the other record is lost. When a _bulk request returns 200 OK with "errors": true, every retriable item in the response needs to be collected and resubmitted, not just one; a sketch of that per-item handling follows.
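
A minimal sketch (not the exporter's actual implementation) of how per-item errors in a 200 OK _bulk response can be collected so that every retriable document is kept for the next retry attempt; the response shape follows the OpenSearch _bulk API, and retriableDocs is a hypothetical helper:

package main

import (
    "encoding/json"
    "fmt"
)

// Minimal view of an OpenSearch _bulk response: the request as a whole can
// return HTTP 200 OK while individual items fail (e.g. with status 503).
type BulkResponse struct {
    Errors bool `json:"errors"`
    Items  []map[string]struct {
        Status int `json:"status"`
        Error  *struct {
            Type   string `json:"type"`
            Reason string `json:"reason"`
        } `json:"error,omitempty"`
    } `json:"items"`
}

// retriableDocs returns the indices of every document whose item status is
// retriable (429 or 5xx). All of them, not only the first, must be resubmitted.
func retriableDocs(body []byte) ([]int, error) {
    var resp BulkResponse
    if err := json.Unmarshal(body, &resp); err != nil {
        return nil, err
    }
    if !resp.Errors {
        return nil, nil // every item was indexed successfully
    }
    var retry []int
    for i, item := range resp.Items {
        for _, result := range item { // the key is the bulk action, e.g. "create"
            if result.Status == 429 || result.Status >= 500 {
                retry = append(retry, i)
            }
        }
    }
    return retry, nil
}

func main() {
    // Shortened version of the first response above: both items returned 503,
    // so both indices (0 and 1) should be retried, not just one of them.
    body := []byte(`{"errors":true,"items":[
      {"create":{"status":503,"error":{"type":"unavailable_shards_exception","reason":"primary shard is not active"}}},
      {"create":{"status":503,"error":{"type":"unavailable_shards_exception","reason":"primary shard is not active"}}}]}`)
    fmt.Println(retriableDocs(body)) // [0 1] <nil>
}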

Are any changes needed so that the search engine ends up with complete data when the backend becomes unavailable and is later brought back up?
