Add SRV resolver to loadbalancer exporter to use hostnames and track IPs #29760

Closed
wants to merge 5 commits into from
27 changes: 27 additions & 0 deletions .chloggen/srv-resolver-for-loadbalancing-exporter.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: 'enhancement'

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: loadbalancingexporter

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: New SRV resolver for loadbalancing exporter for static hostnames with changing IPs

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: ["18412"]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
62 changes: 55 additions & 7 deletions exporter/loadbalancingexporter/README.md
@@ -38,7 +38,7 @@ Note that either the Trace ID or Service name is used for the decision on which

This load balancer is especially useful for backends configured with tail-based samplers or red-metrics-collectors, which make a decision based on the view of the full trace.

When a list of backends is updated, some of the signals will be rerouted to different backends.
When a list of backends is updated, some of the signals will be rerouted to different backends.
Around R/N of the "routes" will be rerouted differently, where:

* A "route" is either a trace ID or a service name mapped to a certain backend.
@@ -60,7 +60,7 @@ The `loadbalancingexporter` will, irrespective of the chosen resolver (`static`,
Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor.

* The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint.
* The `resolver` accepts a `static` node, a `dns` or a `k8s` service. If all three are specified, `k8s` takes precedence.
* The `resolver` accepts a `static` node, a `dns`, `dnssrvnoa`, or a `k8s` service. If all four are specified, `k8s` takes precedence.
Member

When would anyone ever specify all four? This got updated from "both" where it made sense, to "all three" which didn't make sense but someone got it through, but now "all four" doesn't make any sense at all.

Please change this to specify the resolution ordering.

Contributor Author

Can do, however, I don't think I really ever looked at how the order was determined. Now I'm wondering if there should be a prescribed precedence.

I would need to dig into this deeper, but my best guess is that these are merely in reverse order of when each resolver was added. If a resolver is at the bottom of this block, then that resolver takes the highest precedence.

Happy to leave everything as is and either give the resolver I'm adding the highest precedence or leave it as k8s if that was a conscious decision. If not, my suggestion would be the regular DNS resolver, since it seems to me it'd be the most commonly used one.

* The `hostname` property inside a `dns` node specifies the hostname to query in order to obtain the list of IP addresses.
* The `dns` node also accepts the following optional properties:
* `hostname` DNS hostname to resolve.
@@ -71,9 +71,13 @@ Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using th
* `service` Kubernetes service to resolve, e.g. `lb-svc.lb-ns`. If no namespace is specified, an attempt will be made to infer the namespace for this collector, and if this fails it will fall back to the `default` namespace.
* `ports` port to be used for exporting the traces to the addresses resolved from `service`. If `ports` is not specified, the default port 4317 is used. When multiple ports are specified, two backends are added to the load balancer as if they were at different pods.
* The `routing_key` property is used to route spans to exporters based on different parameters. This functionality is currently enabled only for `trace` pipeline types. It supports one of the following values:
* `service`: exports spans based on their service name. This is useful when using processors like the span metrics, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate.
* `traceID` (default): exports spans based on their `traceID`.
* If not configured, defaults to `traceID` based routing.
* `service`: exports spans based on their service name. This is useful when using processors like the span metrics, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate.
* `traceID` (default): exports spans based on their `traceID`.
* If not configured, defaults to `traceID` based routing.
* The `dnssrvnoa` node accepts the following optional properties:
* `hostname` DNS SRV hostname to resolve.
* `interval` resolver interval in go-Duration format, e.g. `5s`, `30m`, `1h`. If not specified, `5s` will be used.
* `timeout` resolver timeout in go-Duration format, e.g. `5s`, `30m`, `1h`. If not specified, `1s` will be used.

Simple example
```yaml
@@ -100,9 +104,9 @@ exporters:
- backend-2:4317
- backend-3:4317
- backend-4:4317
# Notice to config a headless service DNS in Kubernetes
# Notice to config a headless service DNS in Kubernetes
# dns:
# hostname: otelcol-headless.observability.svc.cluster.local
# hostname: otelcol-headless.observability.svc.cluster.local

service:
pipelines:
@@ -162,6 +166,44 @@ service:
- loadbalancing
```

DNSSRVNOA stands for "DNS resolution of SRV records without a second resolution of the resulting A records." The provided SRV record is resolved to its target hostnames. Instead of querying DNS for the IP addresses behind those hostnames, the resolver returns the hostnames themselves to the load balancer, and the host's resolver performs the DNS resolution when connections are made.

An example where DNSSRVNOA is useful is a `StatefulSet`-backed headless Kubernetes `Service` with Istio.

Note that no port is defined in the config, since the port is provided by the SRV record. Each target must be either an A or AAAA record. For more information, see RFC 2782: https://www.ietf.org/rfc/rfc2782.txt

> [!IMPORTANT]
> Priority and weight are not currently supported. Additionally, every target must map to a single IP address.


Example Config:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317

processors:

exporters:
  loadbalancing:
    protocol:
      otlp: {}
    resolver:
      dnssrvnoa:
        hostname: _<svc-port-name>._<svc-port-protocol>.<svc-name>.<svc-namespace>.svc.cluster.local

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
```

For testing purposes, the following configuration can be used, where both the load balancer and all backends are running locally:
```yaml
receivers:
@@ -273,6 +315,12 @@ service:
- debug
```

## Picking a Resolver
* `static` should be used when you know every endpoint and the IP addresses of those endpoints do not change.
* `dns` should be used when a single hostname resolves to all endpoints.
* `k8s` should be used when a single Kubernetes `Service` contains all endpoints.
* `dnssrvnoa` should be used when an SRV record covers all endpoints, each endpoint maps to a single IP address, and you want the endpoints provided to the load balancer to be hostnames (`A` records) instead of IP addresses.

## Metrics

The following metrics are recorded by this processor:
17 changes: 13 additions & 4 deletions exporter/loadbalancingexporter/config.go
@@ -32,9 +32,10 @@ type Protocol struct {

// ResolverSettings defines the configurations for the backend resolver
type ResolverSettings struct {
Static *StaticResolver `mapstructure:"static"`
DNS *DNSResolver `mapstructure:"dns"`
K8sSvc *K8sSvcResolver `mapstructure:"k8s"`
Static *StaticResolver `mapstructure:"static"`
DNS *DNSResolver `mapstructure:"dns"`
K8sSvc *K8sSvcResolver `mapstructure:"k8s"`
DNSSRVNOA *DNSSRVNOAResolver `mapstructure:"dnssrvnoa"`
}

// StaticResolver defines the configuration for the resolver providing a fixed list of backends
@@ -50,8 +51,16 @@ type DNSResolver struct {
Timeout time.Duration `mapstructure:"timeout"`
}

// K8sSvcResolver defines the configuration for the DNS resolver
// K8sSvcResolver defines the configuration for the kubernetes Service resolver
type K8sSvcResolver struct {
Service string `mapstructure:"service"`
Ports []int32 `mapstructure:"ports"`
}

// TODO: Make a common struct to be used for dns-based resolvers
Member

TODO still needs to be done?

// DNSSRVNOAResolver defines the configuration for the DNS resolver of SRV records for headless Services
type DNSSRVNOAResolver struct {
	Hostname string        `mapstructure:"hostname"`
	Interval time.Duration `mapstructure:"interval"`
	Timeout  time.Duration `mapstructure:"timeout"`
}
1 change: 1 addition & 0 deletions exporter/loadbalancingexporter/go.mod
@@ -19,6 +19,7 @@ require (
go.opentelemetry.io/otel/trace v1.23.1
go.uber.org/multierr v1.11.0
go.uber.org/zap v1.26.0
golang.org/x/exp v0.0.0-20230711023510-fffb14384f22
k8s.io/api v0.29.2
k8s.io/apimachinery v0.29.2
k8s.io/client-go v0.29.2
9 changes: 9 additions & 0 deletions exporter/loadbalancingexporter/loadbalancer.go
@@ -85,6 +85,15 @@ func newLoadBalancer(params exporter.CreateSettings, cfg component.Config, facto
			return nil, err
		}
	}
	if oCfg.Resolver.DNSSRVNOA != nil {
		dnssrvnoaLogger := params.Logger.With(zap.String("resolver", "dnssrvnoa"))

		var err error
		res, err = newDNSSRVNOAResolver(dnssrvnoaLogger, oCfg.Resolver.DNSSRVNOA.Hostname, oCfg.Resolver.DNSSRVNOA.Interval, oCfg.Resolver.DNSSRVNOA.Timeout)
		if err != nil {
			return nil, err
		}
	}

	if res == nil {
		return nil, errNoResolver