
[BUG] FQDN use or customizable cluster domain for DD_AGENT_HOST #35526


Open
danibaeyens opened this issue Mar 26, 2025 · 1 comment
Labels
team/container-platform The Container Platform Team team/dynamic-instrumentation Dynamic Instrumentation team/triage

Comments

@danibaeyens

We're debugging an excess of DNS requests to CoreDNS. We see Datadog lookups where search domains are appended to a name that is already complete:

[INFO] 10.8.29.133:36496 - 58940 "A IN datadog.datadog.svc.cluster.local.my-namespace.svc.cluster.local. udp 84 false 512" NXDOMAIN qr,aa,rd 177 0.000054548s
[INFO] 10.8.29.133:54141 - 35076 "A IN datadog.datadog.svc.cluster.local.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.00005738s
[INFO] 10.8.29.133:51287 - 19014 "A IN datadog.datadog.svc.cluster.local.cluster.local. udp 66 false 512" NXDOMAIN qr,aa,rd 159 0.00007852s
[INFO] 10.8.29.133:50022 - 54612 "A IN datadog.datadog.svc.cluster.local.eu-central-1.compute.internal. udp 82 false 512" NXDOMAIN qr,aa,rd,ra 201 0.000037818s
[INFO] 10.8.29.133:46214 - 59056 "A IN datadog.datadog.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000046209s

This makes sense, because by default a pod's resolv.conf in Kubernetes contains:

search my-namespace.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
nameserver 172.20.0.10
options ndots:5

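Because datadog.datadog.svc.cluster.local has only 4 dots, fewer than the ndots:5 threshold, the resolver first appends each search domain to the name, duplicating the cluster suffix, and only then tries the name as written. In the log above, that is five queries instead of one for every datadog.datadog resolution.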

When a service calls the short service name, http://<service>, the lookup succeeds on the first search domain. With the next level, http://<service>.<namespace>, it succeeds on the second. When a client uses the full cluster domain, I think the recommended approach is an FQDN with a trailing dot (like datadog.datadog.svc.cluster.local.), which skips the search list entirely instead of forcing the resolver through every search domain first.
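
The lookup order described above can be sketched in Go. This is a simplified model of glibc-style search-list handling written for illustration, not code from the Agent or from any resolver library; `searchCandidates` is a hypothetical helper, and the search list and ndots value are copied from the resolv.conf shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// searchCandidates returns the names a stub resolver would try, in order.
// Simplified model (assumption): glibc-style handling of search and ndots.
func searchCandidates(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute, so the search list is skipped.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	dots := strings.Count(name, ".")
	var candidates []string
	// With at least ndots dots, the name is tried as written first.
	if dots >= ndots {
		candidates = append(candidates, absolute)
	}
	for _, domain := range search {
		candidates = append(candidates, name+"."+domain+".")
	}
	// With fewer than ndots dots, the name as written is tried last.
	if dots < ndots {
		candidates = append(candidates, absolute)
	}
	return candidates
}

func main() {
	search := []string{
		"my-namespace.svc.cluster.local",
		"svc.cluster.local",
		"cluster.local",
		"eu-central-1.compute.internal",
	}
	names := []string{
		"datadog.datadog.svc.cluster.local",  // 4 dots: search list applies
		"datadog.datadog.svc.cluster.local.", // trailing dot: absolute
	}
	for _, name := range names {
		candidates := searchCandidates(name, search, 5)
		fmt.Printf("%s: up to %d queries\n", name, len(candidates))
		for _, c := range candidates {
			fmt.Println("  ", c)
		}
	}
}
```

For the name without a trailing dot, the model produces the same five names, in the same order, as the CoreDNS log above. With the trailing dot, it produces a single candidate.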

Moreover, if a cluster uses a cluster domain other than cluster.local, this code:

var injectedConfig, injectedEntity, injectedExternalEnv bool
var (
    agentHostIPEnvVar = corev1.EnvVar{
        Name: agentHostEnvVarName,
        Value: "",
        ValueFrom: &corev1.EnvVarSource{
            FieldRef: &corev1.ObjectFieldSelector{
                FieldPath: "status.hostIP",
            },
        },
    }
    agentHostServiceEnvVar = corev1.EnvVar{
        Name: agentHostEnvVarName,
        Value: i.config.localServiceName + "." + apiCommon.GetMyNamespace() + ".svc.cluster.local",
    }

will fail, as it hard-codes the domain.

What are your thoughts on using FQDNs or making the cluster domain customizable? That way, I could not only use a domain such as example.org if I need to, but also set a trailing-dot FQDN such as example.org. or cluster.local. and reduce the number of DNS calls.
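
A minimal sketch of what the proposal could look like, assuming two new options that do not exist in the Agent today: a configurable cluster domain and a flag to emit a trailing-dot FQDN. The function name `agentServiceHost` and both parameters are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// agentServiceHost builds the service DNS name to inject as DD_AGENT_HOST.
// clusterDomain and fqdn are hypothetical options, not existing Agent settings.
func agentServiceHost(service, namespace, clusterDomain string, fqdn bool) string {
	// Accept the domain with or without surrounding dots.
	domain := strings.Trim(clusterDomain, ".")
	if domain == "" {
		// Fall back to the value that is hard-coded today.
		domain = "cluster.local"
	}
	host := service + "." + namespace + ".svc." + domain
	if fqdn {
		// The trailing dot makes the name absolute, so resolvers skip the search list.
		host += "."
	}
	return host
}

func main() {
	fmt.Println(agentServiceHost("datadog", "datadog", "", false))
	fmt.Println(agentServiceHost("datadog", "datadog", "cluster.local", true))
	fmt.Println(agentServiceHost("datadog", "datadog", "example.org.", true))
}
```

With an empty domain and `fqdn` set to false, the result matches the current hard-coded behavior, so existing installations would be unaffected. Whether every tracer library accepts a trailing-dot host is not verified here, which is one reason to keep the FQDN form opt-in.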

Agent Environment
Agent (v7.62.0)

Describe what happened:
The Agent's admission controller injects a full domain name, but not as an FQDN (no trailing dot).

Describe what you expected:
I expected a customizable MutatingWebhookConfiguration object.

Steps to reproduce the issue:

  • Install datadog agent
  • Inject datadog configuration into a pod
  • The injected DD_AGENT_HOST value is datadog.datadog.svc.cluster.local, as defined here

Additional environment details (Operating System, Cloud provider, etc):

@danibaeyens danibaeyens changed the title [BUG] FQDN use or customizable DD_AGENT_HOST [BUG] FQDN use or customizable cluster domain for DD_AGENT_HOST Mar 26, 2025
@danibaeyens
Author

I forgot to add that, as a workaround for clusters that don't use the default cluster.local domain, users can switch to hostIP mode instead of service mode, but I still think the cluster domain should be customizable 😇

@pimlu pimlu added team/dynamic-instrumentation Dynamic Instrumentation and removed team/networks labels Mar 31, 2025
@betterengineering betterengineering added the team/container-platform The Container Platform Team label Apr 10, 2025
3 participants