Description
When kresd receives a truncated UDP response (TC=1) to a DNSKEY query whose answer exceeds the 1232-byte EDNS buffer size, it logs "truncated response, failover to TCP" but never actually sends a TCP packet. Instead, it re-queries the same server over UDP, gets truncated again, and loops until it hits the iteration count limit (100), returning SERVFAIL with EDE 22 (No Reachable Authority).
tcpdump -i any tcp port 53 captures zero TCP packets during the entire resolution attempt, confirming TCP is never attempted despite the log message.
This appears to be a v6 regression. The TC=1 handler in lib/layer/iterate.c reports KR_SELECTION_TRUNCATED, which sets local_state->truncated = true, and returns KR_STATE_CONSUME, but this does not appear to trigger a new transport selection that would switch to TCP in v6's rewritten session layer.
Steps to Reproduce
- Run Knot Resolver v6.2.0 with default EDNS buffer size (1232 bytes) and DNSSEC enabled.
- Clear cache completely (kresctl cache clear or rm -rf /var/cache/knot-resolver/*).
- Resolve any domain under a TLD whose DNSKEY + RRSIG exceeds 1232 bytes over UDP. The .ai TLD is one such example (~1297 bytes with DNSSEC).
# Confirm the DNSSEC response exceeds 1232 bytes:
$ dig DNSKEY ai. @199.115.157.1 +bufsize=1232 +dnssec
;; MSG SIZE rcvd: 1297
# Attempt resolution through kresd:
$ dig @127.0.0.1 -p 5336 claude.ai +timeout=10
;; status: SERVFAIL
;; EDE: 22 (No Reachable Authority): (QLPL)
# Confirm zero TCP was attempted:
$ tcpdump -i any tcp port 53 -c 20 -n &
$ dig @127.0.0.1 -p 5336 claude.ai +timeout=10
# Result: 0 packets captured
- Verify the authoritative server handles TCP correctly:
$ dig +tcp DNSKEY ai. @199.115.157.1 +dnssec +timeout=5
;; Query time: 63 msec
;; MSG SIZE rcvd: 1297
# Works perfectly over TCP
kresd debug log showing the loop
[resolv] querying: 'v0n1.nic.ai.'@'199.115.153.1#00053' zone cut: 'ai.' qname: 'ai.' qtype: 'DNSKEY' proto: 'udp'
[select] updating: 'v0n1.nic.ai.'@'199.115.153.1#00053' with rtt 41
[iterat] <= truncated response, failover to TCP
[select] noting selection error: 'v0n1.nic.ai.'@'199.115.153.1#00053' error: 13 TRUNCATED
[iterat] 'ai.' type 'DNSKEY' new uid was assigned .29, parent uid .01
[srvstl] => no reachable NS, using stale data "ai."
[select] choosing: 'v0n1.nic.ai.'@'199.115.153.1#00053' with timeout 61 ms
[resolv] querying: 'v0n1.nic.ai.'@'199.115.153.1#00053' zone cut: 'ai.' qname: 'ai.' qtype: 'DNSKEY' proto: 'udp'
^^^ still UDP
[iterat] <= truncated response, failover to TCP
...repeats until...
[worker] cancelling query due to exceeded iteration count limit of 100
Note: every retry says proto: 'udp' — TCP is never used despite "failover to TCP" being logged.
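The observation above can be checked mechanically on a longer log. The following is a minimal sketch, assuming the debug log uses exactly the line format shown in the excerpt; `count_protocols`, `count_failover_messages`, and `SAMPLE` are names introduced here for illustration and are not part of kresd.

```python
import re
from collections import Counter

# Matches the "[resolv] querying: ..." lines from the excerpt above.
# Assumption: other kresd versions may format these lines differently.
QUERY_LINE = re.compile(r"\[resolv\] querying: .*? proto: '(?P<proto>\w+)'")
FAILOVER_TEXT = "truncated response, failover to TCP"


def count_protocols(log_text: str) -> Counter:
    """Count outgoing queries per transport ('udp', 'tcp', ...)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = QUERY_LINE.search(line)
        if match:
            counts[match.group("proto")] += 1
    return counts


def count_failover_messages(log_text: str) -> int:
    """Count how often the iterator announced a failover to TCP."""
    return sum(FAILOVER_TEXT in line for line in log_text.splitlines())


# Lines copied from the excerpt above.
SAMPLE = """\
[resolv] querying: 'v0n1.nic.ai.'@'199.115.153.1#00053' zone cut: 'ai.' qname: 'ai.' qtype: 'DNSKEY' proto: 'udp'
[iterat] <= truncated response, failover to TCP
[resolv] querying: 'v0n1.nic.ai.'@'199.115.153.1#00053' zone cut: 'ai.' qname: 'ai.' qtype: 'DNSKEY' proto: 'udp'
[iterat] <= truncated response, failover to TCP
"""

if __name__ == "__main__":
    print(dict(count_protocols(SAMPLE)), count_failover_messages(SAMPLE))
```

A log in which the failover message count is non-zero while the 'tcp' count stays at zero matches the behavior described in this report.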
Expected behavior
After receiving TC=1 on UDP, kresd should send the retry query over TCP to the same or a different authoritative server, as it logs it will do.
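For reference, the expected UDP-then-TCP sequence can be sketched outside kresd using only the Python standard library. This is an illustration of the general TC=1 fallback behavior, not kresd's implementation; `build_query`, `is_truncated`, `recv_exact`, and `query_with_tcp_fallback` are hypothetical helper names, and the fixed query ID and IPv4-only sockets are simplifications.

```python
import socket
import struct

TC_FLAG = 0x0200  # TC bit in the DNS header flags word
DNSKEY = 48       # DNSKEY RR type


def build_query(qname, qtype, bufsize=1232, query_id=0x1234):
    """Build a non-recursive DNS query with an EDNS0 OPT record and DO=1."""
    # ID, flags (RD=0), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT RR)
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    labels = [label for label in qname.rstrip(".").split(".") if label]
    name = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = name + struct.pack("!HH", qtype, 1)  # class IN
    # OPT RR: root owner, type 41, class = UDP payload size,
    # TTL field carries ext-rcode/version/flags (DO = 0x8000), RDLENGTH = 0
    opt = b"\x00" + struct.pack("!HHIH", 41, bufsize, 0x00008000, 0)
    return header + question + opt


def is_truncated(message):
    """Return True if the TC bit is set in a DNS message header."""
    if len(message) < 12:
        raise ValueError("message shorter than a DNS header")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full message")
        data += chunk
    return data


def query_with_tcp_fallback(server, qname, qtype, timeout=5.0):
    """Query over UDP; if the answer has TC=1, repeat the query over TCP."""
    query = build_query(qname, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(65535)
    if not is_truncated(response):
        return response, "udp"
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        # DNS over TCP prefixes each message with a 2-byte length.
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length), "tcp"
```

Calling `query_with_tcp_fallback("199.115.157.1", "ai.", DNSKEY)` against the server from the reproduction steps would be expected to return a response tagged "tcp" if the UDP answer is truncated, which is the transition that never shows up in the kresd packet capture.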
Actual behavior
kresd logs "failover to TCP" but continues sending UDP queries in a loop. No TCP SYN is ever sent (confirmed by packet capture). After 100 iterations, the query fails with SERVFAIL EDE 22.
Analysis
In lib/layer/iterate.c, the TC=1 handler:
VERBOSE_MSG("<= truncated response, failover to TCP\n");
query->server_selection.error(query, req->upstream.transport, KR_SELECTION_TRUNCATED);
return KR_STATE_CONSUME;
In lib/selection.c, the TRUNCATED error handler sets local_state->truncated = true and decrements error_count (not penalizing the server).
In lib/selection_iter.c, truncated is checked:
bool tcp = qry->flags.TCP || qry->server_selection.local_state->truncated;
However, KR_STATE_CONSUME returned from the iterate layer does not appear to trigger iter_choose_transport() with the updated truncated flag in v6's session/transport layer. The old v5 code path used qry->flags.TCP directly in lib/resolve.c, but v6's rewritten session layer may not connect these two mechanisms.
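The suspected failure mode can be illustrated with a toy model. This is not kresd code: `QueryState`, `choose_proto`, and `resolve` are hypothetical names, and the `consult_truncated` switch merely stands in for the hypothesis that the truncated flag never reaches the transport choice. It shows why the retry loop would run into the iteration limit if that hypothesis holds.

```python
from dataclasses import dataclass

ITERATION_LIMIT = 100  # the limit reported in the [worker] log line


@dataclass
class QueryState:
    flags_tcp: bool = False  # stands in for qry->flags.TCP
    truncated: bool = False  # stands in for local_state->truncated


def choose_proto(state, consult_truncated):
    """Pick a transport; optionally ignore the truncated flag (the suspected bug)."""
    tcp = state.flags_tcp or (consult_truncated and state.truncated)
    return "tcp" if tcp else "udp"


def resolve(consult_truncated):
    """Simulate retries for an answer that never fits into a UDP response."""
    state = QueryState()
    for attempt in range(1, ITERATION_LIMIT + 1):
        if choose_proto(state, consult_truncated) == "tcp":
            return "NOERROR", attempt  # the full answer fits over TCP
        state.truncated = True  # UDP answer came back with TC=1
    return "SERVFAIL", ITERATION_LIMIT


if __name__ == "__main__":
    print(resolve(consult_truncated=True))   # ('NOERROR', 2)
    print(resolve(consult_truncated=False))  # ('SERVFAIL', 100)
```

In this model the flag is set correctly in both runs; only whether the transport choice reads it decides between a second attempt over TCP and 100 identical UDP attempts.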
A related historical fix (MR !1711, v6.0.15) corrected swapped EDNS buffer sizes in the v6 YAML config layer, suggesting this area of the codebase has had configuration-to-runtime translation issues.
Workaround
Forward the affected TLD to another resolver that handles TCP fallback correctly:
forward:
  - subtree: ai.
    servers:
      - address: 127.0.0.1@5335
    options:
      dnssec: false
Environment
| Field | Value |
|---|---|
| Knot Resolver | 6.2.0 |
| OS | Debian 13 (Trixie) |
| EDNS buffer | 1232 bytes (default) |
| DNSSEC | enabled |
| Affected TLD | .ai (DNSKEY + RRSIG = 1297 bytes, exceeds 1232B EDNS) |
| Authoritative servers | All v*.nic.ai (199.115.15x.x) — TCP works correctly |
| Confirmed by | tcpdump showing 0 TCP packets during resolution |