
DNSSEC, systemd-resolved and general user friendliness

DNSSEC is perceived as difficult to deploy, but it’s actually the client-facing side that is more of a challenge, as my recent experience with systemd-resolved demonstrates.

A decade ago, configuring DNSSEC on the authoritative DNS server for your domains was quite challenging, to put it lightly. Manual zone signing was a nightmare and required careful timing and key rollovers planned well in advance. BIND automatic zone signing finally made it manageable, even though it was only introduced in 2018. Most DNS providers have since integrated this, and zone signing is as easy as a single click in a web console.

There is, however, another problem, much more severe from a business perspective, which is most likely the reason why most large Internet monopolies do not sign their main domains with DNSSEC.

It’s the client-side user experience of DNSSEC validation. Or, to be precise, the lack of it.

When a web server’s TLS certificate has an issue (e.g. expiration), web browsers show a technical but rather clear explanation, and in some cases the user can make an informed decision to continue in spite of the warning. When DNSSEC validation fails, however, the user gets literally zero information, simply because none is returned to the browser from the local DNS resolver.

Here is a response from my local Unbound resolver, which is a fully recursive and validating DNS resolver. The response is empty and there is zero indication of the reason, apart from a rather cryptic SERVFAIL (which can have many causes). Basically, from the user’s perspective, the domain is broken and it looks like its owner’s fault, even if it isn’t:

$ dig a dnssec-failed.org @unbound

; <<>> DiG 9.16.11 <<>> a dnssec-failed.org @192.168.1.252
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29689
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

Some reasons why DNSSEC validation can fail without the domain owner being to blame are listed further down. By contrast, the Cloudflare public 1.1.1.1 resolver does return the DNSSEC Bogus extended status code (RFC 8914), which the web browser making the request could use to display a meaningful error message:

$ dig a dnssec-failed.org @1.1.1.1

; <<>> DiG 9.16.11 <<>> a dnssec-failed.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1503
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

As RFC 8914 was only published in October 2020, it will probably take some time before implementations support it.
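To illustrate how a client could consume that information, here is a minimal sketch that extracts Extended DNS Errors from the raw RDATA of an OPT record. It assumes the caller has already located the OPT record in the response; the function name `parse_ede` and the short name table are mine, and the table covers only a few of the info-codes defined in RFC 8914.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code for Extended DNS Errors (RFC 8914)

# A small subset of the RFC 8914 info-codes; the full registry is longer.
EDE_NAMES = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    8: "Signature Not Yet Valid",
    9: "DNSKEY Missing",
}


def parse_ede(opt_rdata: bytes):
    """Return a list of (info_code, name, extra_text) found in OPT RDATA.

    OPT RDATA is a sequence of options, each encoded as:
    option code (2 bytes), option length (2 bytes), option data.
    """
    found = []
    pos = 0
    while pos + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, pos)
        data = opt_rdata[pos + 4 : pos + 4 + length]
        pos += 4 + length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            # EDE data: 2-byte info-code followed by optional UTF-8 text.
            (info_code,) = struct.unpack_from("!H", data, 0)
            extra_text = data[2:].decode("utf-8", errors="replace")
            name = EDE_NAMES.get(info_code, "Unknown")
            found.append((info_code, name, extra_text))
    return found


# Hand-built RDATA with one option: code 15, length 2, info-code 6.
sample = struct.pack("!HHH", 15, 2, 6)
print(parse_ede(sample))
```

A browser or stub resolver could turn the returned info-code into an error page instead of the generic “site cannot be reached”.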

Just for reference, this dnssec-failed.org domain does have an IP address; its DNSSEC signature is just intentionally broken to help with testing. The address can only be seen when the query explicitly sets the Checking Disabled flag, which the dig +cd option enables:

$ dig a dnssec-failed.org +cd @1.1.1.1

; <<>> DiG 9.16.11 <<>> a dnssec-failed.org +cd @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4571
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; ANSWER SECTION:
dnssec-failed.org.	7179	IN	A	69.252.80.75

Lack of Extended DNS Errors support isn’t the only problem. Most notably, DNSSEC validation will fail if the clock on the validating nameserver isn’t synchronised. I have personally experienced this on a cluster of SOPINEs and on a Raspberry Pi without an RTC, and it took me a while to figure out because of how cryptically a DNSSEC failure is communicated to the client.
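The reason the clock matters is that every RRSIG carries an inception and an expiration timestamp, and a validator rejects signatures outside that window. A minimal sketch of the check, using the two timestamps from the ipsec.pl RRSIG shown later in this article; the helper names are mine, and the “reset” clock value of 1970 is an assumption about how a board without an RTC may come up after boot:

```python
from datetime import datetime, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse the YYYYMMDDHHMMSS form that dig prints for RRSIG times (UTC)."""
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def signature_status(inception: str, expiration: str, now: datetime) -> str:
    """Classify a signature's validity window against the local clock."""
    if now < parse_rrsig_time(inception):
        return "not yet valid"
    if now > parse_rrsig_time(expiration):
        return "expired"
    return "valid"


# In dig output the expiration comes first, then the inception.
expiration, inception = "20210312230004", "20210210230004"

correct_clock = datetime(2021, 2, 15, 12, 0, tzinfo=timezone.utc)
reset_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed boot value

print(signature_status(inception, expiration, correct_clock))  # valid
print(signature_status(inception, expiration, reset_clock))    # not yet valid
```

With the wrong clock a perfectly good signature is rejected, and all the client sees is SERVFAIL.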

Recursive DNS resolver implementations are quite inconsistent, to be honest. Some, such as djbdns, simply ignore DNSSEC on principle, because its author believes DNSSEC doesn’t work. I don’t know what AWS uses as the recursive resolver for EC2 instances, but it shows very similar behaviour: it just ignores any DNSSEC flags in both requests and responses (and EDNS0 as well, as a matter of fact):

$ dig soa ipsec.pl +dnssec @172.31.0.2

; <<>> DiG 9.16.1-Ubuntu <<>> soa ipsec.pl +dnssec @172.31.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9618
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ipsec.pl.			IN	SOA

;; ANSWER SECTION:
ipsec.pl.		3600	IN	SOA	pns31.cloudns.net. hostmaster.krvtz.net. 2021021101 7200 1800 1209600 3600

The problem is further aggravated by the stacked nature of recursive DNS resolution. A typical modern Linux instance runs systemd-resolved, which describes itself as a “caching and validating stub resolver”. A stub resolver does not contact authoritative nameservers itself but relies on an upstream recursive resolver; it then performs DNSSEC validation on the responses received from upstream. Obviously, if upstream responses are stripped of any DNSSEC data (even if present at the authoritative level), there is nothing to validate.

systemd-resolved has one more weird feature, which is stripping the DNSSEC data from DNSSEC responses, as seen in the example below:

$ dig soa ipsec.pl @1.1.1.1 +dnssec

; <<>> DiG 9.16.11 <<>> soa ipsec.pl @1.1.1.1 +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1247
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;ipsec.pl.			IN	SOA

;; ANSWER SECTION:
ipsec.pl.		3600	IN	SOA	pns31.cloudns.net. hostmaster.krvtz.net. 2021021101 7200 1800 1209600 3600
ipsec.pl.		3600	IN	RRSIG	SOA 13 2 3600 20210312230004 20210210230004 17569 ipsec.pl. JxPebozuZvP8cKIAAT9ouINY/hltRZMsvOWOeyWDG5db+ooQ2fq5a9Or MMelj4m5xt6r98kL7r/0Kbc8Y+CRog==

$ dig soa ipsec.pl  +dnssec

; <<>> DiG 9.16.11 <<>> soa ipsec.pl +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12072
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 65494
;; QUESTION SECTION:
;ipsec.pl.			IN	SOA

;; ANSWER SECTION:
ipsec.pl.		3506	IN	SOA	pns31.cloudns.net. hostmaster.krvtz.net. 2021021101 7200 1800 1209600 3600

The first response, received from 1.1.1.1, contains full information: both the AD (Authenticated Data) flag, which indicates that the resolver has validated the DNSSEC signatures, and the actual signature in the RRSIG record.

The presence of this flag in a DNS response could actually be indicated by the user agent, much like the validity of TLS certificates, using a green lock or something like that.
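A rough sketch of what such an indicator could be based on: the AD bit and the RCODE both live in the 16-bit flags field of the 12-byte DNS header. The function name and the label strings below are hypothetical, not taken from any browser. Keep in mind that the AD flag is only as trustworthy as the path between the client and the resolver that set it.

```python
import struct

AD_BIT = 0x0020      # Authenticated Data bit in the header flags
RCODE_MASK = 0x000F  # low four bits of the flags hold the response code
SERVFAIL = 2


def security_hint(response: bytes) -> str:
    """Map the header of a DNS response to a hypothetical UI label."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _query_id, flags = struct.unpack_from("!HH", response, 0)
    rcode = flags & RCODE_MASK
    if rcode == SERVFAIL:
        return "lookup failed (possibly DNSSEC validation)"
    if rcode != 0:
        return "lookup failed"
    return "validated by resolver" if flags & AD_BIT else "not validated"


# Headers equivalent to the dig flag lines in this article:
# "qr rd ra ad" NOERROR = 0x81A0, "qr rd ra" NOERROR = 0x8180,
# "qr rd ra" SERVFAIL = 0x8182.
for flags in (0x81A0, 0x8180, 0x8182):
    header = struct.pack("!HHHHHH", 1247, flags, 1, 0, 0, 0)
    print(hex(flags), security_hint(header))
```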

The response from systemd-resolved, however, is stripped of all RRSIG records, even though it does contain the AD flag. The reasons for this have been discussed in the developer community since 2016, there are about a dozen tickets related to this behaviour (#4621 #12317 #6434), and it seems it will eventually be fixed in 2021.

Note that this is not a problem unique to systemd-resolved; I just mention it because I use it a lot. The approach to DNSSEC validation is simply inconsistent across various stub resolvers, as documented in this excellent article, Evaluating Local DNSSEC Validators (2019).

Stub resolvers in the “mainstream” operating systems don’t even bother with DNSSEC validation, so they appear to be “not causing problems”. But this is pretty much like making a web browser user-friendly by not displaying certificate mismatch warnings.

Summary

As can be seen from the above examples, it’s not enabling DNSSEC on your domain that poses a challenge. It’s interpreting the results and debugging validation failures that are frequently caused by third parties (such as your local ISP with a drifting clock), yet for which you get the blame because it’s your domain that doesn’t work.

In spite of that, I believe DNSSEC should be used wherever possible on both the server side (authoritative servers) and the client side (local recursive nameservers). It’s worth noting that while BigTech firms do not sign their domains, all the popular public DNS services do validate DNSSEC.

DNSSEC and DNS primer

DNS is confusing on its own and DNSSEC is even more so, so here are a few clarifications:

  • The fundamental concepts of both DNS and DNSSEC are very simple; it’s mostly the weird naming that makes them complicated, and inconsistent implementations that sometimes make them hard to use.
  • When a DNS client wants a DNSSEC-validated response, it should set the DO (DNSSEC OK) flag in its DNS requests. This is optional because historically not all clients could cope with the slightly larger responses required by DNSSEC. In dig it’s triggered by the +dnssec flag.
  • If the response is validated, the resolver can come back with an AD (Authenticated Data) flag. In dig the flag is concealed deep in the ;; flags: qr rd ra ad line.
  • If the domain signature is broken (or the resolver is…), the response will be SERVFAIL, which in basic DNS is rather vague. The client may send a request with the CD (Checking Disabled) flag to receive raw DNSSEC data and try validating on its own, but this is rarely used.
  • The path between your web browser and the authoritative DNS server can be… complicated. A brief primer into different recursive resolver architectures can be found in Terminology for DNS Transports and Location.
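To make the flags from the list above more concrete, here is a minimal sketch of how a query with DO and CD set looks on the wire. CD is a bit in the ordinary header flags, while DO travels in the EDNS0 OPT pseudo-record appended to the query. The function names, the default query ID and the 1232-byte UDP size are illustrative choices of mine; the sketch only builds the packet and does not send it.

```python
import struct

RD_BIT = 0x0100  # Recursion Desired, header flags
CD_BIT = 0x0010  # Checking Disabled, header flags
DO_BIT = 0x8000  # DNSSEC OK, top bit of the EDNS flags in the OPT TTL field
TYPE_OPT = 41


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=1, do=True, cd=False, query_id=0x1234, udp_size=1232):
    """Build a DNS query (class IN) carrying an EDNS0 OPT record."""
    flags = RD_BIT | (CD_BIT if cd else 0)
    # Header: ID, flags, 1 question, 0 answers, 0 authority, 1 additional (OPT).
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    # OPT RR: root name, type 41, class = UDP payload size,
    # TTL = extended RCODE (8 bits) | version (8 bits) | flags (16 bits),
    # followed by a zero RDATA length.
    edns_flags = DO_BIT if do else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, edns_flags, 0)
    return header + question + opt


# Roughly what "dig a dnssec-failed.org +dnssec +cd" asks for.
query = build_query("dnssec-failed.org", do=True, cd=True)
print(len(query), query.hex())
```

The resulting bytes could be sent over UDP to port 53 of a resolver; parsing the reply is a separate exercise.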