Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing TXT records does not properly de-escape quotes #1384

Open
TomOnTime opened this issue Jun 20, 2022 · 5 comments
Open

Parsing TXT records does not properly de-escape quotes #1384

TomOnTime opened this issue Jun 20, 2022 · 5 comments

Comments

@TomOnTime
Copy link
Contributor

When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.

A line such as:

example.com. IN TXT "inner\"quote"

It gets parsed to: inner\"quote
I was expecting: inner"quote

Maybe I'm misreading the RFC? It just seems that without removing the \ there's no way to know what the \ means in the resulting string.

I've created #1383 to demonstrate the issue.

@miekg
Copy link
Owner

miekg commented Jun 20, 2022 via email

@tlimoncelli
Copy link
Contributor

"you lost the fact that it was quoted" -- I would imagine that the fact that it is ASCII 34 means it was quoted.

Can the string be formed any other way?

@miekg
Copy link
Owner

miekg commented Jul 7, 2022

trouble parsing what you mean here. I still believe current behavior is correct, as calling this should give the same output:

NewRR -> String() -> NewRR on that string -> String() etc.

@TomOnTime
Copy link
Contributor Author

Sorry for being unclear.

Question: Should newRR() simply remove the outer quotes (if they exist) or should it de-escape interior double-quotes too?

I expected it to do the latter. I wrote a TestTXTEscapeJustParse which only does the first half of what TestTXTEscapeParsing does. That way I can see the intermediate result.

It looks like NewRR() doesn't de-escape interior quotes. (i.e. replace \" with ").

  • Input to NewRR: "escaped\"quote"
  • Expected output of result.String(): escaped"quote
  • Actual output of result.String(): escaped\"quote

When we round-trip the strings by calling sprintTxt(), we get the original input, because it ignores the \.

If you look at https://github.com/miekg/dns/pull/1383/files, you'll see the a 2 tests, one should fail and the other should pass.

$ go test ./...
--- FAIL: TestIsPacketConn (0.00s)
    client_test.go:74: unable to run test server: listen unixpacket /var/folders/j8/8pxbmmjs6xjbd4nxjpd_q0rm0000gp/T/TestIsPacketConn2401358967/002/unixpacket.sock: socket: protocol not supported
--- FAIL: TestTXTEscapeJustParse (0.00s)
    parse_test.go:163: mismatch after parsing `"escaped\"quote"` TXT record: `escaped\"quote` != `escaped"quote`
FAIL
FAIL	github.com/miekg/dns	0.874s
ok  	github.com/miekg/dns/dnsutil	0.130s
FAIL

If no de-escaping is intended, then there's no bug.

Tom

@janik-cloudflare
Copy link
Collaborator

janik-cloudflare commented Feb 15, 2024

We've just run into a related issue. In this comment, X represents a string containing the character a 253 times. Then:

  • a TXT with zone file content "X0123456789" parses into slice X01 + 23456789 (as expected)
  • a TXT with zone file content "X\;0123456789" parses into slice X\; + 0123456789 (this seems like a bug: the backslash and semicolon are each counted as a byte, and the string split earlier than necessary, even though this escape sequence would only produce a single byte in wire format)

The String() function removes unnecessary backslashes, so parsing and then calling String() on something like "X\;a" "test" turns it into "X;" "a" "test", adding an unnecessary string. In my opinion, the way to go would be to remove unnecessary backslashes immediately upon parsing instead of in String(). This seems more logical and consistent to me: a string in a parsed dns.TXT would then really only contain the contents of that string, without additional zone file syntax. AAAA addresses, for example, are also stored in canonical form (e.g., unnecessary explicit zeros are removed during parsing) so there should be no need to keep literal backslashes in TXT records, right?

(For simplicity, all strings in this comment are raw/literal strings, without Go syntax. In other words, every backslash above is a literal backslash and every quote is a literal quote, such as a quote in a zone file.)


Edit: I've opened #1540 to fix the immediate issue, though I still think it'd be smart not to store escape sequences in parsed data structures (just like JSON escape sequences in strings aren't preserved when using Unmarshal, etc.), even though that's technically a breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants