the real world isn't RFC compliant #14

Closed
wakemaster39 opened this issue Sep 4, 2020 · 7 comments · Fixed by #20

Comments

@wakemaster39
Contributor

Unfortunately, the real world does not listen to RFCs: _ can appear in hostname parameters. DNS respects it and Chrome respects it, but we fail here.

We should introduce a strict mode and a loose mode for trying to allow for better real world compatibility where required.
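To make the proposal concrete, here is a minimal sketch of what a strict/loose switch on a single label could look like. The function name `is_valid_label`, the `strict` parameter, and both patterns are hypothetical illustrations, not this repo's actual API or regex.

```python
import re

# Hypothetical patterns for illustration only; not taken from this repo.
# Strict: letters, digits and hyphens, 1 to 63 characters,
# with no hyphen in the first or last position.
_STRICT_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# Loose: same shape, but an underscore is accepted wherever a letter or digit is.
_LOOSE_LABEL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?")


def is_valid_label(label: str, strict: bool = True) -> bool:
    """Return True if one dot-separated label passes the chosen mode."""
    pattern = _STRICT_LABEL if strict else _LOOSE_LABEL
    # fullmatch avoids the trailing-newline quirk of "$" anchors.
    return pattern.fullmatch(label) is not None
```

With this sketch, `is_valid_label("_dmarc")` is rejected in the default strict mode and accepted with `strict=False`, while a label with a leading hyphen is rejected in both.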

@ypcrts
Owner

ypcrts commented Sep 4, 2020 via email

@wakemaster39
Contributor Author

wakemaster39 commented Sep 5, 2020

No, I don't have any authority. My knowledge is mostly Stack Overflow, random Microsoft docs, BIND 9 configuration, and a bit about the other DNS record specs, which rely on _ pretty heavily. But this change does not support those at all.

In my search I haven't found any exception other than _ and punycode handling for non-ASCII characters.

@ypcrts
Owner

ypcrts commented Sep 5, 2020 via email

@ypcrts
Owner

ypcrts commented Sep 5, 2020

look at this madness https://tools.ietf.org/html/rfc1123


   2.5  GENERAL APPLICATION REQUIREMENTS SUMMARY

                                               |          | | | |S| |
                                               |          | | | |H| |F
                                               |          | | | |O|M|o
                                               |          | |S| |U|U|o
                                               |          | |H| |L|S|t
                                               |          |M|O| |D|T|n
                                               |          |U|U|M| | |o
                                               |          |S|L|A|N|N|t
                                               |          |T|D|Y|O|O|t
FEATURE                                        |SECTION   | | | |T|T|e
-----------------------------------------------|----------|-|-|-|-|-|--
                                               |          | | | | | |
User interfaces:                               |          | | | | | |
  Allow host name to begin with digit          |2.1       |x| | | | |
  Host names of up to 635 characters           |2.1       |x| | | | |
  Host names of up to 255 characters           |2.1       | |x| | | |
  Support dotted-decimal host numbers          |2.1       | |x| | | |
  Check syntactically for dotted-dec first     |2.1       | |x| | | |
                                               |          | | | | | |

@wakemaster39
Contributor Author

There is a typo in that table; it had me concerned for a minute. The MUST is 63 characters, not 635, and 63 is already supported.

Unfortunately, it doesn't say anything more about _. I do feel the 255-character hostname length should be a separate flag from strict vs. loose: 255 is only a SHOULD, and whether it is acceptable really depends on your backing system.
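A rough sketch of keeping the length limit independent of the character rules might look like the following. The function `lengths_ok` and its parameters are hypothetical, and the check counts characters of an ASCII name as written, ignoring how the limit is accounted for on the wire.

```python
def lengths_ok(name: str, max_total: int = 255, max_label: int = 63) -> bool:
    """Check lengths only; which characters are allowed is a separate concern.

    Hypothetical helper: max_total is its own knob, so a caller whose backing
    system cannot take long names can lower it without touching strict/loose.
    """
    # Treat a single trailing dot (fully qualified form) as not part of the name.
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > max_total:
        return False
    # Every label must be non-empty and within the per-label limit.
    return all(1 <= len(label) <= max_label for label in name.split("."))
```

A caller that only wants the MUST-level behaviour from the table could pass `max_total=63`, while the default follows the SHOULD-level figure of 255.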

@ypcrts
Owner

ypcrts commented Sep 8, 2020

Yeah, I agree that 635 must be a typo.

Originally this repo had the goal of validating domain names for certificate authorities.

All certificates containing an underscore character in any dNSName entry and having a validity period of more than 30 days MUST be revoked prior to January 15, 2019.
-- https://cabforum.org/2018/11/12/ballot-sc-12-sunset-of-underscores-in-dnsnames/

Following some of the debate on Stack Overflow, hostnames seem to be a subset of domain names following the rules we know already in this repo.

A record's name in a DNS server can contain an underscore, such as _dkim.example.com. Formally, by RFC 1035 the name is not restricted in the binary content of the octets. DNS was built in such a way that it even handles null octets in the name.
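To illustrate why the octet content is unrestricted, here is a small sketch of the RFC 1035 wire encoding of a name: each label is a length octet followed by that many raw octets, and the name ends with a zero-length label. The function name `encode_name` is hypothetical, and name compression is left out.

```python
from typing import Iterable


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels as length-prefixed octet strings (no compression)."""
    out = bytearray()
    for label in labels:
        # The length octet limits a label to 63 octets; nothing limits
        # which octet values the label itself contains.
        if not 1 <= len(label) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(label))
        out += label
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)
```

Because the length prefix delimits each label, `encode_name([b"_dkim", b"example", b"com"])` and even a label such as `b"a\x00b"` encode without any special handling.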

Like you said, what you're replicating here isn't for hostnames, and it isn't for web certificate authorities.

So what is it? Idk? Something that Chromium does? Here's Chromium source:

chromium - net/dns/dns_util.cc
[image: screenshot of net/dns/dns_util.cc]

We do gain something from this that diverges from our current regex: this allows the last octet in a label to be a hyphen. And looking back at RFC 1035 s. 2.3.1 Preferred Name that's a bug which I've filed as #16. (oops)

I haven't quite traced out the full code path yet, but it seems like labels with leading dashes are accepted by both Chrome and Firefox.

Anyway, the top of the chromium source notes:
[image: screenshot of the note at the top of the Chromium source]

And that would seem to point to Daniel J. Bernstein and the djbdns server and DNS utils.

This is a sad place to end the egg hunt, because I don't see a good reason why underscores are included other than tradition.

I don't mind accepting this feature, but perhaps it should be named after djb or called allow_underscores.

@wakemaster39
Contributor Author

wakemaster39 commented Sep 9, 2020

I agree this is a really shitty answer, as I have also not been able to find a reason for underscores other than that Chrome, Firefox, and Safari accept them. I am happy to change the name. I went with strict because I expect that one day someone will find the 27th letter of the alphabet, and that case can be folded into the same flag: strict just means follow the RFC, and let the other mode explode.

I would vote for allow_underscores if we did want a different name, since djb requires context about the problem, whereas allow_underscores does what it says.
