Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should protect against UTF-8 homographs #1138

Open
Zugschlus opened this issue Dec 3, 2024 · 12 comments
Open

should protect against UTF-8 homographs #1138

Zugschlus opened this issue Dec 3, 2024 · 12 comments

Comments

@Zugschlus
Copy link

Zugschlus commented Dec 3, 2024

Hi,

it looks like most Linux distributions today "blindly" allow creating user names that contain unicode code points. useradd in this case writes a correctly encoded /etc/passwd in UTF-8 encoding.

I am not sure whether this is a good thing to do before at least doing some checks regarding unicode normalization. For example, useradd will happily create two users étienne and étienne (one with the e aigu character, the other with an e aigu composed of an accent aigu and an e). In my opinion, the second account should be rejected.

Recommended reading about ways to handle this kind of stuff are the PRECIS RFCs, 8264 and 8265 (8266 is not THIS important), and the unicode TR 15 document (https://unicode.org/reports/tr15/) about normalization forms of unicode strings.

I do fully understand if you don't want to unlock this can of worms (I had the same issue with Debian's adduser), but please consider restricting the character set to disallow this otherwise, or at least writing documentation on this matter.

Greetings
Marc

@rbalint
Copy link
Contributor

rbalint commented Dec 3, 2024

Hi,

Debian-devel thread starts here: https://lists.debian.org/debian-devel/2024/11/msg00250.html

Debian carries a patch that relaxes checks: https://lists.debian.org/debian-devel/2024/12/msg00045.html

I probably should have dropped the patch while maintaining shadow in Debian, but could not find time to follow up all the potential breakages before doing so. :-(

Thanks @Zugschlus for raising the topic and working on that.

I think the best way out would be dropping the carried useradd patch (https://salsa.debian.org/debian/shadow/-/commit/08e5e0a148b548a3eb2f5ba7acfd6ab406533268) in Debian and making the required changes in adduser, too.

@Zugschlus
Copy link
Author

Zugschlus commented Dec 4, 2024

Before I filed this upstream issue, I tried creating märc, Étienne and mΩrc on Centos Stream 9, Alma Linux 9 and Fedora 41 (spun up VMs on Hetzner Cloud) using their adduser variant (needed --badnames) and the creation went through like it does on Debian. Hence I filed this upstream since this is an issue that other (all?) Linux distributions suffer from.

Their adduser man page symlinks to useradd, does this also apply to the binary?

@rbalint
Copy link
Contributor

rbalint commented Dec 4, 2024

@Zugschlus Ah, OK. I think I misunderstood the the original issue. By "blindly" I thought you were referring to Debian's behavior, where you can create UTF-8 usernames without passing --badname, and I believe this is not OK.
IMO when passing --badname there are no extra checks needed, since the user already accepted that the username may cause unspecified problems, including looking like an other different username.

@Zugschlus
Copy link
Author

The problem that I see is that people speaking languages that need special characters or maybe even special alphabets might get used to using --badname without understanding the implication. Would you be willing at least to document that there might be issues?

@rbalint
Copy link
Contributor

rbalint commented Dec 4, 2024

I don't believe that there are many people using special characters in their username, but a warning would be useful to help newcomers, indeed.

@alejandro-colomar
Copy link
Collaborator

$ man useradd | sed -n '/badname/,/^$/p'
       --badname
           Allow names that do not conform to standards.

How about adding a sentence there that warns about this?
Any proposal of a warning?

If it was just me, I'd just remove the --badname option. I guess it stays there for any scripts that need it. But please don't use it. Maybe we should just state this?

@alejandro-colomar
Copy link
Collaborator

https://lists.debian.org/debian-devel/2024/11/msg00271.html:

The collaboration between src:shadow, base-passwd and adduser is a
relatively fresh thing that came from the fact that src:shadow recently
introduced changes that made adduser's test suite break.

Reference please. What was the breakage mentioned there?

@alejandro-colomar
Copy link
Collaborator

alejandro-colomar commented Dec 5, 2024

Hi,

Hi,

it looks like most Linux distributions today "blindly" allow creating user names that contain unicode code points. useradd in this case writes a correctly encoded /etc/passwd in UTF-8 encoding.

I wouldn't say blindly. As @rbalint said, only Debian does it blindly. We do require --badname, which has a negative connotation, and with its own flag name it implies that it's a bad idea to use it.

I am not sure whether this is a good thing to do before at least doing some checks regarding unicode normalization. For example, useradd will happily create two users étienne and étienne (one with the e aigu character, the other with an e aigu composed of an accent aigu and an e). In my opinion, the second account should be rejected.

I oppose adding more code (and thus complexity) to setuid programs. More code, more bugs.

Recommended reading about ways to handle this kind of stuff are the PRECIS RFCs, 8264 and 8265 (8266 is not THIS important), and the unicode TR 15 document (https://unicode.org/reports/tr15/) about normalization forms of unicode strings.

Counter-recommendation: POSIX recommends that usernames be restricted to the portable character set for portability (and portability in setuid binaries also implies security, if I may add).

I do fully understand if you don't want to unlock this can of worms (I had the same issue with Debian's adduser), but please consider restricting the character set to disallow this otherwise, or at least writing documentation on this matter.

We already do that by default. But yeah, I would fully support removing the --badname flag and setting the default behavior in stone.

Greetings Marc

Cheers,
Alex

@Zugschlus
Copy link
Author

Zugschlus commented Dec 5, 2024

https://lists.debian.org/debian-devel/2024/11/msg00271.html:

The collaboration between src:shadow, base-passwd and adduser is a
relatively fresh thing that came from the fact that src:shadow recently
introduced changes that made adduser's test suite break.

Reference please. What was the breakage mentioned there?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1074306

This has led to #1103 which sadly has not received any answer until today. Since useradd is going to be even more restrictive soon, Debian's adduser will begin parsing useradd's output shortly if there is no other way to detect useradd saying "invalid user name".

Upon second reading, this all sounds like I am trying to blame the failure in adduser's test suite to useradd. That was not my intention.

It is, however, too bad that Linux is going to make a step backwards away from international language support. I can live with that.

@Zugschlus
Copy link
Author

I oppose adding more code (and thus complexity) to setuid programs. More code, more bugs.

For the protocol: useradd is not setuid, at least not in Debian.

@alejandro-colomar
Copy link
Collaborator

I oppose adding more code (and thus complexity) to setuid programs. More code, more bugs.

For the protocol: useradd is not setuid, at least not in Debian.

Ahh, yeah, sorry, I didn't check.

Anyway, more code, more bugs. And you need to run it as root, which makes bugs worse. And other programs which do be setuid would have to deal with those user names at some point.

I vote against adding support for non-portable names.

@alejandro-colomar
Copy link
Collaborator

https://lists.debian.org/debian-devel/2024/11/msg00271.html:

The collaboration between src:shadow, base-passwd and adduser is a
relatively fresh thing that came from the fact that src:shadow recently
introduced changes that made adduser's test suite break.

Reference please. What was the breakage mentioned there?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1074306

Ahh, it's a breakage in Debian, not upstream. Makes sense.

This has led to #1103 which sadly has not received any answer until today. Since useradd is going to be even more restrictive soon, Debian's adduser will begin parsing useradd's output shortly if there is no other way to detect useradd saying "invalid user name".

In my case, I haven't replied because I didn't know what to say. I've now commented there.

Upon second reading, this all sounds like I am trying to blame the failure in adduser's test suite to useradd. That was not my intention.

It is, however, too bad that Linux is going to make a step backwards away from international language support. I can live with that.

I think there are limits.

User-facing strings being internationalized is good.
Strings that have a meaning to the system (login name, file names, ...) I don't think it's good to translate them.
I would even patch the kernel to forbid whitespace in file names; especially newlines. That would make many scripts much more robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants