Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/remove-pydantic-dependency #195

Merged
merged 47 commits into from
Jan 11, 2024
Merged

Conversation

MasterKale
Copy link
Collaborator

@MasterKale MasterKale commented Jan 5, 2024

An experiment in simplifying the list of this package's dependencies. PR is currently a WIP while I continue to explore this.

Fixes #196.

@MasterKale
Copy link
Collaborator Author

OK - I can work with UTF-8 as user_handle...

I'll be sure to document this behavior too, I promise 🙏

@MasterKale MasterKale marked this pull request as ready for review January 8, 2024 23:34
@MasterKale MasterKale added this to the v2.0.0 milestone Jan 8, 2024
@jwag956
Copy link

jwag956 commented Jan 9, 2024

I am still struggling to get this to work with my sample front-end. I don't understand this code in def options_to_json():

        if isinstance(options.user.id, bytes):
            _user["id"] = bytes_to_base64url(options.user.id)
        else:
            _user["id"] = options.user.id

generate_registration_options already has done:

user=PublicKeyCredentialUserEntity(
            id=user_id.encode("utf-8"),
            name=user_name,
            display_name=user_display_name,
        ),

So it will always be 'bytes' and then convert it to base64?

@MasterKale
Copy link
Collaborator Author

So it will always be 'bytes' and then convert it to base64?

Right, this is current behavior that's confusing and definitely not desirable as per another comment I left over in #180:

You know what, I just realized that I've been base64url-encoding user.id the entire time, which means userHandle is coming back as the base64url-encoded version, not the string passed in as user_id into generate_registration_options(). I thought I'd fixed that... 🤦‍♂️

I've stashed a couple of approaches I plan on taking after merging this PR to fix this. I'll either A) always treat user.id and user_handle as UTF-8 strings, or B) allow UTF-8-string or bytes as values for user.id, base64url-encode user.id if it's bytes, and then treat that user identifier as UTF-8 when handling it later during authentication as user_handle.

tl;dr: for this PR the behavior will remain as-is. I'll fix this behavior afterwards as it's very confusing.

@jwag956
Copy link

jwag956 commented Jan 9, 2024

Sorry to be a pain here - this affects the front-end API and changing that needs careful consideration and of course breaks backwards compatibility. So front-ends (mostly JS I presume) must take user.id and b64decode into a Uint8Array to pass into credentials.create - then take the response from authentication (user_handle) and send it back as plain utf-8? Or send it back as b64encoded and have RP code do the decode (which will break if you change it).

I suppose I am asking if this could be 'fixed' (either way) PRIOR to releasing. (different PR is fine - just prior to release) and and that you give this a major release change (2.0).

As always - thanks for all the effort and consideration you put into the package... we are all very very appreciative.

@MasterKale
Copy link
Collaborator Author

So front-ends (mostly JS I presume) must take user.id and b64decode into a Uint8Array to pass into credentials.create - then take the response from authentication (user_handle) and send it back as plain utf-8? Or send it back as b64encoded and have RP code do the decode (which will break if you change it).

To get the most out of userHandle during authentication an RP using this library would have to know to base64url-decode user.id during registration after calling options_to_json(generate_registration_options(...)) and before calling navigator.credentials.create().

If a user of this library does base64url-decode user.id before calling WebAuthn's navigator.credentials.create(), then the bytes they get later for userHandle after a call to navigator.credentials.get() would be the same UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options().

If they do not base64url-decode user.id before calling WebAuthn's navigator.credentials.create(), then the bytes they get later for userHandle after a call to navigator.credentials.get() would always be the base64url-encoded UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options(). The RP would need to base64url-decode userHandle to bytes, and then encode those bytes to UTF-8 to get the same UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options().

This...is potentially going to make migrating from v1.x to v2.0 of py_webauthn a little messy. I think the simplest migration path would involve base64url-encoding the string they pass in for user_id when calling generate_registration_options(). This is anticipating my following up this PR with a subsequent change to always treat the string value user_id as UTF-8 bytes and stop base64url-encoding it - it's not necessary, the value is already a string.

To reiterate, a goal of the upcoming v2 is for the UTF-8 bytes that an RP specifies for user_id (as a basic str, i.e. user_id="user_id_here") when calling generate_registration_options() be the same bytes that get returned for userHandle after a call to navigator.credentials.get(). I could then advise users of py_webauthn to UTF-8 encode these bytes on the front end when preparing JSON to submit to the server. That would at least make it much easier to talk about how one value becomes the other.

All of this said, I honestly don't think userHandle should have much value to RPs. Even during conditional UI, when no username is provided, the credential ID that corresponds to the public key that can successfully verify the signature coming out of WebAuthn's .get() should then be used internally to figure out which user to sign in. userHandle on the front end isn't signed over and so it can't really be trusted. I don't think this gets me off the hook to figuring out a path forward around this, admittedly...

I suppose I am asking if this could be 'fixed' (either way) PRIOR to releasing. (different PR is fine - just prior to release) and and that you give this a major release change (2.0).

I don't think I want to put much effort into trying to solve this with the current v1.x version of py_webauthn as I think it's kind of a breaking change all in its own. I'd rather bundle it with a more "drastic" change like this PR and break many things at once instead of coordinating one per major release.

@jwag956
Copy link

jwag956 commented Jan 9, 2024

As always - thank you for your detailed response..Part of the confusion for me is that string (unicode) != UTF-8 in python - and in python, converting a string to UTF-8 ("my string").encode() results in bytes - which isn't JSON encodable (you have to cast it back to string). So your goal of having user.id be a 'string' seems good but does anything care if it encoded as long as what is returned as part of authentication is the same as was passed into registration?

All that aside - I have everything working with this PR - with just the one modification to b64decode in my signin code.
Thanks.

@MasterKale
Copy link
Collaborator Author

Part of the confusion for me is that string (unicode) != UTF-8 in python

I went digging into Python's Unicode support and it's my understanding "Unicode support" in Python commonly refers to UTF-8, based on https://docs.python.org/3/howto/unicode.html#encodings:

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.)

As for the results of your testing...

All that aside - I have everything working with this PR - with just the one modification to b64decode in my signin code.

Thank you for taking the time to test this PR. I'll go ahead with it, but it won't go out immediately as v2.0. I'll be making a couple of other breaking changes (like the user ID stuff we've talked about in here) before rolling all of that into v2.0.

@jwag956
Copy link

jwag956 commented Jan 10, 2024

In python 3 all strings are unicode. It is true that utf-8 is the default encoding (e.g. "hi there".encode()) will convert the unicode string "hi there" to a UTF-8 encoded series of bytes. One thing I noticed was the W3C spec calls user.id "A user handle is an opaque byte sequence with a maximum size of 64 bytes, and is not meant to be displayed to the user."

Since unicode may or may not be 1-1 with bytes - I wonder if it would be better for the py_webauthn API to take bytes - not string - in order to avoid an issue with upon encoding it is too long. Then I suppose encoding that using b64 encoding in order to create a JSON serializable value and having the front-end convert to a uint8array (as I know mine does today) . The same in the reverse upon authentication response. Of course - this behavior is pretty much exactly what py_webauthn and pydantic 1.0 did :-)

I don't really care either way - my user_handle is a 64 bytes hex string. At a minimum it is probably worth documenting the restrictions on passing in an arbitrary unicode string.

@MasterKale MasterKale merged commit a3fb0a5 into master Jan 11, 2024
4 checks passed
@MasterKale MasterKale deleted the feat/remove-pydantic-dependency branch January 11, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: Removing Pydantic as a dependency
2 participants