Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client now accepts :encoding option. #64

Merged
merged 2 commits into from
Apr 12, 2023
Merged

Client now accepts :encoding option. #64

merged 2 commits into from
Apr 12, 2023

Conversation

adrianna-chang-shopify
Copy link
Collaborator

When :encoding is supplied to the client, the charset for the MySQL server is set accordingly. Additionally, we ensure that the client sends query strings in the encoding the connection expects.

This will align Trilogy's encoding behaviour with Mysql2's.

Looking for Feedback

  • Should we set a default encoding of utf8mb4 if none is passed to the client, or not set anything?
  • Does it make sense to transcode strings inside rb_trilogy_escape as well, as long as the connection encoding is ASCII-compatible?

@brianmario
Copy link
Contributor

Should we set a default encoding of utf8mb4 if none is passed to the client, or not set anything?

Down in the C library, UTF8 is passed as the default encoding during the authentication phase of connection. Even though utf8mb4 existed when we decided to do that, most other client libraries back then passed utf8 as well so we went with that. utf8mb4 is the modern replacement of utf8 and should be used as the default for the Ruby extension IMO 👍

Along those lines, it would be a little more efficient to change the C API to allow the client encoding to be passed in to trilogy_build_auth_packet. Then the Ruby code could just pass it through in initialize, and you wouldn't have to issue the SET NAMES query immediately after connection. That would be a breaking change in the C API though, so I'll defer to you all on that decision ;)

@adrianna-chang-shopify
Copy link
Collaborator Author

That's great feedback, thanks @brianmario ! ❤️

I'll add a commit with the proposed changes to pass encoding through to trilogy_build_auth_packet, and we can decide if we're okay with those breaking changes.

If we'd like to avoid breaking changes to the C API, sounds like we can at the very least avoid calling SET NAMES for the default UTF8 encoding since that'll already be configured in the auth phase. Perhaps we'd want to change the default encoding to TRILOGY_CHARSET_UTF8MB4_GENERAL_CI there as well? (Not sure if that would be considered a breaking change).

@brianmario
Copy link
Contributor

Perhaps we'd want to change the default encoding to TRILOGY_CHARSET_UTF8MB4_GENERAL_CI there as well?

If you're referring to the Ruby client then I personally think so, yeah. I'm not sure if we should change the default down in the C library as well. Probably? But that would be a breaking change as well.

@adrianna-chang-shopify
Copy link
Collaborator Author

I'm not sure if we should change the default down in the C library as well. Probably? But that would be a breaking change as well.

Yeah I was referring to the C library, but figured that would be a breaking change 😅 So maybe not doable if we wanted to avoid that.

@matthewd
Copy link
Contributor

I think I'm inclined to wear the breakage here... it seems like something we're going to have to address at some point, so the sooner the better.

And making it configurable at connect time should allow a smooth user-level transition (C API change notwithstanding): we can start defaulting to the more modern utf8mb4, and existing users (👋🏻) can configure back to the prior utf8 default and then separately consider & manage the transition.

(Without investigation, changing the initial default rather than using a post-connect SET NAMES feels less likely to complicate our relationship with any intervening ProxySQL-type middleman, too.)

@adrianna-chang-shopify adrianna-chang-shopify force-pushed the ac-encoding branch 5 times, most recently from ed7038d to 960aa62 Compare April 4, 2023 19:02
@jhawthorn
Copy link
Member

jhawthorn commented Apr 4, 2023

Does it make sense to transcode strings inside rb_trilogy_escape as well, as long as the connection encoding is ASCII-compatible?

I wonder if we should restrict the accepted encodings to only ASCII-compatible (at least from the Ruby client). It feels dangerous to expose an API which could have a broken escape implementation depending on encoding chosen. We have rb_enc_asciicompat which might make it relatively easy to test for?

inc/trilogy/protocol.h Outdated Show resolved Hide resolved
@adrianna-chang-shopify
Copy link
Collaborator Author

@jhawthorn sorry missed your comment earlier!

I wonder if we should restrict the accepted encodings to only ASCII-compatible (at least from the Ruby client).

The implementation right now does restrict encodings to ASCII-compatible only:
https://github.com/github/trilogy/blob/a2696834caa4844b6abf234a11e0627a94ea68e7/contrib/ruby/ext/trilogy-ruby/cext.c#L871-L877

I haven't made any changes to escape -- it maintains the string's original encoding though, so I was wondering if we wanted to change it to encode in the expected connection encoding (assuming the connection encoding is ASCII-compatible too). What do you think?

eileencodes added a commit to rails/buildkite-config that referenced this pull request Apr 7, 2023
In trilogy-libraries/trilogy#64 we're adding Trilogy as an
additional mysql client for Rails. This change adds a new build for it.
@jhawthorn
Copy link
Member

One suggestion (and a rebase needed). Otherwise looks good to me 🙌

adrianna-chang-shopify and others added 2 commits April 12, 2023 09:14
When :encoding is supplied to the client, the charset for the MySQL server
is set accordingly. Additionally, we ensure that the client sends query
strings in the encoding the connection expects.

Co-authored-by: Jean Boussier <jean.boussier@shopify.com>
Rather than performing `SET NAMES` in the client constructor on the
Ruby side, we can set the charset during the authentication phase.
To do so, we must find the mysql-to-charset mapping for the requested
encoding, and pass it through to the C API.
@jhawthorn jhawthorn merged commit 44d3494 into main Apr 12, 2023
@jhawthorn jhawthorn deleted the ac-encoding branch April 12, 2023 23:09
@jhawthorn
Copy link
Member

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants