Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions regarding binary_id columns and the binary_id_type configuration #70

Closed
Blond11516 opened this issue May 14, 2022 · 11 comments
Closed

Comments

@Blond11516
Copy link
Contributor

When I tried using binary_id primary keys for my tables, I was surprised to see IDs in elixir's binary format rather than strings. After some research, I found the (undocumented) :binary_id_type option.

The option supports either :string (default) or :binary as values. I was surprised to find in the loader for binary_id that the :binary (rather than :string) type uses Ecto.UUID (and consequently handles IDs as strings). The actual SQLite column types do seem to match the configuration more intuitively: :binary -> UUID column with BLOB affinity, :string -> TEXT_UUID column with TEXT affinity.

My resulting questions are:

  1. Why does this parameter exist? Does the raw binary save space in the database?
  2. Shouldn't the :stringconfiguration be the one to use Ecto.UUID (and therefore handles IDs as plain strings). That would be much more intuitive to me.
  3. Whatever the answer to question 2 is, should the default behavior be to handle UUID columns as plain strings? This would be more in line with the ecto_sql's builtin sql adapters, which all make use of Ecto.UUID and handle binary ids as strings.
  4. How do the :uuid column type and :uuid_type configuration play into this? From my quick testing this type seems to always be handled as strings, :uuid_type only affecting the type of the column in SQLite itself.

When I get some clarification on this I'd be more than happy to contribute the necessary documentation (and code if applicable).

@warmwaffles
Copy link
Member

The original implementation was from about a year ago here #52 and I am having trouble recalling the rationale for the "why". @LostKobrakai could maybe shed some light.

  1. Why does this parameter exist? Does the raw binary save space in the database?

The parameter exists as far as I can tell to maybe bridge existing sqlite databases that were created with tables using a different affinity for the column type.

  1. Shouldn't the :string configuration be the one to use Ecto.UUID (and therefore handles IDs as plain strings). That would be much more intuitive to me.

I do agree that the string type may be the better default choice. I do believe that it being stored as a binary blob is smaller because you do not have the - character to store with each record, but it's sort of insignificant in the grand scheme of things.

We probably should make the change to be more inline with ecto_sql and how other adapters handle it.

  1. How do the :uuid column type and :uuid_type configuration play into this? From my quick testing this type seems to always be handled as strings, :uuid_type only affecting the type of the column in SQLite itself.

I think this is just a product of trying to support databases that were initially created by a different process or the older ecto2 adapter that used esqlite

@LostKobrakai
Copy link
Contributor

  1. Why does this parameter exist? Does the raw binary save space in the database?

As mentioned this for one part is to allow a migration from esqlite/sqlite_ecto2, which stored uuids as binaries.
But also using the binary representation does save quite a bit of space, and it's not just the additional "-", but also because the raw bytes are way smaller than than the Base16 encoding of bytes into characters.

iex(1)> byte_size(Ecto.UUID.generate())
36
iex(2)> byte_size(Ecto.UUID.bingenerate())
16
  1. Shouldn't the :stringconfiguration be the one to use Ecto.UUID (and therefore handles IDs as plain strings). That would be much more intuitive to me.
  2. Whatever the answer to question 2 is, should the default behavior be to handle UUID columns as plain strings? This would be more in line with the ecto_sql's builtin sql adapters, which all make use of Ecto.UUID and handle binary ids as strings.

Neither postgres nor mysql store UUIDs in their string form. In postgres the native uuid column type1 is used, which stores the raw bytes, but as it's a native type postgres clients usually support visualizing the data as the string encoded form. For mysql binary(16) is used to mimic the same storage behaviour given there's no native column type.2 The mssql/tds driver uses a custom ecto type for uuids, which I'm not exactly sure how it works.3

You're assumption of ecto_sql storing uuids as strings in the db is not correct here. That's probably also why your perception of the loader is wrong. For loading a stringly stored uuid there's nothing to convert between db value and runtime (both are strings), but for the :binary setting the loader actually needs to convert the binary representation of the db to the string representation used at runtime.

  1. How do the :uuid column type and :uuid_type configuration play into this? From my quick testing this type seems to always be handled as strings, :uuid_type only affecting the type of the column in SQLite itself.

Loaders and dumpers are about the mapping between values you see in elixir and the values you see stored in the db. Postgres and MySQL always convert string <-> binary as explained. Exqlite started out storing :uuid columns as strings however, which I explained has quite obvious drawbacks. That's where I introduced the :uuid_type option to adjust this driver to mimic the behaviour of other drivers as well as the older sqlite_ecto2.

Personally I would've loved a switch in default behaviour, but it would be/have been a breaking change and given sqlite clients afaik can't automatically convert the stored bytes to a "human readable" uuid I can also see people liking the string storage in spite of the more than 2x storage need.

Footnotes

  1. https://github.com/elixir-ecto/ecto_sql/blob/d53cf7e83020e9893c16437bb93d1e1ee9f68022/lib/ecto/adapters/postgres/connection.ex#L1365

  2. https://github.com/elixir-ecto/ecto_sql/blob/d53cf7e83020e9893c16437bb93d1e1ee9f68022/lib/ecto/adapters/myxql/connection.ex#L1106

  3. https://github.com/elixir-ecto/ecto_sql/blob/d53cf7e83020e9893c16437bb93d1e1ee9f68022/lib/ecto/adapters/tds/types.ex#L2

@warmwaffles
Copy link
Member

This is actually a really good explanation @LostKobrakai, we should probably throw that into a doc string or a nice comment block for those methods.

@Blond11516
Copy link
Contributor Author

Blond11516 commented May 15, 2022

Ok so let me write down my updated understanding to make sure it's correct.

In all of Postgres, MySQL, SQLite (depending on options) and probably MSSQL too, binary_id columns are stored as raw binary in the database and translated to and from strings by the loader/dumper (usually Ecto.UUID). The binary_id_type and uuid_type configurations exist to allow storing UUIDs either as binary or plain strings. Binary is usually preferred because of the smaller size, but string is the default to maintain compatibility with esqlite/sqlite_ecto2.

So from the way loaders are configured (see below), we can see that the only conversion that happens is when using the :binary_id ecto type stored as binary (taking the binary stored in the database and loading it as a string in elixir). I guess it is then assumed that with binary_id_type set to :string, UUIDs should be generated as strings from the start and not translated between elixir and the database? That doesn't hold up according to my limited testing: UUIDs are still stored as binary and loaded as-is. From my understanding generation of binary_ids is done by Ecto.Adapters.SQL.autogenerate/1, which always calls Ecto.UUID.bingenerate/0. Therefore it makes sense that the :string option doesn't end up actually storing strings.

As for the :uuid type, I haven't been able to find where the values are generated but from my testing they seem to always be binary, both in elixir and the database. I was also wondering what the difference is between [type] and [] as a loader, as I haven't noticed a practical difference in my limited testing.

ecto_sqlite3 loaders:

@impl Ecto.Adapter
def loaders(:binary_id, type) do
  case Application.get_env(:ecto_sqlite3, :binary_id_type, :string) do
    :string -> [type]
    :binary -> [Ecto.UUID, type]
  end
end

@impl Ecto.Adapter
def loaders(:uuid, type) do
  case Application.get_env(:ecto_sqlite3, :uuid_type, :string) do
    :string -> []
    :binary -> [type]
  end
end

This is actually a really good explanation LostKobrakai, we should probably throw that into a doc string or a nice comment block for those methods.

Agreed!

@LostKobrakai
Copy link
Contributor

Binary is usually preferred because of the smaller size, but string is the default to maintain compatibility with esqlite/sqlite_ecto2.

That part is not correct. :string is the default because that's what exqlite used to do before my changes. esqlite/sqlite_ecto2 stored uuids as binaries. If this turns out to be problematic enough to have a breaking change I'd be happy to have :binarystorage as the default or even only option (probably after deprecation).

That doesn't hold up according to my limited testing: UUIDs are still stored as binary and loaded as-is. From my understanding generation of binary_ids is done by Ecto.Adapters.SQL.autogenerate/1, which always calls Ecto.UUID.bingenerate/0. Therefore it makes sense that the :string option doesn't end up actually storing strings.

If autogenerate foregoes dumpers that way then this is a bug / broken for binary_id_type: :string.

@LostKobrakai
Copy link
Contributor

If autogenerate foregoes dumpers that way then this is a bug / broken for binary_id_type: :string.

I can confirm this being a bug with the :binary_id type. I'm not really sure why the autogenerate type differs between :binary_id and Ecto.UUID.

@Blond11516
Copy link
Contributor Author

If I'm not mistaken the issues comes from the fact that ecto_sqlite3 doesn't actually handle any value generation. :binary_id values appear to be generated by ecto_sql's base SQL adapter.

The MSSQL adapter overrides autogenerate/1 to use it's own UUID type, but neither of the postgres, mysql or sqlite adapters do. I think the fix would be for the sqlite adapter to override this function and generate the appropriate value according to :binary_id_type. I haven't thoroughly tested this, but a quick test seems to work as expected. I'll do some more testing and open a PR when I have a bit more time.

@Blond11516
Copy link
Contributor Author

If this turns out to be problematic enough to have a breaking change I'd be happy to have :binary storage as the default or even only option (probably after deprecation).

While I think having binary storage be the default option would've been the better initial decision, with more context I don't really think a breaking change is really worth it. Just having the option documented would be enough imo.

That said, the longer things stay like this, the harder it's going to be to introduce a breaking change 🤷

@warmwaffles
Copy link
Member

I think we are given the allowance to make a breaking change like that prior to bumping the major.

@Blond11516
Copy link
Contributor Author

If autogenerate foregoes dumpers that way then this is a bug / broken for binary_id_type: :string.

I took the liberty of opening a PR with a fix for this: #72

@Blond11516
Copy link
Contributor Author

I think #73 wraps up the questions I had initially.

Thanks a lot @warmwaffles and @LostKobrakai for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants