Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DateTime with zero microseconds encoding bug #138

Merged
merged 4 commits into from
Dec 21, 2023

Conversation

vlymar
Copy link
Contributor

@vlymar vlymar commented Dec 14, 2023

Hi, we've been adopting plausible's ch and ecto_ch packages and are super grateful for your open source work!

I noticed a bug where queries with elixir DateTime params that had microsecond precision with zero microseconds (e.g. ~U[2023-12-13 00:00:00.000000Z]) would fail because the ch driver would encode them with a value like 1.6095024e9 which Clickhouse cannot parse.

Float.to_string returns the "shortest representation" possible, which in some cases will be scientific notation [docs].

There are a few ways to solve this but I chose to use :erlang.float_to_binary to prevent the scientific notation encoding.

Without the patch, the test case I added fails with:

** (Ch.Error) Code: 457. DB::Exception: Value 1.6095024e9 cannot be parsed as DateTime64(6) for query parameter '$0' because it isn't parsed completely: only 9 of 11 bytes was parsed: 1.6095024.

I'm not an expert in clickhouse or elixir so please let me know if you have any feedback or concerns! I'm happy to make any changes. Or feel free to make any changes yourself.

Thanks!

@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 14, 2023

👋 @vlymar

Thank you for the fix! I completely missed it!

The tests are failing due to formatting checks. Elixir's mix format --check-formatted wants the long line to be split:

     # get encoded as a val like "1.6095024e9" which Clickhouse would be unable to parse to a DateTime.
     utc_with_zero_microsec = ~U[2021-01-01 12:00:00.000000Z]
-    naive_with_zero_microsec = utc_with_zero_microsec |> DateTime.shift_zone!(Ch.Test.clickhouse_tz(conn)) |> DateTime.to_naive()
+
+    naive_with_zero_microsec =
+      utc_with_zero_microsec
+      |> DateTime.shift_zone!(Ch.Test.clickhouse_tz(conn))
+      |> DateTime.to_naive()
+
     assert Ch.query!(conn, "select {$0:DateTime64(6)} as d, toString(d)", [utc_with_zero_microsec]).rows ==
             [[~N[2021-01-01 12:00:00.000000], to_string(naive_with_zero_microsec)]]

I opened a PR to handle that (we don't need that long line) and to separate the new test: knocklabs#2

@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 14, 2023

Maybe we can round when microseconds are zero? That is, just return seconds instead of doing the float encoding.

  defp encode_param(%DateTime{time_zone: "Etc/UTC", microsecond: microsecond} = dt) do
    case microsecond do
      {val, precision} when val > 0 and precision > 0 -> # note `val > 0`
        size = round(:math.pow(10, precision))
        unix = DateTime.to_unix(dt, size)
        # to make encoding clear, let's just use arithmetic :)
        seconds = div(unix, size)
        fractional = rem(unix, size)
        IO.iodata_to_binary([Integer.to_string(seconds), ?.,  String.pad_leading(Integer.to_string(fractional), precision)])

      _ ->
        dt |> DateTime.to_unix(:second) |> Integer.to_string()
    end
  end

* some suggestions for tests

* changelog
@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 15, 2023

Oops, the new test actually needs that long line. I can add it and merge tomorrow.

@vlymar
Copy link
Contributor Author

vlymar commented Dec 15, 2023

Maybe we can round when microseconds are zero? That is, just return seconds instead of doing the float encoding.

That works too. I actually started with that approach, then found :erlang.float_to_binary. It doesn't make much of a difference to me, I'm happy with either approach.

Your new encoding approach is pretty cool too, though it did take me a few minutes to wrap my head around what unix = DateTime.to_unix(dt, size) was doing because calling that func with a positive int size isn't well documented.

@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 16, 2023

Seems like we'll need to round to seconds when microseconds are zero as ClickHouse cannot parse 0.0 as DateTime64

** (Ch.Error) Code: 457. DB::Exception: Value 0.0 cannot be parsed as DateTime64 for query parameter '$0' because it isn't parsed completely: only 1 of 3 bytes was parsed: 0. (BAD_QUERY_PARAMETER) (version 23.3.7.5 (official build))
     
code: assert TestRepo.one!(where(q, d: ^~U[1970-01-01 00:00:00.000Z])) == ~N[1970-01-01 00:00:00]
full test
test "DateTime" do
    q =
      from v in fragment(
             # unix(~N[2023-12-16 08:55:47]) = 1702716947
             "values ('d DateTime', ('1970-01-01 00:00:00'), (1702716947), ('2106-02-07 06:28:15'))"
           ),
           select: v.d

    assert TestRepo.one!(where(q, d: ^~N[1970-01-01 00:00:00])) == ~N[1970-01-01 00:00:00]
    assert TestRepo.one!(where(q, d: ^~N[1970-01-01 00:00:00.000])) == ~N[1970-01-01 00:00:00]
    assert TestRepo.one!(where(q, d: ^~U[1970-01-01 00:00:00Z])) == ~N[1970-01-01 00:00:00]
    assert TestRepo.one!(where(q, d: ^~U[1970-01-01 00:00:00.000Z])) == ~N[1970-01-01 00:00:00]

    assert TestRepo.one!(where(q, d: ^1_702_716_947)) == ~N[2023-12-16 08:55:47]
    assert TestRepo.one!(where(q, d: ^1_702_716_947.0000)) == ~N[2023-12-16 08:55:47]
    assert TestRepo.one!(where(q, d: ^~N[2023-12-16 08:55:47])) == ~N[2023-12-16 08:55:47]
    assert TestRepo.one!(where(q, d: ^~U[2023-12-16 08:55:47Z])) == ~N[1970-01-01 00:00:00]
    assert TestRepo.one!(where(q, d: ^~U[2023-12-16 08:55:47.000Z])) == ~N[1970-01-01 00:00:00]

    assert TestRepo.one!(where(q, d: ^~N[2106-02-07 06:28:15])) == ~N[2106-02-07 06:28:15]
    assert TestRepo.one!(where(q, d: ^~N[2106-02-07 06:28:15.000000])) == ~N[2106-02-07 06:28:15]
    assert TestRepo.one!(where(q, d: ^~U[2106-02-07 06:28:15Z])) == ~N[2106-02-07 06:28:15]
    assert TestRepo.one!(where(q, d: ^~U[2106-02-07 06:28:15.000000Z])) == ~N[2106-02-07 06:28:15]

    assert TestRepo.one!(where(q, [v], v.d < ^~N[1970-01-01 00:00:00.123456])) ==
             ~N[1970-01-01 00:00:00]

    assert TestRepo.one!(where(q, [v], v.d < ^~U[1970-01-01 00:00:00.123456Z])) ==
             ~N[1970-01-01 00:00:00]

    assert TestRepo.all(where(q, [v], v.d > ^~N[1970-01-01 00:00:00.123456])) ==
             [~N[2023-12-16 08:55:47], ~N[2106-02-07 06:28:15]]

    assert TestRepo.all(where(q, [v], v.d > ^~U[1970-01-01 00:00:00.123456Z])) ==
             [~N[2023-12-16 08:55:47], ~N[2106-02-07 06:28:15]]

    assert TestRepo.all(where(q, [v], v.d < ^1_702_716_947.123)) ==
             [~N[1970-01-01 00:00:00], ~N[2023-12-16 08:55:47]]
  end

From this PR: plausible/ecto_ch#143


Although ClickHouse can turn 0.0 into DateTime64 in https://play.clickhouse.com/play?user=play#c2VsZWN0IHRvRGF0ZVRpbWU2NCgwLjAsIDYpOw==


0.123456 from ~U[1970-01-01 00:00:00.123456Z] cannot be parsed either. Hm.

@vlymar
Copy link
Contributor Author

vlymar commented Dec 18, 2023

Seems like we'll need to round to seconds when microseconds are zero as ClickHouse cannot parse 0.0 as DateTime64

Nice find @ruslandoga. I've updated this PR with your proposed implementation. It may be worth filing a bug report with Clickhouse?

@ruslandoga
Copy link
Contributor

Thank you! knocklabs#3 should fix the CI. I'll check if the latest ClickHouse version also fails to parse 0.xxx timestamps.

@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 19, 2023

The later ClickHouse versions don't have that problem.

$ docker run -d -p 8123:8123 clickhouse/clickhouse-server
$ iex
iex(1)> {:ok, pid} = Ch.start_link
{:ok, #PID<0.210.0>}

iex(2)> Ch.query(pid, "select {dt:DateTime64}", %{"dt" => 0.123})
{:ok, %Ch.Result{command: :select, num_rows: 1, rows: [[~N[1970-01-01 00:00:00.123000]]]}}

iex(3)> Ch.query(pid, "select version()")
{:ok, %Ch.Result{command: :select, num_rows: 1, rows: [["23.11.2.11"]]}}

In the CI we use 23.3.7.5

@vlymar
Copy link
Contributor Author

vlymar commented Dec 20, 2023

@ruslandoga any other feedback on this PR?

@ruslandoga
Copy link
Contributor

@vlymar no :) All's good! I just forgot to merge it...

@ruslandoga ruslandoga merged commit 9652788 into plausible:master Dec 21, 2023
5 checks passed
ruslandoga added a commit that referenced this pull request Dec 21, 2023
* some suggestions for tests

* changelog
@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 23, 2023

I've published the fix in v0.2.2.

Thank you!

@vlymar
Copy link
Contributor Author

vlymar commented Jan 2, 2024

thanks @ruslandoga, happy new year!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants