-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change MySQL's number handling to be more permissive #2290
Conversation
6b6455d
to
05d3913
Compare
diesel/src/mysql/connection/bind.rs
Outdated
| MYSQL_TYPE_MEDIUM_BLOB | ||
| MYSQL_TYPE_LONG_BLOB | ||
| MYSQL_TYPE_VAR_STRING | ||
| MYSQL_TYPE_GEOMETRY => unimplemented!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is how do we want to handle this case? Implementing TryFrom
would imply that we change the return types in Row
and NamedRow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I think we need some actual test cases demonstrating that we even can get those types in the first place to really answer this. Everything ending with 2
we should never get. I suspect everything that just has some modifier like LONG
or VAR
we can treat identically to the alternatives. Not sure on types like geometry and json. Need to dig more to comment on your statement about From vs TryFrom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://dev.mysql.com/doc/refman/8.0/en/json.html and https://dev.mysql.com/doc/refman/8.0/en/spatial-types.html could be relevant here for json and geometry types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/c-api-data-structures.html has at least some information about MYSQL_TYPE_GEOMETRY
(and some of the other unhandled types)
Thank you for doing the work to rebase this. I promise I will make the time this week to look at it in depth. There's a few things that stood out on initial skim that I just need to refresh my memory on |
The C API documentation is separate from the server documentation. It does
not mention these values at all
…On Wed, Feb 5, 2020 at 6:43 AM Georg Semmler ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In diesel/src/mysql/connection/bind.rs
<#2290 (comment)>:
> + MYSQL_TYPE_NULL
+ | MYSQL_TYPE_YEAR
+ | MYSQL_TYPE_NEWDATE
+ | MYSQL_TYPE_VARCHAR
+ | MYSQL_TYPE_BIT
+ | MYSQL_TYPE_TIMESTAMP2
+ | MYSQL_TYPE_DATETIME2
+ | MYSQL_TYPE_TIME2
+ | MYSQL_TYPE_JSON
+ | MYSQL_TYPE_ENUM
+ | MYSQL_TYPE_SET
+ | MYSQL_TYPE_TINY_BLOB
+ | MYSQL_TYPE_MEDIUM_BLOB
+ | MYSQL_TYPE_LONG_BLOB
+ | MYSQL_TYPE_VAR_STRING
+ | MYSQL_TYPE_GEOMETRY => unimplemented!(),
https://dev.mysql.com/doc/refman/8.0/en/json.html and
https://dev.mysql.com/doc/refman/8.0/en/spatial-types.html could be
relevant here for json and geometry types
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#2290?email_source=notifications&email_token=AALVMK7PGFR56IUFWEP62GTRBK67LA5CNFSM4KP773IKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCUKWCSI#discussion_r375261133>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALVMK3ALS66SFXIDH3IKKTRBK67LANCNFSM4KP773IA>
.
--
Thanks,
Sean Griffin
|
6d201ef
to
538be52
Compare
I think this is blocked on #2248, as clippy complains about "needless |
diesel/src/mysql/connection/bind.rs
Outdated
MYSQL_TYPE_GEOMETRY | | ||
// json type, only available on newer mysql versions | ||
// same encoding as string | ||
MYSQL_TYPE_JSON => unimplemented!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these appear anywhere in the docs, and it's not clear if/how we ever receive these. My guess is that they're details of the wire protocol that are handled by the client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've just played a bit with various column types:
CREATE TABLE type_test (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
some_enum ENUM ('a', 'b', 'c') NOT NULL,
some_set SET('set_a', 'set_b', 'set_c') NOT NULL,
jdoc JSON NOT NULL,
geom GEOMETRY NOT NULL,
some_bit BIT(1) NOT NULL,
);
INSERT INTO(some_enum, some_set, jdoc, geom, some_bit) VALUES(
'a',
'set_b',
'{"some_json": 42}',
ST_GeomFromText('POINT(1 1)'),
b'1',
);
Now using the code from this branch to execute the following sql queries:
#[derive(Debug, QueryableByName)]
struct SomeEnum {
#[sql_type = "Text"]
some_enum: String,
}
#[derive(Debug, QueryableByName)]
struct SomeSet {
#[sql_type = "Text"]
some_set: String,
}
#[derive(Debug, QueryableByName)]
struct Json {
#[sql_type = "Text"]
jdoc: String,
}
#[derive(Debug, QueryableByName)]
struct Geometry {
#[sql_type = "Text"]
geom: String,
}
#[derive(Debug, QueryableByName)]
struct Bit {
#[sql_type = "Bool"]
some_bit: bool,
}
let _ = dbg!(sql_query("SELECT some_enum FROM type_tests").get_result::<SomeEnum>(&connection));
let _ = dbg!(sql_query("SELECT some_set FROM type_tests").get_result::<SomeSet>(&connection));
let _ = dbg!(sql_query("SELECT jdoc FROM type_tests").get_result::<Json>(&connection));
let _ = dbg!(sql_query("SELECT some_bit FROM type_tests").get_result::<Bit>(&connection));
let _ = dbg!(sql_query("SELECT geom FROM type_tests").get_result::<Geometry>(&connection));
yields the following results:
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_STRING
[diesel_tests/tests/types.rs:1430] sql_query("SELECT some_enum FROM type_tests").get_result::<SomeEnum>(&connection) = Ok(
SomeEnum {
some_enum: "b",
},
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_STRING
[diesel_tests/tests/types.rs:1431] sql_query("SELECT some_set FROM type_tests").get_result::<SomeSet>(&connection) = Ok(
SomeSet {
some_set: "set_c",
},
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_BLOB
[diesel_tests/tests/types.rs:1432] sql_query("SELECT jdoc FROM type_tests").get_result::<Json>(&connection) = Ok(
Json {
jdoc: "{\"abc\": 42, \"def\": {\"ghj\": \"foo\"}, \"baz\": []}",
},
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_BIT
thread 'main' panicked at 'not yet implemented: Hit currently unsupported type, those variants should probably be variants of MysqlType or at least be mapped to one of the existing types', diesel/src/mysql/connection/bind.rs:288:32
…
[diesel_tests/tests/types.rs:1434] sql_query("SELECT geom FROM type_tests").get_result::<Geometry>(&connection) = Err(
DatabaseError(
__Unknown,
"Using unsupported buffer type: 255 (parameter: 1)",
),
)
This means at least the MYSQL_TYPE_BIT
variant occurs on client side. As for the geometry error message: It seems like my mysqlclient
implementation itself does not support that type, probably related to this bugreport. On the other hand the client does receive that type value, it just cannot handle it. (For example the native rust implementation here seems to handle that type.)
On the other hand doing something like this:
table! {
type_tests(id) {
id -> Integer,
some_enum -> Text,
some_set -> Text,
jdoc -> Text,
geom -> Blob,
some_bit ->Text,
}
}
let dsl_res = dbg!(type_tests::table
.select((
type_tests::some_enum,
type_tests::some_set,
type_tests::jdoc,
type_tests::geom,
type_tests::some_bit,
))
.get_result(&connection));
seems to work just fine:
[diesel_tests/tests/types.rs:1422] type_tests::table.select((type_tests::some_enum, type_tests::some_set,
type_tests::jdoc, type_tests::geom,
type_tests::some_bit)).get_result(&connection) = Ok(
(
"b",
"set_c",
"{\"abc\": 42, \"def\": {\"ghj\": \"foo\"}, \"baz\": []}",
[
0,
0,
0,
0,
1,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
240,
63,
0,
0,
0,
0,
0,
0,
240,
63,
],
"\u{1}",
),
)
Tested with:
- mysql Ver 15.1 Distrib 10.3.22-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
- libmysqlclient 8.0.19-0ubuntu0.19.10.3
diesel/src/mysql/types/primitives.rs
Outdated
Double(x) => Ok(x as Self), | ||
Decimal(bytes) => { | ||
let string = str::from_utf8(bytes)?; | ||
let integer_portion = string.split('.').nth(0).unwrap_or_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be enforcing that any fractional portion is zero?
Hi, is there something we can help with? |
@Razican Would be great if someone could do some more research on which of the types are now actually returned by the mysql client API. The best case would be to have an example query/schema for each type variant and some pointer to a official mysql/mariadb documentation page/source code part stating when this type is returned. |
Hmm I have never worked on these kind of things. Do you have documentation on where to start? |
@Razican Finding documentation is basically the problem here. Everything I've already found is linked already linked in the comments above, but that does not explain/mention all type variants at all. Otherwise it's probably just experimenting by using this branch, a bunch of debug statements, a mysql installation and trying to get back different kinds of answers as sketched out in this comment. Having concrete test cases for all that cases would be surely helpful. |
For anyone interested in helping out with this: I've pushed some more experimental changes here with a lot of test cases. Trying to run and fix them on different client/server combinations may be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a pretty thorough review. I might have missed some issues because I don't have a lot of code context yet. I just have one small minor comment. It looks like we are trying to do decimal parsing instead of reusing already existing libs.
use crate::mysql::{Mysql, MysqlValue}; | ||
use crate::sql_types::{BigInt, Binary, Double, Float, Integer, SmallInt, Text}; | ||
|
||
fn decimal_to_integer<T>(bytes: &[u8]) -> deserialize::Result<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be offloading part of this function to BigDecimal::from_bytes
which should check all the possible errors for us. And then we should retrieve the T
from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bigdecimal
is currently an optional dependency. In my opinion that should stay that way if possible.
I've pushed a quite large change adding some test cases and trying to fix those on at least some database versions. See this (weiznich#5) PR for details of tested database versions. Another independent test of those assumptions would be great (especially @diesel-rs/contributors ) (The current state is open for a fundamental review, but I will probably add some more changes and quite a bit of cleanup as soon as I've time to continue working on this) |
diesel/src/mysql/types/json.rs
Outdated
} | ||
} | ||
|
||
impl ToSql<sql_types::Json, MysqlValue> for serde_json::Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been meaning to send a PR. I want to do impl ToSql<Json> for String
also for Pg. Can we do that here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our rules for adding ToSql
imps in diesel itself are roughly :
- Its a commonly used type mapping
- It cannot fail because of invalid values. All (or almost all) valid values of the rust type must be representable by the corresponding sql type.
I cannot answer the second point with: "Yes that assumption is true" as an arbitrary string is not a valid json value for the general case. Therefore such impl does not exist yet and I would prefer to keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add more support for the Json types which is why I was looking at this. I am just concerned with the performance because the deserialization and serialization happening all the time. Since I was dealing with non-user provided JSON, I wanted to map it to string with an unwrap
. I understand your point though, but I would still say that we should add String conversions (thought heavily discouraged in Docs) with unwrap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example of what you are exactly trying to achieve in the end. In theory diesel already exposes some helper impls (although a bit more low level) than this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to use the postgres @>
operator to compare json without me needing to understand them. Let's say I have a table rules
with json column rule
. I want to query the following:
select * from rules where '{"a": "read", "b": 2}'::jsonb @> rule;
Where the {"a": "read", "b": 2}
is user provided but validated externally and confirmed as JSON by a faster json checker (and not deserialized because I don't need the data).
So, 2 things here:
- I would never want to store
rule
asserde::Value
in my code. - I would never want to pass a certain json value as
serde::Value
since I do not want to serialize/deserialize it.
I hope I was able to explain what I mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general note: Diesel cannot reason about which things are externally validated. In such cases we generally choose to enforce an internal validation as well as we cannot assume that such an external validation has really happened. For this case this means I'm quite unwilling to add a ToSql<Json, DB> for String
impl here as that will clearly blow up for some use cases.
That written for a third party crate there are several possibilities to do something like that without touching serde::Value
:
- Implement a custom query node for inserting the provided json as into the query sql
- Use a new type wrapper around
String
and implementToSql<Json, DB>
for this - Implement a cast query node, send the value as plain
String
/Text
and cast tojsonb
afterwards (technically that's what's already done in your example query) - Introduce a new sql type that is basically a crate local version
Json
and implement
Beside of that diesel does currently not implement support for json operators. I think it's quite straight forward to define them via diesel::infix_operator!
in a third party crate, but adding support for a common set of such operators is definitively something I would like to see as part of diesel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't know about the insert query node and cast query node. Still working on understanding diesel more. Your answer addresses my problem.
I will definitely add support for the json operators in diesel later. Just trying to get the full overview first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is anything unclear about some internals, just keep asking questions. Unfortunately not everything is documented quite well…
About the operators: See this implementation of the postgres array operators implementation as example for built in support in diesel. Something like that also works in a third party crate.
MySQL is a bit more... lax with types than other backends. It's very easy to accidentally get a 64 bit integer or decimal when other backends would continue to give you a 32 bit integer. Right now we're relying on conversion happening in `libmysqlclient` for `query_by_index`. If we ever want to dump `libmysqlclient`, we will need to move those conversions into Diesel. However, I've opted to do this in such a way that it affects `sql_query` as well. Right now there are many differences between what our query builder says and what MySQL does. (32 bit addition/subtraction returns 64 bit, 32 bit multiplication/division/sum return decimal). Ideally you should be able to have a struct that derives both `Queryable` and `QueryableByName`, and have that work with the same query built using the query builder or `sql_query`. In order for that to happen, we can't do the conversions until we hit `FromSql`, since for `sql_query` that is the first and only time we learn what the expected type is.
Also add some links to relevant mysql/mariadb documentation there
Clippy (rightful) complained that we read from a potential unaligned pointer, which would be undefined behaviour. The problem is: Clippy continues to complain about this, even if we use `read_unaligned()` here. This seems to be a clippy bug, see rust-lang/rust-clippy#2881
Is the only thing left is to get the CI working? |
At least:
|
This change fixes a out of buffer write in newer versions of libmysqlclient. That write was caused by a change in libmysqlclient around version 8.0.19 that added another field to `MYSQL_TIME` and by diesel using a statically generated binding (mysqlclient-sys). As consequence diesel would allocate a to small buffer for the struct while libmysql writes more bytes that expected, which results in a out of bound write. Interestingly this does not happen with a recent libmysqlclient version based on the sources in the mariadb source code. As a fix I move an updated version of `MYSQL_TIME` to diesel itself, as I do not have any control over `mysqlclient-sys`. This change is compatible with older libmysqlclient versions, because for them we just allocate a larger buffer than they actually use, which is fine.
@diesel-rs/reviewers Assuming that the CI is passing I think this is ready for a final review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
fn arbitrary_serde<G: quickcheck::Gen>(g: &mut G, depth: usize) -> serde_json::Value { | ||
use rand::distributions::Alphanumeric; | ||
use rand::Rng; | ||
match g.gen_range(0, if depth > 0 { 4 } else { 6 }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the 4
and 6
be reverse? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's correct this way, because:
- It works
- It is used to prevent endless recursion, so that we just generate objects and lists for the first level, not for anything else. That means if
depth > 0
, i.e. we've already recursed at least once we only generate variants that cannot recurse again.
(The second constrain can be lifted to allow some more nesting)
@weiznich I don't mind that you closed #1995, but these fixes don't seem to address the issue unless I'm misreading something. Is the fix to make bool read/write as text from here: #2290 (comment) If so, there should be a note in the docs about it, yes? |
@gpit2286 This PR changes the underlying behaviour in such a way that now correctly the |
MySQL is a bit more... lax with types than other backends. It's very
easy to accidentally get a 64 bit integer or decimal when other backends
would continue to give you a 32 bit integer. Right now we're relying on
conversion happening in
libmysqlclient
forquery_by_index
. If weever want to dump
libmysqlclient
, we will need to move thoseconversions into Diesel.
However, I've opted to do this in such a way that it affects
sql_query
as well. Right now there are many differences between what our query
builder says and what MySQL does. (32 bit addition/subtraction returns
64 bit, 32 bit multiplication/division/sum return decimal). Ideally you
should be able to have a struct that derives both
Queryable
andQueryableByName
, and have that work with the same query built usingthe query builder or
sql_query
.In order for that to happen, we can't do the conversions until we hit
FromSql
, since forsql_query
that is the first and only time welearn what the expected type is.
This is basically a rebase of #1457. I've adjusted the code to now already existing
MysqlValue
type. Additionally we now also have a lifetime on that type, so that we can just use&[u8]
as raw value there, instead of the unsafe pointer/cell stuff. I've opted into not having the ffi type enum inMysqlValue
, because we want to expose that type as part of #2182 and that would restrict possible implementations of the mysql backend to such ones that uselibmysqlclient
.Additionally I've added a
#[non_exhaustive]
(requires rust 1.40) to theMysqlType
enum and removed some unneed unsafe code from the date/time type conversations.