Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert data need to filter spaces #3660

Closed
zimulala opened this issue Jul 7, 2017 · 3 comments · Fixed by #3878
Closed

Insert data need to filter spaces #3660

zimulala opened this issue Jul 7, 2017 · 3 comments · Fixed by #3878
Assignees
Labels
type/bug The issue is confirmed as a bug. type/compatibility

Comments

@zimulala
Copy link
Contributor

zimulala commented Jul 7, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
create table t (a char(10) not null, b char (10) binary not null);
insert into t values("hello ", "hello   ");
select b, length(b) from t;
  1. What did you expect to see?
+-------+-----------+
| b     | length(b) |
+-------+-----------+
| hello |         5 |
+-------+-----------+
1 row in set (0.00 sec)
  1. What did you see instead?
+----------+-----------+
| b        | length(b) |
+----------+-----------+
| hello    |         8 |
+----------+-----------+
1 row in set (0.00 sec)
  1. What version of TiDB are you using (tidb-server -V)?
@zimulala zimulala added type/bug The issue is confirmed as a bug. type/compatibility labels Jul 7, 2017
@zz-jason
Copy link
Member

zz-jason commented Jul 7, 2017

Refer to: https://dev.mysql.com/doc/refman/5.7/en/char.html, the correct behavior of CHAR values is described as following:

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.

the behavior of char/varchar/tinytext/text/mediumtext/longtext is weird:

example 1:

create table t(a char(10), b varchar(10), c tinytext, d text, e mediumtext, f longtext);
insert into t (a, b, c, d, e, f) values ("a ", "b ", "c ", "d ", "e ", "f ");
select length(a), length(b), length(c), length(d), length(e), length(f) from t;
+-----------+-----------+-----------+-----------+-----------+-----------+
| length(a) | length(b) | length(c) | length(d) | length(e) | length(f) |
+-----------+-----------+-----------+-----------+-----------+-----------+
|         1 |         2 |         2 |         2 |         2 |         2 |
+-----------+-----------+-----------+-----------+-----------+-----------+

select repeat(a, 2), repeat(b, 2), repeat(c, 2), repeat(d, 2), repeat(e, 2), repeat(f, 2) from t;
+--------------+--------------+--------------+--------------+--------------+--------------+
| repeat(a, 2) | repeat(b, 2) | repeat(c, 2) | repeat(d, 2) | repeat(e, 2) | repeat(f, 2) |
+--------------+--------------+--------------+--------------+--------------+--------------+
| aa           | b b          | c c          | d d          | e e          | f f          |
+--------------+--------------+--------------+--------------+--------------+--------------+

it seems like char value's trailing spaces will be truncated while others trailing spaces will be retained, but the following example makes things complicated:

example 2:

select a="a  ", b="b  ", c="c  ", d="d  ", e="e  ", f="f  " from t;
+---------+---------+---------+---------+---------+---------+
| a="a  " | b="b  " | c="c  " | d="d  " | e="e  " | f="f  " |
+---------+---------+---------+---------+---------+---------+
|       1 |       1 |       1 |       1 |       1 |       1 |
+---------+---------+---------+---------+---------+---------+

@breezewish
Copy link
Member

example 2 may relate to this:

All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces.

https://dev.mysql.com/doc/refman/5.7/en/char.html

@zz-jason
Copy link
Member

zz-jason commented Jul 14, 2017

leave out ENUM and SET, mysql has these string types:

name binary flag tidb representation
CHAR[(M)] 0 TypeString
BINARY(M) 1 TypeString
VARCHAR(M) 0 TypeVarString/TypeVarchar
VARBINARY(M) 1 TypeVarString/TypeVarchar
TINYTEXT 0 TypeTinyBlob
TINYBLOB 1 TypeTinyBlob
TEXT[(M)] 0 TypeBlob
BLOB(M) 1 TypeBlob
MEDIUMTEXT 0 TypeMediumBlob
MEDIUMBLOB 1 TypeMediumBlob
LONGTEXT 0 TypeLongBlob
LONGBLOB 1 TypeLongBlob
  1. TypeVarString, TypeTinyBlob, TypeBlob, TypeMediumBlob, TypeLongBlob
    for these types, no mater whether binary flag is set, we should store what user inserted, do not append spaces in the tail or truncate the trailing spaces
  2. TypeString without binary flag, i.e CHAR[(M)]
    for this type, we should truncate the trailing spaces
  3. TypeString with binary flag, i.e BINARY(M)
    for this type, we should append spaces in the tail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The issue is confirmed as a bug. type/compatibility
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants