Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some special char can't insert on utf8 charset #8933

Closed
devin-zhao opened this issue Jan 4, 2019 · 10 comments
Closed

some special char can't insert on utf8 charset #8933

devin-zhao opened this issue Jan 4, 2019 · 10 comments

Comments

@devin-zhao
Copy link

devin-zhao commented Jan 4, 2019

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    create table:
    CREATE TABLE cd_channel_996 (
    id bigint(20) NOT NULL AUTO_INCREMENT,
    extra_id bigint(20) NOT NULL COMMENT '用户唯一标识',
    extra json NULL,
    weibo_extra_id varchar(16) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '绑定的微博用户唯一标识',
    title varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '用户名字',
    url varchar(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '用户空间页url',
    pic_url varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '图片地址',
    intro varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '个人简介',
    fan_count int(11) NOT NULL DEFAULT 0 COMMENT '粉丝总数',
    follow_count int(11) NOT NULL DEFAULT 0 COMMENT '关注总数',
    video_count int(11) NOT NULL DEFAULT 0 COMMENT '视频总数',
    birthday int(8) NOT NULL DEFAULT 0 COMMENT '生日',
    gender tinyint(3) NOT NULL DEFAULT 0 COMMENT '性别 1女性 2男性',
    constellation varchar(12) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '星座',
    country varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '国家',
    province varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '省',
    city varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '城市',
    fan_updated_at int(8) NOT NULL DEFAULT 0 COMMENT '粉丝列表抓取日期',
    follow_updated_at int(8) NOT NULL DEFAULT 0 COMMENT '关注列表抓取日期',
    created_at timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP(0),
    PRIMARY KEY (extra_id) USING BTREE,
    UNIQUE INDEX id(id) USING BTREE
    ) CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = '渠道用户';

execute sql:
INSERT INTO cd_channel_996 (extra_id, url, birthday, gender, city, country, constellation, province, title, extra) VALUES (71924065172, 'https://reflow.huoshan.com/share/user/71924065172/', 19910819, 1, '邯郸', '中国', '狮子座', '河北', '🕶小楠~', '{"uid": 71924065172}')

  1. What did you expect to see?
    INSERT INTO cd_channel_996 (extra_id, url, birthday, gender, city, country, constellation, province, title, extra) VALUES (71924065172, 'https://reflow.huoshan.com/share/user/71924065172/', 19910819, 1, '邯郸', '中国', '狮子座', '河北', '🕶小楠~', '{"uid": 71924065172}') 失败原因: (1366, 'incorrect utf8 value f09f95b6e5b08fe6a5a0efbd9e(🕶小楠~) for column title')

Notice: when I create a new table and change the filed charset to utf8mb4, it insert success

  1. What did you see instead?
    insert success

  2. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
    Release Version: v2.1.2-1-g8ba8096
    Git Commit Hash: 8ba8096
    Git Branch: release-2.1
    UTC Build Time: 2018-12-21 03:45:55
    GoVersion: go version go1.11.2 linux/amd64
    Race Enabled: false
    TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
    Check Table Before Drop: false

@devin-zhao devin-zhao changed the title some special char can't insert on utf8 some special char can't insert on utf8 charset Jan 4, 2019
@crazycs520
Copy link
Contributor

@devin-zhao , try this:

ALTER TABLE cd_channel_996 CONVERT TO CHARACTER SET utf8mb4;
alter table cd_channel_996 modify title varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '' COMMENT '用户名字';

@devin-zhao
Copy link
Author

@crazycs520 show error:
alter table cd_channel_996 modify title varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '' COMMENT '用户名字'

1105 - unsupported modify column charset utf8mb4 not match origin utf8
时间: 0.008s

@crazycs520
Copy link
Contributor

@devin-zhao , Oh, sorry, 2.1 doesn't support modify column charset, could you try this in master branch?

@devin-zhao
Copy link
Author

@crazycs520 sorry, our cluster is online, can't change to master branch

@sxp1121
Copy link

sxp1121 commented Jan 4, 2019

@crazycs520 show error:
alter table cd_channel_996 modify title varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '' COMMENT '用户名字'

1105 - unsupported modify column charset utf8mb4 not match origin utf8
时间: 0.008s
try this:
alter table cd_channel_996 modify title varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '用户名字'

@devin-zhao
Copy link
Author

@sxp1121 alter table cd_channel_996 modify title varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT '用户名字' run suceess, but "🕶小楠~" already can't be set to title field

@crazycs520
Copy link
Contributor

@devin-zhao The chars "🕶小楠~" is not a utf8 char, so use utf8 charset will return error. tidb 2.1 not support alter column charset to utf8mb4, So, if you don't want to use tidb master, you have to change your chars. change "🕶小楠~" to '小楠' will be ok.

@crazycs520
Copy link
Contributor

crazycs520 commented Jan 4, 2019

@devin-zhao , Or you can set sql_mode, don't use STRICT_TRANS_TABLES, this will make insert ok, but return 1 warning. But this will truncate the value of the title.

set @@sql_mode="ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";

@darren
Copy link
Contributor

darren commented Jan 4, 2019

@devin-zhao , Or you can set sql_mode, don't use STRICT_TRANS_TABLES, this will make insert ok, but return 1 warning. But this will truncate the value of the title.

set @@sql_mode="ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";

Truncate field? that certainly not accpetable!

use this instead as I have commented on #8939

set global tidb_skip_utf8_check = 1;

And "🕶小楠~" is for certain valid utf8 string.

MySQL uses utf8mb4 due to history reasons while TiDB does not!
IMHO, make tidb compatible with MySQL about the utf8mb4 pitfall is really stupid.

@winkyao can you give the reasons behind PR #8738?

@devin-zhao
Copy link
Author

@darren it works, thank you for your advice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants