feat(parser): make json parser case insensitive #7256

Merged
merged 12 commits into from
Jan 9, 2023

tabVersion

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

This section will be used as the commit message. Please do not leave this empty!

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Summarize your change (mandatory)
  • How does this PR work? Need a brief introduction for the changed logic (optional)
  • Describe clearly one logical change and avoid lazy messages (optional)
  • Describe any limitations of the current code (optional)

Checklist

- [ ] I have written necessary rustdoc comments

  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • Connector (sources & sinks)

Release note

As described above.

Refer to a related PR or issue link (optional)

None

Signed-off-by: tabVersion <tabvision@bupt.icu>

@waruto210 waruto210 left a comment

LGTM


@waruto210 waruto210 left a comment

LGTM

writer.insert(|column| {
cannal_simd_json_parse_value(
&column.data_type,
v.get(column.name.as_str()),
get_column_from_value(
column.name.to_lowercase().as_str(),

How about passing the original value and using eq_ignore_ascii_case to reduce an allocation? I guess the optimization makes sense here because it's likely in the critical path.
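To make the trade-off in this suggestion concrete, here is a minimal sketch of the two lookup styles. It is not the code from this PR: the function names, the `HashMap<String, i64>` row type, and the assumption that keys are already lowercased in the first variant are all hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;

/// Allocating variant (hypothetical helper): builds a lowercased copy of the
/// column name on every call, then does a hashed lookup. It only finds a
/// match if the map keys are already lowercase.
fn get_lowercased<'a>(row: &'a HashMap<String, i64>, column: &str) -> Option<&'a i64> {
    row.get(column.to_lowercase().as_str())
}

/// Allocation-free variant (hypothetical helper): compares each key with
/// `str::eq_ignore_ascii_case`, so no temporary `String` is created.
/// The cost is a linear scan over the keys instead of a hashed lookup.
fn get_ignore_ascii_case<'a>(row: &'a HashMap<String, i64>, column: &str) -> Option<&'a i64> {
    row.iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(column))
        .map(|(_, value)| value)
}
```

Note that `eq_ignore_ascii_case` only folds ASCII letters, whereas `to_lowercase` applies Unicode lowercasing, so the two variants can disagree on non-ASCII column names. Which one is faster depends on the number of keys per record, which is why measuring is suggested in the follow-up comment.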


Well, to validate these choices, perhaps we should run the microbenchmark? https://github.com/risingwavelabs/risingwave/blob/main/src/source/benches/json_parser.rs
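For a quick local comparison before reaching for the linked benchmark, a rough timing harness along these lines could be used. This is only a sketch built on the standard library: `time_it` and both `count_matches_*` functions are hypothetical names, and wall-clock timing with `Instant` is far noisier than a proper benchmark, so treat any numbers as indicative only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the elapsed time plus the summed
/// results. `black_box` keeps the optimizer from discarding the work.
fn time_it<F: FnMut() -> usize>(iterations: u32, mut f: F) -> (Duration, usize) {
    let start = Instant::now();
    let mut hits = 0usize;
    for _ in 0..iterations {
        hits += black_box(f());
    }
    (start.elapsed(), hits)
}

/// Allocating comparison: lowercases both sides for every key.
fn count_matches_lowercase(keys: &[&str], column: &str) -> usize {
    keys.iter()
        .filter(|k| k.to_lowercase() == column.to_lowercase())
        .count()
}

/// Allocation-free comparison using `str::eq_ignore_ascii_case`.
fn count_matches_ignore_case(keys: &[&str], column: &str) -> usize {
    keys.iter().filter(|k| k.eq_ignore_ascii_case(column)).count()
}
```

Calling `time_it(1_000_000, || count_matches_lowercase(&keys, "id"))` and the same with `count_matches_ignore_case` gives two durations to compare. Both calls should return the same hit count, which doubles as a sanity check that the variants agree on the input.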

Signed-off-by: tabVersion <tabvision@bupt.icu>
This reverts commit 63ea4a5.
This reverts commit 56536d4.
Signed-off-by: tabVersion <tabvision@bupt.icu>
@tabVersion

I implemented a new feature in simd-json that converts all keys to lowercase during deserialization:

tabVersion/simd-json@fe89a0d
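The idea of normalizing keys once at deserialization time, rather than on every column lookup, can be sketched without simd-json as follows. This does not reproduce the linked commit: `lowercase_keys` and `get_column` are hypothetical helpers, and a plain `HashMap` stands in for the parsed JSON object.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for "lowercase keys while deserializing": every key
/// is lowercased exactly once as the object is built.
/// If two keys differ only by case (e.g. "ID" and "id"), the later one wins.
fn lowercase_keys<V>(pairs: impl IntoIterator<Item = (String, V)>) -> HashMap<String, V> {
    pairs
        .into_iter()
        .map(|(key, value)| (key.to_lowercase(), value))
        .collect()
}

/// Hypothetical lookup helper: with normalized keys, a column is found by a
/// single hashed lookup on its lowercased name.
fn get_column<'a, V>(row: &'a HashMap<String, V>, column_name: &str) -> Option<&'a V> {
    row.get(column_name.to_lowercase().as_str())
}
```

In this sketch the lookup still lowercases the column name on each call. Since column names come from the table schema, they could be lowercased once up front so that the per-record path performs no extra allocation.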

Signed-off-by: tabVersion <tabvision@bupt.icu>
@codecov

codecov bot commented Jan 9, 2023

Codecov Report

Merging #7256 (d196424) into main (60a3bdb) will increase coverage by 0.00%.
The diff coverage is 88.00%.

@@           Coverage Diff           @@
##             main    #7256   +/-   ##
=======================================
  Coverage   72.98%   72.98%           
=======================================
  Files        1065     1065           
  Lines      170128   170152   +24     
=======================================
+ Hits       124163   124183   +20     
- Misses      45965    45969    +4     
| Flag | Coverage Δ |
|------|------------|
| rust | 72.98% <88.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| src/source/src/parser/maxwell/simd_json_parser.rs | 63.63% <64.28%> (-0.65%) ⬇️ |
| src/source/src/parser/canal/simd_json_parser.rs | 63.03% <83.33%> (+0.22%) ⬆️ |
| src/source/src/parser/canal/mod.rs | 100.00% <100.00%> (ø) |
| src/source/src/parser/common.rs | 63.79% <100.00%> (+1.29%) ⬆️ |
| src/source/src/parser/debezium/simd_json_parser.rs | 81.63% <100.00%> (+2.08%) ⬆️ |
| src/source/src/parser/json_parser.rs | 97.93% <100.00%> (+0.03%) ⬆️ |
| src/source/src/parser/maxwell/mod.rs | 100.00% <100.00%> (ø) |
| src/meta/src/manager/cluster.rs | 76.86% <0.00%> (-0.25%) ⬇️ |

@mergify mergify bot merged commit b0c36c1 into main Jan 9, 2023
@mergify mergify bot deleted the tab/source-column-to-lowercase branch January 9, 2023 13:09
@tabVersion

Resolves #7228

4 participants