Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ListingTableUrl to decode percent #3750

Merged
merged 5 commits into from
Oct 11, 2022
Merged

Fix ListingTableUrl to decode percent #3750

merged 5 commits into from
Oct 11, 2022

Conversation

unvalley
Copy link
Contributor

@unvalley unvalley commented Oct 7, 2022

Which issue does this PR close?

Closes #3589

Rationale for this change

What changes are included in this PR?

  • Add percent_encoding crate to datafusion/core to decode percent encoded url path.

Are there any user-facing changes?

@@ -80,6 +80,7 @@ ordered-float = "3.0"
parking_lot = "0.12"
parquet = { version = "24.0.0", features = ["arrow", "async"] }
paste = "^1.0"
percent-encoding = "2.2.0"
Copy link
Contributor Author

@unvalley unvalley Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/core uses url crate by servo.
percent-encoding is also published by the same repository.

@@ -108,7 +109,8 @@ impl ListingTableUrl {

/// Creates a new [`ListingTableUrl`] from a url and optional glob expression
fn new(url: Url, glob: Option<Pattern>) -> Self {
let prefix = Path::parse(url.path()).expect("should be URL safe");
let decoded_path = percent_encoding::percent_decode_str(url.path()).decode_utf8_lossy();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the url.path(), it is still percent encoded string.
percent_decode_str decodes it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will cause the parse call below to fail in the event of non-ASCII characters.

I think it should just be a case of change Path::parse to Path::from

@unvalley
Copy link
Contributor Author

unvalley commented Oct 7, 2022

@alamb PTAL?

@@ -246,6 +248,9 @@ mod tests {
let url = ListingTableUrl::parse("file:///foo").unwrap();
let child = Path::parse("/foob/bar").unwrap();
assert!(url.strip_prefix(&child).is_none());

let url = ListingTableUrl::parse("file:///with space/foo/bar").unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a test of a non-ASCII character such as a percent encoded emoji or something

We should also test non-URL safe characters like a percent encoded ?

Copy link
Contributor Author

@unvalley unvalley Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added some test cases at 439f6dc
if it's not enough or wrong to test, please tell me about it.

@unvalley unvalley requested a review from tustvold October 7, 2022 14:24
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@github-actions github-actions bot added the core Core DataFusion crate label Oct 10, 2022
@tustvold
Copy link
Contributor

I think this just needs a cargo fmt and then we can get it in 😃

@unvalley
Copy link
Contributor Author

@tustvold Oh sorry, I added a commit for that😺

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @unvalley

@alamb alamb merged commit 2352f3e into apache:master Oct 11, 2022
@ursabot
Copy link

ursabot commented Oct 11, 2022

Benchmark runs are scheduled for baseline = ac1631a and contender = 2352f3e. 2352f3e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

isidentical added a commit to isidentical/arrow-datafusion that referenced this pull request Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

register_csv allow space in table_path
5 participants