Update string-view branch to arrow-rs main #10966

Merged: 6 commits merged into apache:string-view on Jun 18, 2024

alamb
Contributor

@alamb alamb commented Jun 17, 2024

Which issue does this PR close?

Part of #10961 and #10918

Note this targets the string-view branch and NOT main

Rationale for this change

As we work through StringView development, we are going to use an integration branch pointed at the arrow-rs main branch.

What changes are included in this PR?

  1. Pin the string-view branch to upstream arrow
  2. Update to fix some clippy errors from upstream changes
  3. Uncomment a test that is waiting on upstream code
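
For readers unfamiliar with how a branch gets pinned to an unreleased upstream, the first item above is commonly done with a `[patch.crates-io]` section in the workspace `Cargo.toml`. The fragment below is only a sketch of that mechanism: the selection of crates and the `<commit-sha>` placeholder are assumptions for illustration and are not copied from this PR.

```toml
# Hypothetical sketch: redirect arrow crates from crates.io to a git revision.
# The crate selection and the revision placeholder are illustrative only.
[patch.crates-io]
arrow = { git = "https://github.com/apache/arrow-rs.git", rev = "<commit-sha>" }
arrow-array = { git = "https://github.com/apache/arrow-rs.git", rev = "<commit-sha>" }
arrow-schema = { git = "https://github.com/apache/arrow-rs.git", rev = "<commit-sha>" }
parquet = { git = "https://github.com/apache/arrow-rs.git", rev = "<commit-sha>" }
```

Pinning to a fixed `rev` rather than a moving branch keeps builds reproducible. After such a change the lock files have to be regenerated, which is consistent with the `Cargo.lock` updates listed among this PR's commits.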

Are these changes tested?

Are there any user-facing changes?

@alamb alamb marked this pull request as ready for review June 17, 2024 20:12
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jun 17, 2024
@@ -5446,16 +5446,12 @@ mod tests {
DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
);

// needs https://github.com/apache/arrow-rs/issues/5893
/*
Contributor Author


This test now works!

@alamb alamb changed the title Alamb/string view arrow pin Update string-view branch to arrow-rs main Jun 17, 2024
@alamb
Contributor Author

alamb commented Jun 18, 2024

Since we are not merging to main, let's start this feature branch!

@alamb
Contributor Author

alamb commented Jun 18, 2024

FYI @XiangpengHao and @Weijun-H: with this branch I think we can continue the StringView implementation in DataFusion without being blocked by the arrow upgrade.

@alamb alamb merged commit 810cce7 into apache:string-view Jun 18, 2024
25 checks passed
alamb added a commit that referenced this pull request Jul 16, 2024
…velopment branch (#11402)

* Update `string-view` branch to arrow-rs main (#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* feat: Implement equality = and inequality <> support for StringView (#10985)

* feat: Implement equality = and inequality <> support for StringView

* chore: Add tests for the StringView

* chore

* chore: Update tests for NULL

* fix: Used build_array_string!

* chore: Update string_coercion function to handle Utf8View type in binary.rs

* chore: add tests

* chore: ci

* Add more StringView comparison test coverage (#10997)

* Add more StringView comparison test coverage

* add reference

* Add another test showing casting on columns works correctly

* feat: Implement equality = and inequality <> support for BinaryView (#11004)

* feat: Implement equality = and inequality <> support for BinaryView

Signed-off-by: Chojan Shang <psiace@apache.org>

* chore: make fmt happy

Signed-off-by: Chojan Shang <psiace@apache.org>

---------

Signed-off-by: Chojan Shang <psiace@apache.org>

* Implement support for LargeString and LargeBinary for StringView and BinaryView (#11034)

* implement large binary

* add tests for large string

* better comments for string coercion

* Improve filter predicates with `Utf8View` literals (#11043)

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Improve type coercion logic in TypeCoercionRewriter

* chore

* chore: Update test

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs

* Remove arrow-patch

---------

Signed-off-by: Chojan Shang <psiace@apache.org>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Chojan Shang <psiace@apache.org>
Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
XiangpengHao pushed a commit to XiangpengHao/datafusion that referenced this pull request Jul 17, 2024
* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
…velopment branch (apache#11402)

xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
…velopment branch (apache#11402)

XiangpengHao pushed a commit to XiangpengHao/datafusion that referenced this pull request Jul 19, 2024
* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo
alamb added a commit that referenced this pull request Jul 19, 2024
* Update `string-view` branch to arrow-rs main (#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* merge

* update cast

* consistent dep

* fix ci

* add more tests

* make doc happy

* update new implementation

* fix bug

* avoid unused dep

* update dep

* update

* fix cargo check

* update doc

* pick up the comments change again

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb added a commit that referenced this pull request Jul 20, 2024
…11519)

* add functions

* Update `string-view` branch to arrow-rs main (#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* merge

* update cast

* consistent dep

* fix ci

* avoid unused dep

* update dep

* update

* fix cargo check

* better group value view aggregation

* update

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb added a commit that referenced this pull request Jul 29, 2024
… some ClickBench queries (not on by default) (#11667)

* Pin to pre-release version of arrow 52.2.0

* Update for deprecated method

* Add a config to force using string view in benchmark (#11514)

* add a knob to force string view in benchmark

* fix sql logic test

* update doc

* fix ci

* fix ci only test

* Update benchmarks/src/util/options.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/common/src/config.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* update tests

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add String view helper functions (#11517)

* add functions

* add tests for hash util

* Add ArrowBytesViewMap and ArrowBytesViewSet (#11515)

* Update `string-view` branch to arrow-rs main (#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* merge

* update cast

* consistent dep

* fix ci

* add more tests

* make doc happy

* update new implementation

* fix bug

* avoid unused dep

* update dep

* update

* fix cargo check

* update doc

* pick up the comments change again

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Enable `GroupValueBytesView` for aggregation with StringView types (#11519)

* add functions

* Update `string-view` branch to arrow-rs main (#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* merge

* update cast

* consistent dep

* fix ci

* avoid unused dep

* update dep

* update

* fix cargo check

* better group value view aggregation

* update

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Initial support for regex_replace on `StringViewArray` (#11556)

* initial support for string view regex

* update tests

* Add support for Utf8View for date/temporal codepaths (#11518)

* Add StringView support for date_part and make_date funcs

* run cargo update in datafusion-cli

* cargo fmt

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* GC `StringViewArray` in `CoalesceBatchesStream` (#11587)

* gc string view when appropriate

* make clippy happy

* address comments

* make doc happy

* update style

* Add comments and tests for gc_string_view_batch

* better herustic

* update test

* Update datafusion/physical-plan/src/coalesce_batches.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* [Bug] fix bug in return type inference of `utf8_to_int_type` (#11662)

* fix bug in return type inference

* update doc

* add tests

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Fix clippy

* Increase ByteViewMap block size to 2MB (#11674)

* better default block size

* fix related test

* Change `--string-view` to only apply to parquet formats (#11663)

* use inferenced schema, don't load schema again

* move config to parquet-only

* update

* update

* better format

* format

* update

* Implement native support StringView for character length (#11676)

* native support for character length

* Update datafusion/functions/src/unicode/character_length.rs

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Remove uneeded patches

* cargo fmt

---------

Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
Co-authored-by: Xiangpeng Hao <me@haoxp.xyz>
Co-authored-by: Andrew Duffy <a10y@users.noreply.github.com>
wiedld pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 31, 2024
…velopment branch (apache#11402)
