forked from apache/arrow-rs
VTX-666: Sync from upstream #51
Merged
* Add specific fixed size list concat test
* Add fixed size list concat benchmark
* Improve `FixedSizeList` concat performance for large list
* `cargo fmt`
* Increase size of `FixedSizeList` benchmark data
* Get capacity recursively for `FixedSizeList`
* Reuse `Capacities::List` to avoid breaking change
* Use correct default capacities
* Avoid a `Box::new()` when not needed
* format
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
* add neq/eq benchmark for String/ViewArray
* move bench to comparison kernel
* clean unnecessary dep
* make clippy happy
…s are different (apache#5703)
* Add the ability for Maps to cast to another case where the field names are different.
  Arrow Maps have field names for the elements of the fields. The field names are allowed to be any value and do not affect the type of the data. This allows a Map whose field names are key_value, key, value to be cast to a Map whose field names are entries, keys, values. This can be helpful when merging record batches that may have come from different sources. It also makes maps behave similarly to lists, which also have a field to distinguish their elements.
* Apply suggestions from code review
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Feedback from code review
  - simplify map casting logic to reuse the entries
  - Added unit tests for negative cases
  - Use MapBuilder to make the intended type clearer.
* fix formatting
* Lint and format
* correctly set the null fields
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
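The idea in the commit message above, that field names label a Map's children without changing its data type, can be sketched with a small standalone model. Everything here (`MapType`, `can_cast_map`, `cast_map`, the string type tags) is hypothetical and is not the arrow-rs API; the real cast kernel works on `DataType::Map` and may also cast the key and value types, which this sketch does not attempt.

```rust
/// Hypothetical, simplified description of a map type: the three field
/// names (entries, key, value) plus string tags for the key/value types.
#[derive(Debug, Clone, PartialEq)]
struct MapType {
    field_names: [String; 3],
    key_type: String,
    value_type: String,
}

impl MapType {
    fn new(field_names: [&str; 3], key_type: &str, value_type: &str) -> Self {
        Self {
            field_names: field_names.map(String::from),
            key_type: key_type.to_string(),
            value_type: value_type.to_string(),
        }
    }
}

/// In this sketch two map types are cast-compatible when their key and
/// value types match; the field names are deliberately ignored.
fn can_cast_map(from: &MapType, to: &MapType) -> bool {
    from.key_type == to.key_type && from.value_type == to.value_type
}

/// Cast by relabeling: the entries are reused untouched and only the
/// type description (and therefore the field names) changes.
fn cast_map(
    entries: Vec<(String, i64)>,
    from: &MapType,
    to: &MapType,
) -> Option<(MapType, Vec<(String, i64)>)> {
    if can_cast_map(from, to) {
        Some((to.clone(), entries))
    } else {
        None
    }
}
```

Casting a map named `key_value`/`key`/`value` to one named `entries`/`keys`/`values` succeeds here and returns the same entries, while a target with a different value type is rejected.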
…apache#5913)
Updates the requirements on [zstd-sys](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](gyscos/zstd-rs@zstd-sys-2.0.7...zstd-sys-2.0.11)
---
updated-dependencies:
- dependency-name: zstd-sys
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add impl for box
* update
* another update
* small fix
* implement arrow-row encoding/decoding for view types
* add doc comments, better error msg, more test coverage
* ensure no performance regression
* update perf
* fix bug
* make fmt happy
* Update arrow-array/src/array/byte_view_array.rs
  Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* update
* update comments
* move cmp around
* move things around and remove inline hint
* Update arrow-array/src/array/byte_view_array.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update arrow-ord/src/cmp.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* return error instead of panic
* remove unnecessary func
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
…pache#5946)
Updates the requirements on [quick-xml](https://github.com/tafia/quick-xml) to permit the latest version.
- [Release notes](https://github.com/tafia/quick-xml/releases)
- [Changelog](https://github.com/tafia/quick-xml/blob/master/Changelog.md)
- [Commits](tafia/quick-xml@v0.32.0...v0.33.0)
---
updated-dependencies:
- dependency-name: quick-xml
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* like for string view array
* fix bug
* update doc
* update tests
* test: Add unit test for extending slice of list array
* For review
…pache#5954)
Updates the requirements on [quick-xml](https://github.com/tafia/quick-xml) to permit the latest version.
- [Release notes](https://github.com/tafia/quick-xml/releases)
- [Changelog](https://github.com/tafia/quick-xml/blob/master/Changelog.md)
- [Commits](tafia/quick-xml@v0.33.0...v0.34.0)
---
updated-dependencies:
- dependency-name: quick-xml
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve error message for unsupported nested comparison
* Update arrow-ord/src/cmp.rs
  Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
---------
Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
* skip iterator removed from primitive encoding
* special cases for not-null primitives encoding
* faster iterators for nullable columns
* Document process for PRs with breaking changes
* ticket reference
* Update CONTRIBUTING.md
  Co-authored-by: Xuanwo <github@xuanwo.io>
---------
Co-authored-by: Xuanwo <github@xuanwo.io>
…pache#5928)
* Expose IntervalMonthDayNano and IntervalDayMonth and update docs
* fix doc test
* implement sort for view types
* add bench for binary/binary view
* implement sort for view types
* add bench for binary/binary view
* add view buffer, prepare for byte_view_array reader
* make clippy happy
* reuse make_view_unchecked
* Update parquet/src/arrow/buffer/view_buffer.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* update
* rename and inline
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* failing test
* Handle dict ID assignment during flight encoding/decoding
* remove println
* One more println
* Make auto-assign optional
* Update docs
* Remove breaking change
* Update arrow-ipc/src/writer.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Remove breaking change to DictionaryTracker ctor
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Make ObjectStoreScheme public
* Fix clippy, add docs and examples
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* support def_level=1 but non-null column in reader
* update comment, adapt ut to the uuid change
---------
Co-authored-by: Ye Yuan <yuanye_ptr@qq.com>
* Update Azure dependencies and add support for Fabric token authentication
* Refactor Azure credential provider to support Fabric token authentication
* Refactor Azure credential provider to remove unnecessary print statements and improve token handling
* Bump object_store version to 0.11.0
* Refactor Azure credential provider to remove unnecessary print statements and improve token handling
* add benchmark
* add optimization
* fix
* fix
* cargo fmt
* clippy
* Update arrow-data/src/decimal.rs
  Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* optimize to avoid allocating an idx variable
* revert change to public api
* fix error in rustdoc
---------
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
…egexp_is_match_scalar` function, deprecate `regexp_is_match_utf8` and `regexp_is_match_utf8_scalar` (apache#6376)
* Implement native support StringViewArray for regex_is_match function
* Update test cases cover StringViewArray length more than 12 bytes
* Add StringView benchmark for regexp_is_match
  Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
* Implement native support StringViewArray for regex_is_match function
  Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
* Remove duplicate implementation, fix clippy, add docs more
---------
Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Especially when transferring large amounts of data over HTTP/2, this can massively reduce the overhead.
* chore: add docs, part of #37
  - add pragma `#![warn(missing_docs)]` to the following
    - `arrow-array`
    - `arrow-cast`
    - `arrow-csv`
    - `arrow-data`
    - `arrow-json`
    - `arrow-ord`
    - `arrow-pyarrow-integration-testing`
    - `arrow-row`
    - `arrow-schema`
    - `arrow-select`
    - `arrow-string`
    - `arrow`
    - `parquet_derive`
  - add docs to those that generated lint warnings
  - Remove `bitflags` workaround in `arrow-schema`
    At some point, a change in `bitflags v2.3.0` started generating lint warnings in `arrow-schema`. This was handled using a [workaround](apache#4233) [Issue](bitflags/bitflags#356). `bitflags v2.3.1` fixed the issue, hence the workaround is no longer needed.
* fix: resolve comments on PR apache#6433
* fix CI errors
* apply suggestion from review
  Co-authored-by: ngli-me <107162634+ngli-me@users.noreply.github.com>
---------
Co-authored-by: ngli-me <107162634+ngli-me@users.noreply.github.com>
* Update prost-build requirement from =0.13.2 to =0.13.3
  Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version.
  - [Release notes](https://github.com/tokio-rs/prost/releases)
  - [Changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md)
  - [Commits](tokio-rs/prost@v0.13.2...v0.13.3)
  ---
  updated-dependencies:
  - dependency-name: prost-build
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* update vendored code
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* add ParquetMetaDataReader
* clippy
* Apply suggestions from code review
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* formatting
* add ParquetMetaDataReader to module documentation
* document errors returned from `try_parse_sized`
* oops
* rename methods per review suggestion
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: add union_extract kernel
* fix: reexport union_extract in arrow crate
* add tests, improve docs, simplify code
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…especting not preserving dict ID (apache#6444)
* arrow-ipc: Add test for non preserving dict ID behavior with same ID
* arrow-ipc: Always set dict ID in IPC from dictionary tracker
  This further decouples the dictionary IDs that end up in IPC from the schema, because the dictionary tracker always first gathers the dict ID for each field, whether it is pre-defined and preserved or not. Then, when actually writing the IPC bytes, the dictionary ID is always taken from the dictionary tracker as opposed to falling back to the `Field` of the `Schema`.
* arrow-ipc: Read dictionary IDs from dictionary tracker in correct order
  When dictionary IDs are not preserved, they are assigned depth first. However, when reading them from the dictionary tracker to write the IPC bytes, they were previously read in the order that the schema is traversed (first come, first served), which caused an incorrect order of dictionaries serialized in IPC.
* Refine IpcSchemaEncoder API and docs
* reduce repeated code
* Fix lints
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
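The ordering problem described in the commit message above can be illustrated with a small standalone model. This is only a sketch: `Field`, `assign_depth_first`, `ids_first_come` and `ids_by_lookup` are hypothetical names rather than arrow-ipc types or functions, and it assumes that "depth first" means nested dictionaries receive their IDs before their parent.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal schema node: a name, a dictionary flag, children.
struct Field {
    name: &'static str,
    is_dictionary: bool,
    children: Vec<Field>,
}

/// Assign IDs depth first (assumed: children before their parent).
fn assign_depth_first(field: &Field, assigned: &mut Vec<(&'static str, i64)>) {
    for child in &field.children {
        assign_depth_first(child, assigned);
    }
    if field.is_dictionary {
        let id = assigned.len() as i64;
        assigned.push((field.name, id));
    }
}

/// Dictionary fields in schema traversal order (parent before children).
fn schema_order(field: &Field, out: &mut Vec<&'static str>) {
    if field.is_dictionary {
        out.push(field.name);
    }
    for child in &field.children {
        schema_order(child, out);
    }
}

/// Mismatched variant: IDs are consumed in assignment order while the
/// schema is walked parent first, so fields are paired with the wrong IDs.
fn ids_first_come(root: &Field) -> Vec<(&'static str, i64)> {
    let mut assigned = Vec::new();
    assign_depth_first(root, &mut assigned);
    let mut names = Vec::new();
    schema_order(root, &mut names);
    names
        .into_iter()
        .zip(assigned.into_iter().map(|(_, id)| id))
        .collect()
}

/// Consistent variant: each field looks up the ID recorded for it.
fn ids_by_lookup(root: &Field) -> Vec<(&'static str, i64)> {
    let mut assigned = Vec::new();
    assign_depth_first(root, &mut assigned);
    let by_name: HashMap<&str, i64> = assigned.into_iter().collect();
    let mut names = Vec::new();
    schema_order(root, &mut names);
    names.into_iter().map(|n| (n, by_name[n])).collect()
}
```

For a dictionary field `outer` that contains a nested dictionary `inner`, the first-come variant pairs `outer` with ID 0, while the lookup variant keeps `inner` at 0 and `outer` at 1, matching the assignment order.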
…e#6441)
* Minor: Add additional documentation and builder APIs to `SortOptions`
* Port some uses
* Update defaults
* Add nulls_first() and nulls_last() and more examples
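As a rough illustration of what builder-style sort options with `nulls_first()` and `nulls_last()` express, here is a standalone sketch. `SortOpts` and `sort_with` are hypothetical stand-ins, not the arrow-rs `SortOptions` type or sort kernel, and the default used here (ascending, nulls first) is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for sort options: direction plus null placement.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

impl Default for SortOpts {
    /// Assumed default for this sketch: ascending, nulls first.
    fn default() -> Self {
        Self { descending: false, nulls_first: true }
    }
}

impl SortOpts {
    fn desc(mut self) -> Self {
        self.descending = true;
        self
    }
    fn asc(mut self) -> Self {
        self.descending = false;
        self
    }
    fn nulls_first(mut self) -> Self {
        self.nulls_first = true;
        self
    }
    fn nulls_last(mut self) -> Self {
        self.nulls_first = false;
        self
    }
}

/// Sort nullable values in place; null placement is independent of the
/// sort direction.
fn sort_with(values: &mut [Option<i32>], opts: SortOpts) {
    values.sort_by(|a, b| match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            if opts.descending { y.cmp(x) } else { x.cmp(y) }
        }
    });
}
```

Chaining the builder methods, for example `SortOpts::default().desc().nulls_last()`, reads more clearly at the call site than a struct literal with two booleans.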
…pache#6450)
* workaround for missing page indexes
* remove empty line
* Apply suggestions from code review
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fmt
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#6452)
* Support cast between Durations
  Signed-off-by: tison <wander4096@gmail.com>
* Support cast between Durations and all numeric type
  Signed-off-by: tison <wander4096@gmail.com>
* Impl cast between Durations
  Signed-off-by: tison <wander4096@gmail.com>
* Add test_cast_between_durations
  Signed-off-by: tison <wander4096@gmail.com>
* add test cases
  Signed-off-by: tison <wander4096@gmail.com>
* cargo clippy
  Signed-off-by: tison <wander4096@gmail.com>
---------
Signed-off-by: tison <wander4096@gmail.com>
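Casting between durations of different units comes down to rescaling an integer count. The sketch below is a standalone model, not the arrow-cast implementation: the local `TimeUnit` enum and `cast_duration` are hypothetical, and returning `None` on overflow and truncating when moving to a coarser unit are assumptions made for illustration.

```rust
/// Local stand-in for a duration unit (not the arrow-schema enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

impl TimeUnit {
    /// Number of ticks of this unit in one second.
    fn per_second(self) -> i64 {
        match self {
            TimeUnit::Second => 1,
            TimeUnit::Millisecond => 1_000,
            TimeUnit::Microsecond => 1_000_000,
            TimeUnit::Nanosecond => 1_000_000_000,
        }
    }
}

/// Rescale a duration value from one unit to another.
/// Moving to a finer unit multiplies and may overflow (returns `None`);
/// moving to a coarser unit divides and truncates toward zero.
fn cast_duration(value: i64, from: TimeUnit, to: TimeUnit) -> Option<i64> {
    let (from_scale, to_scale) = (from.per_second(), to.per_second());
    if to_scale >= from_scale {
        value.checked_mul(to_scale / from_scale)
    } else {
        Some(value / (from_scale / to_scale))
    }
}
```

Using `checked_mul` keeps the overflow case explicit, so a caller can decide whether to map it to a null or to an error.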
thinkharderdev approved these changes on Sep 26, 2024
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?