
VTX-666: Sync from upstream #51

Merged
merged 1,994 commits into from
Sep 26, 2024

@fsdvh (Collaborator) commented Sep 26, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

judahrand and others added 30 commits June 21, 2024 12:51
* Add specific fixed size list concat test

* Add fixed size list concat benchmark

* Improve `FixedSizeList` concat performance for large list

* `cargo fmt`

* Increase size of `FixedSizeList` benchmark data

* Get capacity recursively for `FixedSizeList`

* Reuse `Capacities::List` to avoid breaking change

* Use correct default capacities

* Avoid a `Box::new()` when not needed

* format

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
* add neq/eq benchmark for String/ViewArray

* move bench to comparison kernel

* clean unnecessary dep

* make clippy happy
…s are different (apache#5703)

* Add the ability for Maps to cast to another case where the field names are different.

Arrow Maps have field names for their elements; these names are allowed to be any value and do not affect the type of the data.

This allows a Map whose field names are key_value, key, value to be cast to one whose field names are entries, keys, values.

This can be helpful when merging record batches that may have come from different sources. It also makes maps behave similarly to lists, which also have a field to distinguish their elements.

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Feedback from code review

- simplify map casting logic to reuse the entries
- Added unit tests for negative cases
- Use MapBuilder to make the intended type clearer.

* fix formatting

* Lint and format

* correctly set the null fields

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
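Because the map field names do not affect the stored data, a cast that only renames them can reuse the entries unchanged. The sketch below illustrates that idea with hypothetical toy types (`MapFieldNames`, `ToyMapColumn`, `rename_map_fields`); these are assumptions for illustration, not the arrow-rs `MapArray`/`Field` API or the actual cast kernel.

```rust
/// Hypothetical field names of a map column: the entries struct and its
/// key/value children. Illustrative only, not the arrow-rs `Field` type.
#[derive(Debug, Clone, PartialEq)]
pub struct MapFieldNames {
    pub entries: String,
    pub key: String,
    pub value: String,
}

impl MapFieldNames {
    pub fn new(entries: &str, key: &str, value: &str) -> Self {
        Self {
            entries: entries.to_string(),
            key: key.to_string(),
            value: value.to_string(),
        }
    }
}

/// Hypothetical toy map column: field names plus the flattened data.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyMapColumn {
    pub names: MapFieldNames,
    /// List-style offsets: row i spans offsets[i]..offsets[i + 1].
    pub offsets: Vec<usize>,
    pub keys: Vec<String>,
    pub values: Vec<Option<i64>>,
}

/// "Cast" to a map type that differs only in field names: offsets, keys,
/// and values are moved over untouched; only the names are replaced.
pub fn rename_map_fields(col: ToyMapColumn, names: MapFieldNames) -> ToyMapColumn {
    ToyMapColumn { names, ..col }
}
```

Moving the buffers instead of copying them reflects the "reuse the entries" point from the review feedback; a real implementation would also have to validate that the key and value types are castable.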
…apache#5913)

Updates the requirements on [zstd-sys](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](gyscos/zstd-rs@zstd-sys-2.0.7...zstd-sys-2.0.11)

---
updated-dependencies:
- dependency-name: zstd-sys
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add impl for box

* update

* another update

* small fix
* implement arrow-row encoding/decoding for view types

* add doc comments, better error msg, more test coverage

* ensure no performance regression

* update perf

* fix bug

* make fmt happy

* Update arrow-array/src/array/byte_view_array.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* update

* update comments

* move cmp around

* move things around and remove inline hint

* Update arrow-array/src/array/byte_view_array.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update arrow-ord/src/cmp.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* return error instead of panic

* remove unnecessary func

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
…pache#5946)

Updates the requirements on [quick-xml](https://github.com/tafia/quick-xml) to permit the latest version.
- [Release notes](https://github.com/tafia/quick-xml/releases)
- [Changelog](https://github.com/tafia/quick-xml/blob/master/Changelog.md)
- [Commits](tafia/quick-xml@v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: quick-xml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* like for string view array

* fix bug

* update doc

* update tests
* test: Add unit test for extending slice of list array

* For review
…pache#5954)

Updates the requirements on [quick-xml](https://github.com/tafia/quick-xml) to permit the latest version.
- [Release notes](https://github.com/tafia/quick-xml/releases)
- [Changelog](https://github.com/tafia/quick-xml/blob/master/Changelog.md)
- [Commits](tafia/quick-xml@v0.33.0...v0.34.0)

---
updated-dependencies:
- dependency-name: quick-xml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve error message for unsupported nested comparison

* Update arrow-ord/src/cmp.rs

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>

---------

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
* skip iterator removed from primitive encoding

* special cases for not-null primitives encoding

* faster iterators for nullable columns
* Document process for PRs with breaking changes

* ticket reference

* Update CONTRIBUTING.md

Co-authored-by: Xuanwo <github@xuanwo.io>

---------

Co-authored-by: Xuanwo <github@xuanwo.io>
…pache#5928)

* Expose IntervalMonthDayNano and IntervalDayMonth and update docs

* fix doc test
* implement sort for view types

* add bench for binary/binary view
* implement sort for view types

* add bench for binary/binary view

* add view buffer, prepare for byte_view_array reader

* make clippy happy

* reuse make_view_unchecked

* Update parquet/src/arrow/buffer/view_buffer.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* update

* rename and inline

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* failing test

* Handle dict ID assignment during flight encoding/decoding

* remove println

* One more println

* Make auto-assign optional

* Update docs

* Remove breaking change

* Update arrow-ipc/src/writer.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Remove breaking change to DictionaryTracker ctor

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Make ObjectStoreScheme public

* Fix clippy, add docs and examples

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* support def_level=1 but non-null column in reader

* update comment, adapt ut to the uuid change

---------

Co-authored-by: Ye Yuan <yuanye_ptr@qq.com>
RobinLin666 and others added 19 commits September 21, 2024 06:15
* Update Azure dependencies and add support for Fabric token authentication

* Refactor Azure credential provider to support Fabric token authentication

* Refactor Azure credential provider to remove unnecessary print statements and improve token handling

* Bump object_store version to 0.11.0

* Refactor Azure credential provider to remove unnecessary print statements and improve token handling
* add benchmark

* add optimization

* fix

* fix

* cargo fmt

* clippy

* Update arrow-data/src/decimal.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* optimize to avoid allocating an idx variable

* revert change to public api

* fix error in rustdoc

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
…egexp_is_match_scalar` function, deprecate `regexp_is_match_utf8` and `regexp_is_match_utf8_scalar` (apache#6376)

* Implement native support for StringViewArray for regex_is_match function

* Update test cases to cover StringViewArray values longer than 12 bytes

* Add StringView benchmark for regexp_is_match

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* Implement native support for StringViewArray for regex_is_match function

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* Remove duplicate implementation, fix clippy, add docs

more

---------

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Especially when transferring large amounts of data over HTTP/2, this can
massively reduce the overhead.
* chore: add docs, part of #37

- add pragma `#![warn(missing_docs)]` to the following
  - `arrow-array`
  - `arrow-cast`
  - `arrow-csv`
  - `arrow-data`
  - `arrow-json`
  - `arrow-ord`
  - `arrow-pyarrow-integration-testing`
  - `arrow-row`
  - `arrow-schema`
  - `arrow-select`
  - `arrow-string`
  - `arrow`
  - `parquet_derive`

- add docs to those that generated lint warnings

- Remove `bitflags` workaround in `arrow-schema`
At some point, a change in `bitflags v2.3.0`
started generating lint warnings in `arrow-schema`.

This was handled using a
[workaround](apache#4233)

[Issue](bitflags/bitflags#356)

`bitflags v2.3.1` fixed the issue, hence the
workaround is no longer needed.

* fix: resolve comments on PR apache#6433
* fix CI errors

* apply suggestion from review

Co-authored-by: ngli-me <107162634+ngli-me@users.noreply.github.com>

---------

Co-authored-by: ngli-me <107162634+ngli-me@users.noreply.github.com>
* Update prost-build requirement from =0.13.2 to =0.13.3

Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/prost@v0.13.2...v0.13.3)

---
updated-dependencies:
- dependency-name: prost-build
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* update vendored code

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* add ParquetMetaDataReader

* clippy

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* formatting

* add ParquetMetaDataReader to module documentation

* document errors returned from `try_parse_sized`

* oops

* rename methods per review suggestion

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: add union_extract kernel

* fix: reexport union_extract in arrow crate

* add tests, improve docs, simplify code

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…especting not preserving dict ID (apache#6444)

* arrow-ipc: Add test for non preserving dict ID behavior with same ID

* arrow-ipc: Always set dict ID in IPC from dictionary tracker

This decouples dictionary IDs that end up in IPC from the schema further
because the dictionary tracker always first gathers the dict ID for each
field whether it is pre-defined and preserved or not.

Then when actually writing the IPC bytes the dictionary ID is always
taken from the dictionary tracker as opposed to falling back to the
`Field` of the `Schema`.

* arrow-ipc: Read dictionary IDs from dictionary tracker in correct order

When dictionary IDs are not preserved, they are assigned depth first.
However, when writing the IPC bytes, they were previously read from the
dictionary tracker in the order in which the schema is traversed (first
come, first served), which caused dictionaries to be serialized to IPC
in the wrong order.

* Refine IpcSchemaEncoder API and docs

* reduce repeated code

* Fix lints

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
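The ordering problem can be illustrated with a toy tracker that records IDs in the order it hands them out. This is a hedged sketch: `ToyField`, `ToyTracker`, and `schema_order` are hypothetical, and treating "depth first" as children-before-parent is an assumption made for the example, not a description of the exact `DictionaryTracker` logic in arrow-ipc.

```rust
/// Hypothetical toy field: a name, whether it is dictionary-encoded, and
/// nested children (e.g. a dictionary whose values are dictionaries too).
pub struct ToyField {
    pub name: &'static str,
    pub is_dict: bool,
    pub children: Vec<ToyField>,
}

/// Hypothetical toy tracker: records (dict_id, field name) in exactly the
/// order the IDs are assigned.
#[derive(Default)]
pub struct ToyTracker {
    next_id: i64,
    pub assigned: Vec<(i64, &'static str)>,
}

impl ToyTracker {
    /// Depth-first assignment (assumed children-first): nested
    /// dictionaries get IDs before the field that contains them.
    pub fn assign(&mut self, field: &ToyField) {
        for child in &field.children {
            self.assign(child);
        }
        if field.is_dict {
            self.assigned.push((self.next_id, field.name));
            self.next_id += 1;
        }
    }
}

/// For contrast: a top-down, first-come-first-served schema walk. Pairing
/// these names with IDs 0, 1, ... would mismatch the tracker's order.
pub fn schema_order(field: &ToyField, out: &mut Vec<&'static str>) {
    if field.is_dict {
        out.push(field.name);
    }
    for child in &field.children {
        schema_order(child, out);
    }
}
```

Writing dictionaries by iterating `tracker.assigned` keeps each ID paired with the dictionary it was assigned to, which mirrors the fix described above of always taking the ID from the tracker rather than re-deriving it from the schema.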
…e#6441)

* Minor: Add additional documentation and builder APIs to `SortOptions`

* Port some uses

* Update defaults

* Add nulls_first() and nulls_last() and more examples
…pache#6450)

* workaround for missing page indexes

* remove empty line

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fmt

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#6452)

* Support cast between Durations

Signed-off-by: tison <wander4096@gmail.com>

* Support cast between Durations and all numeric types

Signed-off-by: tison <wander4096@gmail.com>

* Impl cast between Durations

Signed-off-by: tison <wander4096@gmail.com>

* Add test_cast_between_durations

Signed-off-by: tison <wander4096@gmail.com>

* add test cases

Signed-off-by: tison <wander4096@gmail.com>

* cargo clippy

Signed-off-by: tison <wander4096@gmail.com>

---------

Signed-off-by: tison <wander4096@gmail.com>
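Casting between Duration units amounts to rescaling an integer count of ticks. The sketch below is a hypothetical, std-only illustration (`Unit` and `convert_duration` are not arrow-rs names); it assumes overflow is reported as `None` and that casting to a coarser unit truncates toward zero. The real cast kernel's overflow and rounding behavior should be checked against the arrow-cast documentation.

```rust
/// Hypothetical duration units, mirroring Arrow's four Duration resolutions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Unit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

impl Unit {
    /// Number of nanoseconds in one tick of this unit.
    fn nanos(self) -> i64 {
        match self {
            Unit::Second => 1_000_000_000,
            Unit::Millisecond => 1_000_000,
            Unit::Microsecond => 1_000,
            Unit::Nanosecond => 1,
        }
    }
}

/// Rescale a duration value from one unit to another.
/// Finer targets multiply (None on overflow); coarser targets divide,
/// which truncates toward zero under Rust integer division.
pub fn convert_duration(v: i64, from: Unit, to: Unit) -> Option<i64> {
    let (f, t) = (from.nanos(), to.nanos());
    if f >= t {
        v.checked_mul(f / t)
    } else {
        Some(v / (t / f))
    }
}
```

Because every unit ratio is a power of 1000, the factor `f / t` or `t / f` is always exact, so only the multiplication direction can overflow.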
@fsdvh changed the title from "Sync from upstream" to "VTX-666: Sync from upstream" Sep 26, 2024
@fsdvh merged commit f6df09b into master Sep 26, 2024
35 checks passed
@fsdvh deleted the sync-from-upstream branch September 26, 2024 11:24