Implement UnionArray logical_nulls #6303

gstvg · 2024-08-24T20:49:01Z

Which issue does this PR close?

Rationale for this change

N/A

What changes are included in this PR?

UnionArray::logical_nulls implementation, tests and benches
is_null and is_not_null tests on unions
Check that the length of each sparse union child array matches the length of the parent union
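For readers unfamiliar with the semantics, here is a minimal std-only sketch of what the logical nulls of a sparse union mean, together with the child length check. `SparseUnionModel` and its methods are hypothetical stand-ins invented for illustration; they are not the arrow-rs API and do not reflect how the PR implements it.

```rust
/// Simplified stand-in for a sparse union (NOT the arrow-rs type):
/// one type id per slot, plus an optional validity vector per child.
pub struct SparseUnionModel {
    pub type_ids: Vec<i8>,
    /// (type id, validity) per child; `None` means the child has no nulls,
    /// and `true` means the slot is valid.
    pub children: Vec<(i8, Option<Vec<bool>>)>,
}

impl SparseUnionModel {
    /// Rejects children whose length differs from the parent union's length.
    pub fn try_new(
        type_ids: Vec<i8>,
        children: Vec<(i8, Option<Vec<bool>>)>,
    ) -> Result<Self, String> {
        for (id, validity) in &children {
            if let Some(v) = validity {
                if v.len() != type_ids.len() {
                    return Err(format!(
                        "child {} has length {}, expected {}",
                        id,
                        v.len(),
                        type_ids.len()
                    ));
                }
            }
        }
        Ok(Self { type_ids, children })
    }

    /// Slot `i` is valid iff the child selected by `type_ids[i]` is valid at `i`.
    /// Unknown type ids are treated as valid in this sketch.
    pub fn logical_validity(&self) -> Vec<bool> {
        self.type_ids
            .iter()
            .enumerate()
            .map(|(i, type_id)| {
                self.children
                    .iter()
                    .find(|(id, _)| id == type_id)
                    .and_then(|(_, validity)| validity.as_ref())
                    .map_or(true, |v| v[i])
            })
            .collect()
    }
}
```

A null in a child only shows up in the union's logical nulls when that child is the one selected at that slot.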

Are there any user-facing changes?

UnionArray::logical_nulls now returns correct results

Additional info

This is a port of apache/datafusion#11321

arrow-array/src/array/union_array.rs

gstvg · 2024-08-24T22:49:36Z

cc @samuelcolvin

samuelcolvin · 2024-08-25T07:03:23Z

Looks good from a brief check, what's the result of running the benchmarks?

gstvg · 2024-08-25T14:05:10Z

$ RUSTFLAGS='-C target-feature=+avx2' cargo bench --bench union_array

union logical nulls 4096 1 children
                        time:   [8.9139 ns 8.9932 ns 9.0887 ns]
Found 18 outliers among 100 measurements (18.00%)
  5 (5.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 2 children
                        time:   [1.4903 µs 1.4969 µs 1.5055 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 3 children
                        time:   [6.1631 µs 6.1999 µs 6.2449 µs]
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

union logical nulls 4096 4 children
                        time:   [6.1878 µs 6.2374 µs 6.2976 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 5 children
                        time:   [6.1996 µs 6.2488 µs 6.3094 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 6 children
                        time:   [6.2286 µs 6.2827 µs 6.3441 µs]
Found 21 outliers among 100 measurements (21.00%)
  3 (3.00%) high mild
  18 (18.00%) high severe

single with nulls 4096  time:   [1.2346 µs 1.2493 µs 1.2678 µs]
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe

gstvg · 2024-08-25T15:53:33Z

~~Studying a few fast paths~~

Edit: It now runs faster when up to ~10 fields have nulls, with timings increasing slowly for each additional field with nulls. Beyond that, timings stabilize and depend only on the length of the union.

New results

union logical nulls 4096 1 children with nulls, 0 without nulls
                        time:   [8.3308 ns 8.4185 ns 8.5348 ns]
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 1 children with nulls, 1 without nulls
                        time:   [815.86 ns 821.39 ns 827.97 ns]
                        change: [-0.6460% +0.4002% +1.4415%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 1 children with nulls, 10 without nulls
                        time:   [830.73 ns 833.54 ns 837.27 ns]
                        change: [+0.1174% +0.9252% +1.8490%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 2 children with nulls, 0 without nulls
                        time:   [866.46 ns 870.39 ns 875.34 ns]
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe

union logical nulls 4096 2 children with nulls, 1 without nulls
                        time:   [1.4325 µs 1.4411 µs 1.4516 µs]
                        change: [-0.0205% +0.6804% +1.4063%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 2 children with nulls, 10 without nulls
                        time:   [1.4605 µs 1.4709 µs 1.4833 µs]
                        change: [-1.6473% -0.3650% +0.8144%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  16 (16.00%) high severe

union logical nulls 4096 3 children with nulls, 0 without nulls
                        time:   [1.5076 µs 1.5214 µs 1.5380 µs]
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

union logical nulls 4096 3 children with nulls, 1 without nulls
                        time:   [2.0524 µs 2.0704 µs 2.0963 µs]
                        change: [-0.6605% +0.1153% +0.9818%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

union logical nulls 4096 3 children with nulls, 10 without nulls
                        time:   [2.0694 µs 2.0827 µs 2.1012 µs]
                        change: [-1.2473% -0.0532% +1.1080%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 4 children with nulls, 0 without nulls
                        time:   [2.1140 µs 2.1281 µs 2.1473 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 4 children with nulls, 1 without nulls
                        time:   [2.6724 µs 2.6915 µs 2.7148 µs]
                        change: [-0.9684% -0.2581% +0.4469%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe

union logical nulls 4096 4 children with nulls, 10 without nulls
                        time:   [2.6953 µs 2.7094 µs 2.7269 µs]
                        change: [-0.9989% -0.2504% +0.5038%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

union logical nulls 4096 5 children with nulls, 0 without nulls
                        time:   [2.7898 µs 2.8096 µs 2.8318 µs]
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 5 children with nulls, 1 without nulls
                        time:   [3.3592 µs 3.3784 µs 3.4011 µs]
                        change: [-0.5205% +0.1547% +0.8115%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) high mild
  18 (18.00%) high severe

union logical nulls 4096 5 children with nulls, 10 without nulls
                        time:   [3.3779 µs 3.4021 µs 3.4327 µs]
                        change: [-0.9537% +0.1781% +1.2313%] (p = 0.76 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 6 children with nulls, 0 without nulls
                        time:   [3.4163 µs 3.4446 µs 3.4805 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 6 children with nulls, 1 without nulls
                        time:   [3.9625 µs 3.9789 µs 3.9996 µs]
                        change: [-0.5328% +0.3486% +1.2941%] (p = 0.43 > 0.05)
                        No change in performance detected.
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 6 children with nulls, 10 without nulls
                        time:   [3.9917 µs 4.0067 µs 4.0244 µs]
                        change: [-0.2479% +0.7253% +1.8074%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 7 children with nulls, 0 without nulls
                        time:   [4.0318 µs 4.0461 µs 4.0642 µs]
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 7 children with nulls, 1 without nulls
                        time:   [4.6034 µs 4.6171 µs 4.6339 µs]
                        change: [+0.3074% +0.7556% +1.1784%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  5 (5.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 7 children with nulls, 10 without nulls
                        time:   [4.6182 µs 4.6344 µs 4.6543 µs]
                        change: [-0.0238% +0.5574% +1.2432%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 8 children with nulls, 0 without nulls
                        time:   [4.6619 µs 4.7071 µs 4.7695 µs]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 8 children with nulls, 1 without nulls
                        time:   [5.2440 µs 5.2843 µs 5.3403 µs]
                        change: [-1.0769% +0.1182% +1.2061%] (p = 0.85 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 8 children with nulls, 10 without nulls
                        time:   [5.2702 µs 5.3058 µs 5.3488 µs]
                        change: [-0.8008% +0.0624% +0.8842%] (p = 0.89 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 9 children with nulls, 0 without nulls
                        time:   [5.3339 µs 5.3626 µs 5.3990 µs]
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 9 children with nulls, 1 without nulls
                        time:   [5.9340 µs 5.9751 µs 6.0214 µs]
                        change: [-0.1106% +0.5016% +1.0782%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 9 children with nulls, 10 without nulls
                        time:   [5.9485 µs 5.9897 µs 6.0430 µs]
                        change: [+0.5805% +1.6243% +2.7664%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 10 children with nulls, 0 without nulls
                        time:   [5.9614 µs 5.9826 µs 6.0088 µs]
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 10 children with nulls, 1 without nulls
                        time:   [6.3212 µs 6.3628 µs 6.4144 µs]
                        change: [-4.4399% -3.4190% -2.4917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 10 children with nulls, 10 without nulls
                        time:   [6.3714 µs 6.4107 µs 6.4609 µs]
                        change: [-4.7291% -3.7470% -2.7989%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 11 children with nulls, 0 without nulls
                        time:   [6.3651 µs 6.4082 µs 6.4610 µs]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 11 children with nulls, 1 without nulls
                        time:   [6.3882 µs 6.4184 µs 6.4559 µs]
                        change: [-12.286% -11.243% -10.265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 11 children with nulls, 10 without nulls
                        time:   [6.4030 µs 6.4589 µs 6.5291 µs]
                        change: [-12.246% -11.198% -10.111%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe


gstvg · 2024-08-30T19:43:38Z

arrow-array/src/array/union_array.rs

+            None
+        }
+    }
+
    /// Union types always return non null as there is no validity buffer.


I updated the docs below, but the other arrays that also correctly implement logical_nulls don't include those default methods. Should we delete them too?

DictionaryArray
RunArray
NullArray

If we remove these specific implementations, the behavior will not change but the documented reason (because UnionArray has no validity buffer) will be lost.

If we remove these -- can we update the documentation for the default implementations to mention the UnionArray special case?

The comment points to old code, my bad.
I think the most recent changes already update the documentation, so I removed the methods.
Let me know if you agree.


alamb · 2024-09-18T20:13:45Z

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

wiedld · 2024-09-23T19:07:40Z

I'm starting to review this one. 👀

wiedld

Thank you again for the code documentation! This was a fun PR to review.

Most of the suggestions are intended to make it easier for the next reader. I'm not sure about our code coverage policy, and I also think we may need to implement UnionArray::is_nullable (maybe in a followup PR?).

wiedld · 2024-09-25T16:58:20Z

arrow-array/src/array/mod.rs

+    /// Similary, a [`UnionArray`] with any nullable child will always return true, even if all
+    /// selected values are valid, and therefore would not appear in [`Array::logical_nulls`].
    fn is_nullable(&self) -> bool {
        self.null_count() != 0


Does the added code comment match the implementation?

UnionArray always returns a null_count of zero. So the conditional self.null_count() != 0 always evaluates to false, which means a UnionArray is never reported as nullable.

I think the added doc comment describes what should happen -- but it's not happening now.

Union types have no null validity bitmap (per spec). I believe this is why the null_count is always zero and UnionArray::nulls is None.

In contrast, I believe is_nullable should be based on the child types (as the docs added here suggest). What's missing is an is_nullable implementation on UnionArray itself that examines the UnionArray's children. (Dictionary arrays already do so.) Do you agree?

If you agree, then maybe add this doc comment & the code change in a follow up PR?

You are absolutely right, thank you!
I added an implementation that always returns true just so it isn't buggy, and will improve it in a follow-up PR.
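To make the gap discussed in this thread concrete, here is a small sketch with a hypothetical `ArrayLike` trait standing in for the real `Array` trait. The child-based override is an assumption about what a follow-up could look like; per the reply above, the PR itself simply returns true.

```rust
/// Hypothetical, pared-down stand-in for the array trait discussed above.
pub trait ArrayLike {
    /// Nulls recorded in the array's own validity buffer.
    fn null_count(&self) -> usize;

    /// Default: nullable only if the validity buffer records a null.
    fn is_nullable(&self) -> bool {
        self.null_count() != 0
    }
}

/// Illustrative array with its own validity buffer (`true` = valid).
pub struct PrimitiveLike {
    pub validity: Vec<bool>,
}

impl ArrayLike for PrimitiveLike {
    fn null_count(&self) -> usize {
        self.validity.iter().filter(|valid| !**valid).count()
    }
}

/// Illustrative union: no validity buffer of its own, only children.
pub struct UnionLike {
    pub children: Vec<Box<dyn ArrayLike>>,
}

impl ArrayLike for UnionLike {
    /// Unions carry no validity buffer, so this is always zero.
    fn null_count(&self) -> usize {
        0
    }

    /// Without this override, the default would always report `false`.
    /// Assumption: nullability is derived from the children.
    fn is_nullable(&self) -> bool {
        self.children.iter().any(|child| child.is_nullable())
    }
}
```

With the default method alone, `UnionLike` would report `false` even when a child contains nulls, which is the mismatch pointed out in the review.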

wiedld · 2024-09-25T18:40:08Z

arrow-array/src/array/union_array.rs

+            None
+        }
+    }
+
    /// Union types always return non null as there is no validity buffer.


If we remove these specific implementations, the behavior will not change but the documented reason (because UnionArray has no validity buffer) will be lost.

If we remove these -- can we update the documentation for the default implementations to mention the UnionArray special case?

wiedld · 2024-09-25T21:34:11Z

arrow-array/src/array/union_array.rs

+                enum SparseStrategy {
+                    Gather,
+                    AllNullsSkipOne,
+                    MixedSkipWithoutNulls,
+                    MixedSkipFullyNull,


Could you move this enum elsewhere and add doc comments explaining what the options mean? The names are not self-explanatory.

For example, AllNullsSkipOne actually means "all fields have some (not all) nulls", and the SkipOne part refers to how the child arrays are iterated. Having the enum explained up front helps speed up comprehension. 🙏🏼

Fair enough. I added comments, but please let me know if you think they can be further improved.

wiedld · 2024-09-25T21:43:35Z

arrow-array/src/array/union_array.rs

+                // This is the threshold where masking becomes slower than gather
+                // TODO: test on avx512f(feature is still unstable)
+                let gather_relative_cost = if cfg!(target_feature = "avx2") {
+                    10
+                } else if cfg!(target_feature = "sse4.1") {
+                    3
+                } else {
+                    2
+                };


Do you have any sources (or data) on why these specific numbers were chosen?

There is a clear preference for the gather approach when we have many fields (partially null) to iterate through, but I didn't follow how these exact numbers were chosen.

Indeed, those numbers look like magic. I chose them based on benchmarks; the numbers are below.

I added comments to make this clearer, and modified the else branch, which was chosen based on x86 baseline/SSE2 benchmarks but was also being selected for non-x86 archs. I restricted it to x86, and set the gather cost to 0 (always used) on non-x86, because I only have x86 to bench on.

In summary, for a union of length 4096:
gather = 6.4 µs (regardless of target feature)

AVX2:
1 selection mask per chunk = 0.8 µs
10 masks = 6.3 µs
11 masks = ~7 µs (slower than gather)

SSE4.1:
1 mask = 2.5 µs
2 masks = 5 µs
3 masks = 7.5 µs (slower than gather)

x86 baseline/SSE2:
1 mask = 4 µs
2 masks = 8 µs (slower than gather)

The data for AVX2 is at #6303 (comment)
Data for SSE4.1 and SSE2 is from memory, I will bench it again tomorrow
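As a rough illustration of how figures like these translate into a threshold, the sketch below compares an estimated masking time against the gather time. The linear cost model, the `choose` function, and the `Choice` enum are simplifying assumptions made for this example; the PR itself hardcodes a per-target threshold rather than computing one, and the SSE figures are quoted from memory in the comment above.

```rust
/// Which approach the estimate favors (hypothetical enum for this sketch).
#[derive(Debug, PartialEq)]
pub enum Choice {
    Mask,
    Gather,
}

/// Assumes masking time grows linearly with the number of selection masks
/// computed per chunk, and that gather time is constant for a given length.
/// Both inputs are in microseconds for the same union length.
pub fn choose(masks_per_chunk: u32, per_mask_us: f64, gather_us: f64) -> Choice {
    if f64::from(masks_per_chunk) * per_mask_us < gather_us {
        Choice::Mask
    } else {
        Choice::Gather
    }
}
```

Using about 0.63 µs per mask for AVX2 (6.3 µs / 10 masks), 2.5 µs for SSE4.1 and 4 µs for the x86 baseline, against 6.4 µs for gather, the crossover lands at 11, 3 and 2 masks respectively, which lines up with the figures in the comment.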

wiedld · 2024-09-26T00:10:39Z

arrow-array/src/array/union_array.rs

+        // Unsafe code below depend on it:
+        // To remove one branch from the loop, if the a type_id is not utilized, or it's logical_nulls is None/all set,
+        // we use a null buffer of len 1 and a index_mask of 0, or the true null buffer and usize::MAX otherwise.
+        // We then unconditionally access the null buffer with index & index_mask,
+        // which always return 0 for the 1-len buffer, or the true index unchanged otherwise
+        // We also use a 256 array, so llvm knows that `type_id as u8 as usize` is always in bounds
+        let mut logical_nulls_array = [(&one_valid, Mask::Zero); 256];
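The index-masking trick described in the comment above can be illustrated with a safe, std-only sketch. `Entry` and `gather_validity` are hypothetical names, validity is modeled as `&[bool]` instead of a null buffer, and ordinary bounds-checked indexing replaces the unsafe access the PR relies on.

```rust
/// Illustrative lookup entry: a validity slice plus the mask applied to the index.
#[derive(Clone, Copy)]
pub struct Entry<'a> {
    pub validity: &'a [bool],
    pub index_mask: usize,
}

/// `children` lists only the type ids whose child has nulls (`true` = valid).
pub fn gather_validity(type_ids: &[i8], children: &[(i8, &[bool])]) -> Vec<bool> {
    // Fallback for type ids without nulls: a single valid slot.
    // With an index mask of 0 it is always read at index 0.
    let one_valid: &[bool] = &[true];
    // 256 entries so that `type_id as u8 as usize` is always in bounds.
    let mut table = [Entry { validity: one_valid, index_mask: 0 }; 256];
    for &(type_id, validity) in children {
        // Real validity: keep the index unchanged via an all-ones mask.
        table[type_id as u8 as usize] = Entry { validity, index_mask: usize::MAX };
    }
    type_ids
        .iter()
        .enumerate()
        .map(|(i, &type_id)| {
            let entry = table[type_id as u8 as usize];
            // No branch on "does this child have nulls?" inside the loop.
            entry.validity[i & entry.index_mask]
        })
        .collect()
}
```

The lookup in the loop is the same expression for every slot; whether a slot reads the real validity or the one-element fallback is decided entirely by the mask stored in the table.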


wiedld · 2024-09-26T02:40:55Z

arrow-array/src/array/union_array.rs

+        let chunks = chunks_exact.map(|chunk| {
+            let chunk_array = <&[i8; 64]>::try_from(chunk).unwrap();
+
+            mask_chunk(chunk_array, &mut nulls_masks_iter)
+        });


Nit: would you mind renaming the variable chunk_array (including where the mask_chunk closures are defined) to type_ids_chunk_array or something similar?

Looks better. I also renamed chunk to type_ids_chunk and remainder to type_ids_remainder.
Feel free to point out any other place I may have missed.
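For context on the chunking pattern in the snippet above, here is a std-only sketch that walks type ids in exact chunks of 64, converts each chunk to a fixed-size array reference, and handles the remainder separately. `selection_words`, `pack_chunk` and `pack_remainder` are hypothetical helpers for this example; only the variable names follow the renaming agreed on in this thread.

```rust
use std::convert::TryFrom;

/// Bit `i` of the result is set when `type_ids_chunk_array[i] == wanted`.
fn pack_chunk(type_ids_chunk_array: &[i8; 64], wanted: i8) -> u64 {
    let mut word = 0u64;
    for (bit, &id) in type_ids_chunk_array.iter().enumerate() {
        word |= u64::from(id == wanted) << bit;
    }
    word
}

/// Same as `pack_chunk`, for the trailing slots (fewer than 64).
fn pack_remainder(type_ids_remainder: &[i8], wanted: i8) -> u64 {
    type_ids_remainder
        .iter()
        .enumerate()
        .fold(0u64, |word, (bit, &id)| word | (u64::from(id == wanted) << bit))
}

/// One u64 selection mask per 64 type ids, plus one for any remainder.
pub fn selection_words(type_ids: &[i8], wanted: i8) -> Vec<u64> {
    let mut chunks_exact = type_ids.chunks_exact(64);
    let mut words: Vec<u64> = chunks_exact
        .by_ref()
        .map(|type_ids_chunk| {
            // `chunks_exact(64)` guarantees the length, so this cannot fail.
            let type_ids_chunk_array = <&[i8; 64]>::try_from(type_ids_chunk).unwrap();
            pack_chunk(type_ids_chunk_array, wanted)
        })
        .collect();
    let type_ids_remainder = chunks_exact.remainder();
    if !type_ids_remainder.is_empty() {
        words.push(pack_remainder(type_ids_remainder, wanted));
    }
    words
}
```

The fixed-size array gives the compiler a known trip count for the inner loop, which is presumably what allows the auto-vectorization mentioned elsewhere in this review.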

wiedld · 2024-09-26T03:40:58Z

arrow-array/src/array/union_array.rs

+                    },
+                );
+
+                union_nulls | without_nulls_selected(chunk, &without_nulls_ids)


Could we add a comment on this line?
I believe without_nulls_selected returns the bitmask of slots that select a field without nulls (the is_d | is_e), which sets the proper true slots in the NullBuffer (i.e. slots that do not select the f array, a.k.a. not one of the fully null fields).

Sure, please let me know if it can be further clarified
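A sketch of the combination on that line, for a single chunk of 64 slots, may help. It follows the reading given in the review comment: fully null children are skipped, so their slots stay 0 (null). The function names and the `(type id, validity word)` representation are assumptions made for this example, with a set bit meaning "valid" as in a NullBuffer.

```rust
/// Bit `i` is set when `chunk[i] == type_id`.
fn selected(chunk: &[i8; 64], type_id: i8) -> u64 {
    chunk
        .iter()
        .enumerate()
        .fold(0u64, |word, (bit, &id)| word | (u64::from(id == type_id) << bit))
}

/// `with_nulls`: (type id, validity word) for children with some nulls.
/// `without_nulls_ids`: type ids of children with no nulls at all.
/// Fully null children appear in neither list.
pub fn chunk_validity(
    chunk: &[i8; 64],
    with_nulls: &[(i8, u64)],
    without_nulls_ids: &[i8],
) -> u64 {
    // Valid slots contributed by children that have some nulls.
    let union_nulls = with_nulls
        .iter()
        .fold(0u64, |acc, &(id, validity)| acc | (selected(chunk, id) & validity));
    // Slots selecting a child without nulls are always valid.
    let without_nulls_selected = without_nulls_ids
        .iter()
        .fold(0u64, |acc, &id| acc | selected(chunk, id));
    union_nulls | without_nulls_selected
}
```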


wiedld · 2024-09-26T04:14:02Z

arrow-array/src/array/union_array.rs

+                    })
+                    .fold((0, 0), fold);
+
+                !any_nullable_selected | union_nulls


Could we add a comment on this line?
It's basically the inverse of line 480 (see requested comment there). It's clear once reading the NullBuffer docs, but it's an extra hop for anyone newer to the codebase. Thank you!

I modified it to be the same as line 480, and renamed nullable to with_nulls because it doesn't apply to nullable fields that happen to have 0 nulls.
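For comparison, the expression quoted in the diff at the top of this thread can be sketched as follows for one chunk of 64 slots. Here children without nulls are skipped entirely, so any slot that does not select a child with nulls is valid. This illustrates the quoted line as posted, not the final code after the change described in the reply; the names and the validity-word representation are assumptions.

```rust
/// Bit `i` is set when `chunk[i] == type_id`.
fn selected(chunk: &[i8; 64], type_id: i8) -> u64 {
    chunk
        .iter()
        .enumerate()
        .fold(0u64, |word, (bit, &id)| word | (u64::from(id == type_id) << bit))
}

/// `with_nulls`: (type id, validity word) for every child that has nulls.
/// Children without nulls are never visited.
pub fn chunk_validity_skip_without_nulls(chunk: &[i8; 64], with_nulls: &[(i8, u64)]) -> u64 {
    let (any_with_nulls_selected, union_nulls) = with_nulls.iter().fold(
        (0u64, 0u64),
        |(any_selected, union_nulls), &(id, validity)| {
            let is_selected = selected(chunk, id);
            (
                any_selected | is_selected,
                union_nulls | (is_selected & validity),
            )
        },
    );
    // Slots that select no child with nulls are valid by definition.
    !any_with_nulls_selected | union_nulls
}
```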

wiedld · 2024-09-26T04:59:49Z

arrow-array/src/array/union_array.rs

@@ -380,6 +392,254 @@ impl UnionArray {
            _ => unreachable!(),
        }
    }
+
+    /// Computes the logical nulls for a sparse union, optimized for when there's a lot of fields without nulls
+    fn mask_sparse_skip_without_nulls(&self, nulls: Vec<(i8, NullBuffer)>) -> BooleanBuffer {


This function is not hit by any of the test cases, but does get used by the benchmark.

There are also several other branch points in these mask methods that are only exercised by the benchmark. I don't believe the benchmark tests for correctness (it uses black_box(array.logical_nulls())). What is the policy here @alamb?

I don't think we have any strict policy -- rather the guidelines are to cover the code such that if someone broke it accidentally in a future refactoring, the tests would break.

Exactly how much coverage is enough to meet that bar I think is somewhat of a judgement call and is based on the functionality.

Perhaps you can help @gstvg figure out any missing cases they could add based on your coverage analysis?

Sure thing. Here is the cov report.
I added in assert!(false) lines for verification of coverage gaps; also makes skimming the report easier.

# to open
> tar -xvf cov_union_arrays.tar.gz
> open cov_union_arrays.html

# how generated
> RUSTFLAGS='-Z profile -C codegen-units=1' CARGO_CFG_REGEX_DISABLE_AUTO_OPTIMIZATIONS=1 cargo +nightly cov test -p arrow-array --lib
> cargo +nightly cov report --open
# save as complete webpage (give you interactive bits), and tarball

# Also confirmed the arrow-arith tests did not increase coverage.
# Only the benchmarks did -- and those don't check correctness.

Your call on wherever you want correctness coverage @gstvg . Hope this helps!

I was thinking it would help to translate the codecov report into a description of which UnionArrays should be constructed and have logical_nulls called on them to improve the coverage.

Thanks for the cov report.
That's my fault, the test test_sparse_union_logical_mask_mixed_nulls_skip_fully_valid should have hit this. It's fixed

I also discovered that test_sparse_union_logical_nulls_mask_all_nulls_skip_one was using the gather strategy, and the SkipOne strategy was only called on a fast_paths test that doesn't make sense, so I removed it and fixed the skip_one test.


alamb · 2024-10-01T21:09:37Z

@wiedld -- please ping me when you think this PR is ready

wiedld

Thank you @gstvg for the clarifications & documentation. ❤️

I can approve after @alamb lets me know about the CI-testing targets and what standard we use. 🙏🏼

Alternatively, there is another suggested approach to make sure we are testing the different sparse strategies.

wiedld · 2024-10-01T20:27:51Z

arrow-array/src/array/union_array.rs

+                // Choose the fastest way to compute the logical nulls
+                // Gather computes one null per iteration, while the others work on 64 nulls chunks,
+                // but must also compute selection masks, which is expensive,
+                // so it's cost is the number of selection masks computed per chunk
+                // Since computing the selection mask gets auto-vectorized, it's performance depends on which simd feature is enabled
+                // For gather, the cost is the threshold where masking becomes slower than gather, which is determined with benchmarks


wiedld · 2024-10-01T20:28:38Z

arrow-array/src/array/union_array.rs

+                    // Always use gather on non benchmarked archs because even though it may slower on some cases,
+                    // it's performance depends only on the union length, without being affected by the number of fields
+                    0


Thank you. These comments are great.

@alamb -- I have a question regarding how CI will be testing these branches.

The default gather_relative_cost=0 means that the default scenario is to always use the SparseStrategy::Gather except when other architectures AND features are enabled during testing. When I look at our CI (which I believe is here) we won't be testing all of these scenarios.

Not trying to hold up this PR. Just trying to figure out if this means we leave the responsibility on the user to build & run tests on their own platform -- and report to us if the tests fail. Is that ok?

Alternatively, we could have these branches (choosing which SparseStrategy variant to use) not be hardcoded to a feature & architecture -- and instead use an approach where we can set the variable separately before running each test. That way all branches can be run through the correctness tests.

Is this a reasonable ask? (I'm new to this codebase & as a code reviewer here.)

In general, the arrow tests are controlled by this file: https://github.com/apache/arrow-rs/blob/master/.github/workflows/arrow.yml

It runs with various combinations of feature flags.
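The alternative raised in this thread, passing the strategy in rather than hardcoding it to target features, might look roughly like the sketch below. Everything here is hypothetical: `Strategy` has only two variants (the PR's `SparseStrategy` has more), the union is reduced to a single child with nulls, and `default_strategy` only mimics the idea of a compile-time choice.

```rust
/// Hypothetical two-variant strategy switch for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Strategy {
    Gather,
    Mask,
}

/// Compile-time default, as production code might choose it.
pub fn default_strategy() -> Strategy {
    if cfg!(target_feature = "avx2") {
        Strategy::Mask
    } else {
        Strategy::Gather
    }
}

/// Validity of a union in which only the child `null_id` has nulls.
/// Taking `strategy` as an argument lets tests run every branch on any machine.
pub fn logical_validity(
    type_ids: &[i8],
    null_id: i8,
    child_validity: &[bool],
    strategy: Strategy,
) -> Vec<bool> {
    match strategy {
        // One slot at a time.
        Strategy::Gather => type_ids
            .iter()
            .zip(child_validity)
            .map(|(&id, &valid)| id != null_id || valid)
            .collect(),
        // Up to 64 slots at a time: build a selection word and a validity
        // word, combine them, then unpack the bits again.
        Strategy::Mask => type_ids
            .chunks(64)
            .zip(child_validity.chunks(64))
            .flat_map(|(ids, valid)| {
                let selected = ids
                    .iter()
                    .enumerate()
                    .fold(0u64, |w, (bit, &id)| w | (u64::from(id == null_id) << bit));
                let validity = valid
                    .iter()
                    .enumerate()
                    .fold(0u64, |w, (bit, &v)| w | (u64::from(v) << bit));
                let word = !selected | validity;
                (0..ids.len()).map(move |bit| ((word >> bit) & 1) == 1)
            })
            .collect(),
    }
}
```

A correctness test can then loop over every variant and compare each against the gather result, independent of the features the CI runner happens to be compiled with.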

fix: implement UnionArray logical_nulls

8bfb4f2

github-actions bot added the arrow Changes to the arrow crate label Aug 24, 2024

fix: allow false positive clippy warning

442e6f9

gstvg commented Aug 24, 2024


fix: replace unstable functions

9472786

gstvg marked this pull request as draft August 25, 2024 15:52

gstvg mentioned this pull request Aug 29, 2024

UnionBuilder produces incorrect Union DataType #1637

Open

gstvg added 4 commits August 30, 2024 15:32

improve docs, perf, validate sparse, revert msrv

9a921ff

Merge branch 'master' of https://github.com/apache/arrow-rs into unio…

3a61305

…n_logical_nulls

simplify benches

251ef82

fine tune fast path threshold

e35aaa0

gstvg commented Aug 30, 2024


gstvg added 3 commits August 30, 2024 17:03

update docs

6100d7a

fix fast path check, improve perf and docs

49e78e2

Merge branch 'master' of https://github.com/apache/arrow-rs into unio…

5b965bf

…n_logical_nulls

gstvg marked this pull request as ready for review September 11, 2024 15:41

fix: remove rust 1.65 code

291691c

alamb mentioned this pull request Sep 16, 2024

DataFusion weekly project plan (Andrew Lamb) - Sep 16, 2024 apache/datafusion#12494

Closed

8 tasks

gstvg added 2 commits September 23, 2024 00:45

simplify mask_sparse_

3fd0b38

Merge branch 'master' into union_logical_nulls

0b9a443

wiedld reviewed Sep 26, 2024


gstvg and others added 3 commits September 30, 2024 02:25

apply suggestions from review

b51bc86

Merge branch 'union_logical_nulls' of https://github.com/gstvg/arrow-rs…

74a3b20

… into union_logical_nulls

Update arrow-array/src/array/union_array.rs

f824ed3

Co-authored-by: wiedld <wiedld@users.noreply.github.com>

wiedld reviewed Oct 1, 2024

