
Implement UnionArray logical_nulls #6303

Open · wants to merge 16 commits into master
Conversation

@gstvg gstvg commented Aug 24, 2024

Which issue does this PR close?

Closes #6017

Rationale for this change

N/A

What changes are included in this PR?

- UnionArray::logical_nulls implementation, tests, and benches
- is_null and is_not_null tests on unions
- Check that sparse union child array lengths match the length of the parent union

Are there any user-facing changes?

UnionArray::logical_nulls now returns correct results

Additional info

This is a port of apache/datafusion#11321
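For readers unfamiliar with the semantics: in a sparse union every child has the same length as the parent, and the logical null of each slot comes from the validity of the child selected by that slot's type id. A standalone sketch of the straightforward per-slot (gather-style) computation, using plain Rust types rather than arrow-rs buffers (names are illustrative, not the PR's code):

```rust
use std::collections::HashMap;

// Per-slot lookup of logical validity for a sparse union.
// `children` maps a type_id to that child's validity bits,
// where `None` means the child has no nulls at all.
fn union_logical_validity(
    type_ids: &[i8],
    children: &HashMap<i8, Option<Vec<bool>>>,
) -> Vec<bool> {
    type_ids
        .iter()
        .enumerate()
        .map(|(i, tid)| match children.get(tid) {
            // Sparse union: the child has the parent's length,
            // so the slot index can be used directly.
            Some(Some(validity)) => validity[i],
            // No validity buffer: every value is valid.
            _ => true,
        })
        .collect()
}

fn main() {
    let mut children = HashMap::new();
    children.insert(0, Some(vec![true, false, true]));
    children.insert(1, None);
    // prints [true, false, true]
    println!("{:?}", union_logical_validity(&[0, 0, 1], &children));
}
```

The optimized strategies in this PR compute the same result in 64-slot chunks rather than one slot at a time.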

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 24, 2024
gstvg (author) commented Aug 24, 2024

cc @samuelcolvin

samuelcolvin (Contributor) commented:

Looks good from a brief check. What's the result of running the benchmarks?

gstvg (author) commented Aug 25, 2024

$ RUSTFLAGS='-C target-feature=+avx2' cargo bench --bench union_array

union logical nulls 4096 1 children
                        time:   [8.9139 ns 8.9932 ns 9.0887 ns]
Found 18 outliers among 100 measurements (18.00%)
  5 (5.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 2 children
                        time:   [1.4903 µs 1.4969 µs 1.5055 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 3 children
                        time:   [6.1631 µs 6.1999 µs 6.2449 µs]
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

union logical nulls 4096 4 children
                        time:   [6.1878 µs 6.2374 µs 6.2976 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 5 children
                        time:   [6.1996 µs 6.2488 µs 6.3094 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 6 children
                        time:   [6.2286 µs 6.2827 µs 6.3441 µs]
Found 21 outliers among 100 measurements (21.00%)
  3 (3.00%) high mild
  18 (18.00%) high severe

single with nulls 4096  time:   [1.2346 µs 1.2493 µs 1.2678 µs]
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe

@gstvg gstvg marked this pull request as draft August 25, 2024 15:52
gstvg (author) commented Aug 25, 2024

Studying a few fast paths.

Edit: It now runs faster when there are up to ~10 fields with nulls, with timings slowly increasing for every field with nulls; beyond that, timings stabilize and depend only on the length of the union.

New results
union logical nulls 4096 1 children with nulls, 0 without nulls
                        time:   [8.3308 ns 8.4185 ns 8.5348 ns]
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 1 children with nulls, 1 without nulls
                        time:   [815.86 ns 821.39 ns 827.97 ns]
                        change: [-0.6460% +0.4002% +1.4415%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 1 children with nulls, 10 without nulls
                        time:   [830.73 ns 833.54 ns 837.27 ns]
                        change: [+0.1174% +0.9252% +1.8490%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 2 children with nulls, 0 without nulls
                        time:   [866.46 ns 870.39 ns 875.34 ns]
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe

union logical nulls 4096 2 children with nulls, 1 without nulls
                        time:   [1.4325 µs 1.4411 µs 1.4516 µs]
                        change: [-0.0205% +0.6804% +1.4063%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 2 children with nulls, 10 without nulls
                        time:   [1.4605 µs 1.4709 µs 1.4833 µs]
                        change: [-1.6473% -0.3650% +0.8144%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  16 (16.00%) high severe

union logical nulls 4096 3 children with nulls, 0 without nulls
                        time:   [1.5076 µs 1.5214 µs 1.5380 µs]
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

union logical nulls 4096 3 children with nulls, 1 without nulls
                        time:   [2.0524 µs 2.0704 µs 2.0963 µs]
                        change: [-0.6605% +0.1153% +0.9818%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

union logical nulls 4096 3 children with nulls, 10 without nulls
                        time:   [2.0694 µs 2.0827 µs 2.1012 µs]
                        change: [-1.2473% -0.0532% +1.1080%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 4 children with nulls, 0 without nulls
                        time:   [2.1140 µs 2.1281 µs 2.1473 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 4 children with nulls, 1 without nulls
                        time:   [2.6724 µs 2.6915 µs 2.7148 µs]
                        change: [-0.9684% -0.2581% +0.4469%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe

union logical nulls 4096 4 children with nulls, 10 without nulls
                        time:   [2.6953 µs 2.7094 µs 2.7269 µs]
                        change: [-0.9989% -0.2504% +0.5038%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

union logical nulls 4096 5 children with nulls, 0 without nulls
                        time:   [2.7898 µs 2.8096 µs 2.8318 µs]
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 5 children with nulls, 1 without nulls
                        time:   [3.3592 µs 3.3784 µs 3.4011 µs]
                        change: [-0.5205% +0.1547% +0.8115%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) high mild
  18 (18.00%) high severe

union logical nulls 4096 5 children with nulls, 10 without nulls
                        time:   [3.3779 µs 3.4021 µs 3.4327 µs]
                        change: [-0.9537% +0.1781% +1.2313%] (p = 0.76 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 6 children with nulls, 0 without nulls
                        time:   [3.4163 µs 3.4446 µs 3.4805 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 6 children with nulls, 1 without nulls
                        time:   [3.9625 µs 3.9789 µs 3.9996 µs]
                        change: [-0.5328% +0.3486% +1.2941%] (p = 0.43 > 0.05)
                        No change in performance detected.
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 6 children with nulls, 10 without nulls
                        time:   [3.9917 µs 4.0067 µs 4.0244 µs]
                        change: [-0.2479% +0.7253% +1.8074%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 7 children with nulls, 0 without nulls
                        time:   [4.0318 µs 4.0461 µs 4.0642 µs]
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 7 children with nulls, 1 without nulls
                        time:   [4.6034 µs 4.6171 µs 4.6339 µs]
                        change: [+0.3074% +0.7556% +1.1784%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  5 (5.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 7 children with nulls, 10 without nulls
                        time:   [4.6182 µs 4.6344 µs 4.6543 µs]
                        change: [-0.0238% +0.5574% +1.2432%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 8 children with nulls, 0 without nulls
                        time:   [4.6619 µs 4.7071 µs 4.7695 µs]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 8 children with nulls, 1 without nulls
                        time:   [5.2440 µs 5.2843 µs 5.3403 µs]
                        change: [-1.0769% +0.1182% +1.2061%] (p = 0.85 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 8 children with nulls, 10 without nulls
                        time:   [5.2702 µs 5.3058 µs 5.3488 µs]
                        change: [-0.8008% +0.0624% +0.8842%] (p = 0.89 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 9 children with nulls, 0 without nulls
                        time:   [5.3339 µs 5.3626 µs 5.3990 µs]
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

union logical nulls 4096 9 children with nulls, 1 without nulls
                        time:   [5.9340 µs 5.9751 µs 6.0214 µs]
                        change: [-0.1106% +0.5016% +1.0782%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe

union logical nulls 4096 9 children with nulls, 10 without nulls
                        time:   [5.9485 µs 5.9897 µs 6.0430 µs]
                        change: [+0.5805% +1.6243% +2.7664%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

union logical nulls 4096 10 children with nulls, 0 without nulls
                        time:   [5.9614 µs 5.9826 µs 6.0088 µs]
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe

union logical nulls 4096 10 children with nulls, 1 without nulls
                        time:   [6.3212 µs 6.3628 µs 6.4144 µs]
                        change: [-4.4399% -3.4190% -2.4917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 10 children with nulls, 10 without nulls
                        time:   [6.3714 µs 6.4107 µs 6.4609 µs]
                        change: [-4.7291% -3.7470% -2.7989%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 11 children with nulls, 0 without nulls
                        time:   [6.3651 µs 6.4082 µs 6.4610 µs]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) high mild
  14 (14.00%) high severe

union logical nulls 4096 11 children with nulls, 1 without nulls
                        time:   [6.3882 µs 6.4184 µs 6.4559 µs]
                        change: [-12.286% -11.243% -10.265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

union logical nulls 4096 11 children with nulls, 10 without nulls
                        time:   [6.4030 µs 6.4589 µs 6.5291 µs]
                        change: [-12.246% -11.198% -10.111%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

None
}
}

/// Union types always return non null as there is no validity buffer.
Contributor (author):
I updated the docs below, but the other arrays that also (correctly) implement logical_nulls don't include these default methods. Should we delete them too?

DictionaryArray
RunArray
NullArray

Contributor:
If we remove these specific implementations, the behavior will not change but the documented reason (because UnionArray has no validity buffer) will be lost.

If we remove these -- can we update the documentation for the default implementations to mention the UnionArray special case?

Contributor (author):
The comment points to old code, my bad.
I think the most recent changes already update the documentation, so I removed the methods.
Let me know if you agree.

@gstvg gstvg marked this pull request as ready for review September 11, 2024 15:41
alamb (Contributor) commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

wiedld (Contributor) commented Sep 23, 2024

I'm starting to review this one. 👀

wiedld (Contributor) left a review:

Thank you again for the code documentation! This was a fun PR to review.

Most of the suggestions are intended to make it easier for the next reader. I'm not sure about our code coverage policy, and I also think we may need to implement UnionArray::is_nullable (maybe in a follow-up PR?).

Comment on lines 287 to 290
/// Similarly, a [`UnionArray`] with any nullable child will always return true, even if all
/// selected values are valid, and therefore would not appear in [`Array::logical_nulls`].
fn is_nullable(&self) -> bool {
self.null_count() != 0
Contributor:

Does the added code comment match the implementation?

UnionArray always returns a zero null_count, so the conditional self.null_count() != 0 always evaluates to false, which means the array is never considered nullable.

Contributor:

I think the proposed/added doc comment is what should happen -- but it's not happening now.

Union types have no null validity bitmap (per spec). I believe this is why the null_count is always zero and UnionArray::nulls is None.

In contrast, I believe is_nullable is based on the child types (as the docs added here suggest). What's missing is to define UnionArray::is_nullable in its implementation, in a way that examines the UnionArray's children. (Dictionary arrays already do so.) Do you agree?

If you agree, then maybe add this doc comment and the code change in a follow-up PR?

Contributor (author):

You are absolutely right, thank you!
I added an implementation that always returns true, just so it isn't buggy, and will improve it in a follow-up PR.
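The follow-up described here might derive nullability from the children, as the review suggests DictionaryArray already does; a hypothetical sketch of the shape of that idea (not the crate's API, names are illustrative):

```rust
// Hypothetical: a union is nullable if any child could contribute a null,
// since a slot selecting that child would then be logically null.
struct Child {
    nullable: bool,
}

fn union_is_nullable(children: &[Child]) -> bool {
    children.iter().any(|c| c.nullable)
}

fn main() {
    let all_valid = [Child { nullable: false }, Child { nullable: false }];
    let mixed = [Child { nullable: false }, Child { nullable: true }];
    // prints: false true
    println!("{} {}", union_is_nullable(&all_valid), union_is_nullable(&mixed));
}
```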

Contributor:

Thank you!


Comment on lines 794 to 798
enum SparseStrategy {
Gather,
AllNullsSkipOne,
MixedSkipWithoutNulls,
MixedSkipFullyNull,
wiedld (Contributor), Sep 25, 2024:

Could you move this enum elsewhere and doc-comment what the options mean? The names are not self-explanatory.

For example, AllNullsSkipOne actually means "all fields have some (not all) nulls", and the SkipOne part refers to how the child arrays are iterated. Having that enum explained up front helps speed up comprehension. 🙏🏼

Contributor (author):

Fair enough. I added comments, but please let me know if you think they can be further improved.

Comment on lines 801 to 809
// This is the threshold where masking becomes slower than gather
// TODO: test on avx512f(feature is still unstable)
let gather_relative_cost = if cfg!(target_feature = "avx2") {
10
} else if cfg!(target_feature = "sse4.1") {
3
} else {
2
};
Contributor:

Do you have any sources (or data) on specifically why these numbers?

There is a clear preference to using the gather approach when we have many fields (partial null) to iterate thru, but I didn't follow how these exact numbers were chosen.

Contributor (author):

Indeed, those numbers look magical. I chose them based on benchmarks; numbers below.

I added comments to make this clearer, and modified the else branch, which was chosen based on x86 baseline/SSE2 benchmarks but was also being selected for non-x86 archs. I restricted it to x86, and set the gather cost to 0 (always used) on non-x86, because I only have x86 hardware to benchmark on.

In summary, for a union with a length of 4096:
gather = 6.4 µs (regardless of target feature)

AVX2:
1 selection mask per chunk = 0.8 µs
10 masks = 6.3 µs
11 masks = ~7 µs (slower than gather)

SSE4.1:
1 mask = 2.5µs
2 masks = 5 µs
3 masks = 7.5 µs (slower than gather)

x86 baseline/SSE2:
1 mask = 4 µs
2 masks = 8 µs (slower than gather)

The data for AVX2 is at #6303 (comment).
Data for SSE4.1 and SSE2 is from memory; I will bench it again tomorrow.
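The selection logic described in this thread can be sketched as follows. The cfg-gated numbers are the ones quoted above; the function names and the strictness of the comparison are assumptions of this sketch, not the PR's exact code:

```rust
// Crossover point (selection masks per 64-slot chunk) above which gather,
// whose cost depends only on the union length, beats masking.
// Numbers come from the benchmarks quoted in this thread.
fn gather_relative_cost() -> usize {
    if cfg!(target_feature = "avx2") {
        10 // AVX2: 11 masks per chunk is slower than gather
    } else if cfg!(target_feature = "sse4.1") {
        3 // SSE4.1: ~3 masks per chunk is slower than gather
    } else if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        2 // x86 baseline/SSE2: 2 masks per chunk is slower than gather
    } else {
        0 // not benchmarked: always use gather
    }
}

// Split out so the decision itself is testable on any machine.
fn use_gather(fields_with_nulls: usize, cost: usize) -> bool {
    fields_with_nulls > cost
}

fn main() {
    println!("cost on this build: {}", gather_relative_cost());
    println!("gather for 11 masked fields at cost 10: {}", use_gather(11, 10));
}
```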

Comment on lines +587 to +593
// Unsafe code below depends on it:
// To remove one branch from the loop, if a type_id is not utilized, or its logical_nulls is None/all set,
// we use a null buffer of len 1 and an index_mask of 0, or the true null buffer and usize::MAX otherwise.
// We then unconditionally access the null buffer with index & index_mask,
// which always yields index 0 for the 1-len buffer, or the true index unchanged otherwise.
// We also use a 256-entry array, so LLVM knows that `type_id as u8 as usize` is always in bounds
let mut logical_nulls_array = [(&one_valid, Mask::Zero); 256];
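The trick in the quoted comment can be demonstrated on plain slices (a standalone sketch, not the PR's code):

```rust
// Branchless validity lookup: all-valid fields get a 1-element buffer and a
// zero mask, so every lookup lands on slot 0; fields with real nulls get
// their full buffer and a mask of usize::MAX, leaving the index unchanged.
fn is_valid(validity: &[bool], index_mask: usize, index: usize) -> bool {
    validity[index & index_mask]
}

fn main() {
    let all_valid = [true]; // stand-in for the 1-len "always valid" buffer
    let with_nulls = [true, false, true];
    // prints true: any index masked with 0 maps to slot 0
    println!("{}", is_valid(&all_valid, 0, 4095));
    // prints false: usize::MAX leaves the index unchanged, real lookup at 1
    println!("{}", is_valid(&with_nulls, usize::MAX, 1));
}
```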
Contributor:

❤️

Comment on lines 565 to 569
let chunks = chunks_exact.map(|chunk| {
let chunk_array = <&[i8; 64]>::try_from(chunk).unwrap();

mask_chunk(chunk_array, &mut nulls_masks_iter)
});
Contributor:

Nit: would you mind renaming the variable chunk_array (and where the mask_chunk closures are defined) to type_ids_chunk_array or something similar?

Contributor (author):

Looks better. I also renamed chunk to type_ids_chunk and remainder to type_ids_remainder.
Feel free to ping any other place I may have missed.

Contributor:

Thank you!

},
);

union_nulls | without_nulls_selected(chunk, &without_nulls_ids)
Contributor:

Could we add a comment on this line?
I believe without_nulls_selected returns the bitmask for the fields without nulls (the is_d | is_e), which provides the proper true NullBuffer slot (i.e. not-the-f-array-null, a.k.a. not one of the fully null fields).

Contributor (author):

Sure, please let me know if it can be further clarified
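A scalar illustration of what a helper like without_nulls_selected computes, per the description in this thread (the helper name follows the discussion; the actual implementation works on vectorized chunks):

```rust
// Bit i of the result is set when the i-th type id in the chunk selects a
// field known to contain no nulls; ORing this into the running null mask
// marks those slots as valid.
fn without_nulls_selected(type_ids_chunk: &[i8], without_nulls_ids: &[i8]) -> u64 {
    type_ids_chunk
        .iter()
        .enumerate()
        .fold(0u64, |mask, (i, tid)| {
            if without_nulls_ids.contains(tid) {
                mask | (1u64 << i)
            } else {
                mask
            }
        })
}

fn main() {
    // type ids 1 and 2 have no nulls: slots 1 and 3 become valid bits.
    // prints 0b1010
    println!("{:#06b}", without_nulls_selected(&[0, 1, 0, 2], &[1, 2]));
}
```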

arrow-array/src/array/union_array.rs (outdated, resolved)
})
.fold((0, 0), fold);

!any_nullable_selected | union_nulls
Contributor:

Could we add a comment on this line?
It's basically the inverse of line 480 (see the requested comment there). It's clear once you read the NullBuffer docs, but it's an extra hop for anyone newer to the codebase. Thank you!

Contributor (author):

I modified it to be the same as line 480, and renamed nullable to with_nulls because it doesn't apply to nullable fields that happen to have 0 nulls.

@@ -380,6 +392,254 @@ impl UnionArray {
_ => unreachable!(),
}
}

/// Computes the logical nulls for a sparse union, optimized for when there are a lot of fields without nulls
fn mask_sparse_skip_without_nulls(&self, nulls: Vec<(i8, NullBuffer)>) -> BooleanBuffer {
wiedld (Contributor), Sep 26, 2024:

This function is not hit by any of the test cases, but is used by the benchmark.

There are also several other branch points in these mask methods that are only exercised by the benchmark. I don't believe the benchmark tests for correctness (it uses black_box(array.logical_nulls())). What is the policy here @alamb?

Contributor:

I don't think we have any strict policy -- rather the guidelines are to cover the code such that if someone broke it accidentally in a future refactoring, the tests would break.

Exactly how much coverage is enough to meet that bar I think is somewhat of a judgement call and is based on the functionality.

Perhaps you can help @gstvg figure out any missing cases they could add based on your coverage analysis?

wiedld (Contributor), Sep 26, 2024:

Sure thing. Here is the cov report.
I added assert!(false) lines to verify the coverage gaps; this also makes skimming the report easier.

# to open
> tar -xvf cov_union_arrays.tar.gz
> open cov_union_arrays.html

# how generated
> RUSTFLAGS='-Z profile -C codegen-units=1' CARGO_CFG_REGEX_DISABLE_AUTO_OPTIMIZATIONS=1 cargo +nightly cov test -p arrow-array --lib
> cargo +nightly cov report --open
# save as complete webpage (give you interactive bits), and tarball 

# Also confirmed the arrow-arith tests did not increase coverage. 
# Only the benchmarks did -- and those don't check correctness.

Your call on where you want correctness coverage, @gstvg. Hope this helps!

Contributor:

I was thinking it would help to translate the codecov report into a description of which UnionArrays should be constructed (and have logical_nulls called on them) to improve the coverage.

Contributor (author):

Thanks for the cov report.
That's my fault; the test test_sparse_union_logical_mask_mixed_nulls_skip_fully_valid should have hit this. It's fixed.

I also discovered that test_sparse_union_logical_nulls_mask_all_nulls_skip_one was using the gather strategy, and that the SkipOne strategy was only exercised by a fast_paths test that didn't make sense, so I removed that test and fixed the skip_one test.

alamb (Contributor) commented Oct 1, 2024

@wiedld -- please ping me when you think this PR is ready

wiedld (Contributor) left a review:

Thank you @gstvg for the clarifications & documentation. ❤️

I can approve after @alamb lets me know about the CI-testing targets and what standard we use. 🙏🏼

Alternatively, there is another suggested approach to make sure we are testing the different sparse strategies.

Comment on lines +807 to +812
// Choose the fastest way to compute the logical nulls
// Gather computes one null per iteration, while the others work on 64 nulls chunks,
// but must also compute selection masks, which is expensive,
// so its cost is the number of selection masks computed per chunk
// Since computing the selection mask gets auto-vectorized, its performance depends on which SIMD feature is enabled
// For gather, the cost is the threshold where masking becomes slower than gather, which is determined with benchmarks
Contributor:

❤️

Comment on lines +823 to +825
// Always use gather on non-benchmarked archs because even though it may be slower in some cases,
// its performance depends only on the union length, without being affected by the number of fields
0
Contributor:

Thank you. These comments are great.

wiedld (Contributor), Oct 1, 2024:

@alamb -- I have a question regarding how CI will test these branches.

The default gather_relative_cost = 0 means the default scenario always uses SparseStrategy::Gather, except when other architectures AND features are enabled during testing. Looking at our CI (which I believe is here), we won't be testing all of these scenarios.

Not trying to hold up this PR; just trying to figure out whether this means we leave the responsibility on the user to build and run tests on their own platform, and report to us if the tests error. Is that ok?

Contributor:

Alternatively, we could have these branches (choosing the SparseStrategy variant) not be hardcoded to a feature and architecture, and instead use an approach where we can set the variable separately before running each test. That way all branches can be run through the correctness tests.

Is this a reasonable ask? (I'm new to this codebase and to reviewing here.)
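The suggestion might look roughly like this: thread the cfg-derived threshold through as a parameter, so correctness tests can force each branch on any architecture. A hypothetical sketch, collapsed to two branches for illustration (not code from the PR):

```rust
#[derive(Debug, PartialEq)]
enum SparseStrategy {
    Gather,
    Mask,
}

// Taking the crossover cost as a parameter (instead of reading cfg! inline)
// lets tests exercise both branches regardless of the build's target features.
fn choose_strategy(fields_with_nulls: usize, gather_relative_cost: usize) -> SparseStrategy {
    if fields_with_nulls > gather_relative_cost {
        SparseStrategy::Gather
    } else {
        SparseStrategy::Mask
    }
}

fn main() {
    // Force each branch explicitly, as a test on any platform could.
    println!("{:?}", choose_strategy(11, 10)); // Gather
    println!("{:?}", choose_strategy(2, 10)); // Mask
}
```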

Contributor:

In general, the arrow tests are controlled by this file: https://github.com/apache/arrow-rs/blob/master/.github/workflows/arrow.yml

Which runs with various combination of feature flags

Labels: arrow (Changes to the arrow crate)

Successfully merging this pull request may close these issues.

Incorrect values for is_null and is_not_null on UnionArray
4 participants