Minor: improve RowFilter and ArrowPredicate docs #6301

Merged
merged 2 commits into from
Aug 25, 2024
17 changes: 12 additions & 5 deletions parquet/src/arrow/arrow_reader/filter.rs
@@ -20,6 +20,8 @@ use arrow_array::{BooleanArray, RecordBatch};
 use arrow_schema::ArrowError;
 
 /// A predicate operating on [`RecordBatch`]
+///
+/// See [`RowFilter`] for more information on the use of this trait.
 pub trait ArrowPredicate: Send + 'static {
     /// Returns the [`ProjectionMask`] that describes the columns required
     /// to evaluate this predicate. All projected columns will be provided in the `batch`
@@ -29,7 +31,7 @@ pub trait ArrowPredicate: Send + 'static {
     /// Evaluate this predicate for the given [`RecordBatch`] containing the columns
     /// identified by [`Self::projection`]
     ///
-    /// Must return a [`BooleanArray`] that has the same length as the input
+    /// Must return a [`BooleanArray`] that has the same length as the input
     /// `batch` where each row indicates whether the row should be returned:
     /// * `true`:the row should be returned
     /// * `false` or `null`: the row should not be returned
@@ -68,12 +70,17 @@ where
     }
 }
 
-/// A [`RowFilter`] allows pushing down a filter predicate to skip IO and decode
+/// Filter applied *during* the parquet read process
 ///
+/// [`RowFilter`] applies predicates in order, after decoding only the columns
+/// required. As predicates eliminate rows, fewer rows from subsequent columns
+/// may be required, thus potentially reducing IO and decode.
+///
-/// This consists of a list of [`ArrowPredicate`] where only the rows that satisfy all
-/// of the predicates will be returned. Any [`RowSelection`] will be applied prior
+/// A `RowFilter` consists of a list of [`ArrowPredicate`]s. Only the rows for which
+/// all the predicates evaluate to `true` will be returned.
+/// Any [`RowSelection`] provided to the reader will be applied prior
 /// to the first predicate, and each predicate in turn will then be used to compute
-/// a more refined [`RowSelection`] to use when evaluating the subsequent predicates.
+/// a more refined [`RowSelection`] used when evaluating the subsequent predicates.
 ///
 /// Once all predicates have been evaluated, the final [`RowSelection`] is applied
 /// to the top-level [`ProjectionMask`] to produce the final output [`RecordBatch`].
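
The evaluation order these docs describe can be modelled with a small, std-only sketch. This is not the parquet crate's implementation: `Mask`, `Predicate`, `apply_row_filter`, and `demo` are hypothetical names chosen for illustration, a `Vec<Option<bool>>` stands in for a `BooleanArray` (with `None` as null), and a plain list of row indices stands in for a `RowSelection`. It only shows that predicates run in order, that each one sees just the rows still selected, and that `false` and null both drop a row.

```rust
/// Stand-in for a nullable boolean array: one entry per evaluated row.
type Mask = Vec<Option<bool>>;

/// Hypothetical predicate: given the indices of the rows still selected,
/// return one nullable boolean per selected row.
type Predicate = Box<dyn Fn(&[usize]) -> Mask>;

/// Apply the predicates in order, narrowing the selection after each one.
/// A row survives only if every predicate yields `Some(true)` for it;
/// `Some(false)` and `None` (null) both remove the row.
fn apply_row_filter(initial_selection: Vec<usize>, predicates: &[Predicate]) -> Vec<usize> {
    let mut selection = initial_selection;
    for predicate in predicates {
        let mask = predicate(&selection);
        // The mask must have the same length as the input it was evaluated on.
        assert_eq!(mask.len(), selection.len());
        selection = selection
            .into_iter()
            .zip(mask)
            .filter(|(_, keep)| *keep == Some(true))
            .map(|(row, _)| row)
            .collect();
    }
    selection
}

/// Build two toy columns and filter them with `a > 2` followed by `b >= 20`.
fn demo() -> Vec<usize> {
    // `None` models a null value in the column.
    let a: Vec<Option<i32>> = vec![Some(1), Some(5), None, Some(7), Some(9)];
    let b: Vec<Option<i32>> = vec![Some(10), Some(20), Some(30), None, Some(50)];
    let row_count = a.len();

    // A comparison against a null value yields null, which drops the row.
    let first: Predicate =
        Box::new(move |rows: &[usize]| rows.iter().map(|&r| a[r].map(|v| v > 2)).collect());
    let second: Predicate =
        Box::new(move |rows: &[usize]| rows.iter().map(|&r| b[r].map(|v| v >= 20)).collect());

    // Start from "all rows selected", as if no prior selection had been supplied.
    apply_row_filter((0..row_count).collect(), &[first, second])
}

fn main() {
    println!("{:?}", demo()); // prints [1, 4]
}
```

In this sketch the second predicate is evaluated on three rows rather than five, which is the source of the potential IO and decode savings the new docs mention. Putting the most selective predicate first would shrink the later inputs further.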