Minor: improve RowFilter and ArrowPredicate docs (#6301)
* Minor: improve `RowFilter` and `ArrowPredicate` docs

* tweak
alamb authored Aug 25, 2024
1 parent 8c956a9 commit f73dbc3
Showing 1 changed file with 12 additions and 5 deletions.
parquet/src/arrow/arrow_reader/filter.rs: 17 changes (12 additions & 5 deletions)
@@ -20,6 +20,8 @@ use arrow_array::{BooleanArray, RecordBatch};
 use arrow_schema::ArrowError;
 
 /// A predicate operating on [`RecordBatch`]
+///
+/// See [`RowFilter`] for more information on the use of this trait.
 pub trait ArrowPredicate: Send + 'static {
     /// Returns the [`ProjectionMask`] that describes the columns required
     /// to evaluate this predicate. All projected columns will be provided in the `batch`
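For intuition about what `projection` buys, here is a minimal model in plain Rust: a projection is one include flag per column, and the `batch` a predicate receives holds only the flagged columns. `Projection` and `project` are hypothetical names used only for illustration; the real `ProjectionMask` is built from the Parquet schema (for example with `ProjectionMask::leaves`) and selects leaf columns, which this flat model ignores.

```rust
/// Hypothetical stand-in for `ProjectionMask`: one include flag per column.
struct Projection {
    include: Vec<bool>,
}

/// Returns only the columns flagged in `mask`, in their original order.
/// In this model, unflagged columns are never read at all.
fn project<'a>(columns: &'a [Vec<i64>], mask: &Projection) -> Vec<&'a [i64]> {
    columns
        .iter()
        .zip(&mask.include)
        .filter(|(_, keep)| **keep)
        .map(|(col, _)| col.as_slice())
        .collect()
}

fn main() {
    let columns: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![10, 20, 30], vec![7, 8, 9]];
    // A predicate that only needs the second column.
    let mask = Projection { include: vec![false, true, false] };
    println!("{:?}", project(&columns, &mask));
}
```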
@@ -29,7 +31,7 @@ pub trait ArrowPredicate: Send + 'static {
     /// Evaluate this predicate for the given [`RecordBatch`] containing the columns
     /// identified by [`Self::projection`]
     ///
-    /// Must return a [`BooleanArray`] that has the same length as the input
+    /// Must return a [`BooleanArray`] that has the same length as the input
     /// `batch` where each row indicates whether the row should be returned:
     /// * `true`:the row should be returned
     /// * `false` or `null`: the row should not be returned
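The return contract of `evaluate` can be sketched without the arrow crates, assuming `Option<bool>` as a stand-in for one slot of a nullable `BooleanArray` (`None` is null). The helpers `greater_than_10` and `surviving_rows` are hypothetical. The point of the sketch is that `false` and `null` are treated identically, so a comparison against a null value drops the row. Rejecting a length mismatch with an error is this sketch's own choice, not a statement about how the reader reacts.

```rust
/// Hypothetical predicate `value > 10` over a nullable integer column.
/// `None` models a null slot; a null input yields a null result.
fn greater_than_10(column: &[Option<i64>]) -> Vec<Option<bool>> {
    column.iter().map(|v| v.map(|x| x > 10)).collect()
}

/// Turns a predicate result into the indices of rows to return:
/// `Some(true)` keeps a row, while `Some(false)` and `None` (null) drop it.
/// A result whose length differs from the batch is rejected.
fn surviving_rows(mask: &[Option<bool>], batch_len: usize) -> Result<Vec<usize>, String> {
    if mask.len() != batch_len {
        return Err(format!(
            "predicate returned {} values for a batch of {} rows",
            mask.len(),
            batch_len
        ));
    }
    Ok(mask
        .iter()
        .enumerate()
        .filter(|(_, v)| **v == Some(true))
        .map(|(i, _)| i)
        .collect())
}

fn main() {
    let column = [Some(5), Some(20), None, Some(11)];
    let mask = greater_than_10(&column);
    // Row 2 is null, so it is dropped just like row 0, which evaluated to false.
    println!("{:?}", surviving_rows(&mask, column.len()));
}
```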
@@ -68,12 +70,17 @@ where
     }
 }
 
-/// A [`RowFilter`] allows pushing down a filter predicate to skip IO and decode
+/// Filter applied *during* the parquet read process
 ///
+/// [`RowFilter`] applies predicates in order, after decoding only the columns
+/// required. As predicates eliminate rows, fewer rows from subsequent columns
+/// may be required, thus potentially reducing IO and decode.
+///
-/// This consists of a list of [`ArrowPredicate`] where only the rows that satisfy all
-/// of the predicates will be returned. Any [`RowSelection`] will be applied prior
+/// A `RowFilter` consists of a list of [`ArrowPredicate`]s. Only the rows for which
+/// all the predicates evaluate to `true` will be returned.
+/// Any [`RowSelection`] provided to the reader will be applied prior
 /// to the first predicate, and each predicate in turn will then be used to compute
-/// a more refined [`RowSelection`] to use when evaluating the subsequent predicates.
+/// a more refined [`RowSelection`] used when evaluating the subsequent predicates.
 ///
 /// Once all predicates have been evaluated, the final [`RowSelection`] is applied
 /// to the top-level [`ProjectionMask`] to produce the final output [`RecordBatch`].
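A simplified model of the evaluation order described in the new `RowFilter` docs: a prior selection is applied first, predicates then run in sequence, each reads only the rows still selected, and the final selection is applied to the output projection. Assumptions: columns are plain `Vec<i64>` with no nulls; `Predicate`, `refine_selection` and `materialize` are hypothetical names; and a selection is a list of row indices, whereas the crate's `RowSelection` is a run-length sequence of `RowSelector`s that select or skip runs of rows. Actual savings also depend on how rows fall into pages, which is why the docs only say IO and decode are *potentially* reduced.

```rust
/// Hypothetical predicate: the single column it needs (a stand-in for its
/// `ProjectionMask`) and a per-value test. Nulls are left out for brevity.
struct Predicate {
    column: usize,
    test: fn(i64) -> bool,
}

/// Evaluates `predicates` in order, starting from an initial selection of
/// row indices. Each predicate reads only the rows still selected and then
/// narrows the selection for the next one. Returns the final selection and
/// how many values each predicate had to read.
fn refine_selection(
    columns: &[Vec<i64>],
    initial: Vec<usize>,
    predicates: &[Predicate],
) -> (Vec<usize>, Vec<usize>) {
    let mut selection = initial;
    let mut values_read = Vec::with_capacity(predicates.len());
    for p in predicates {
        let col = &columns[p.column];
        values_read.push(selection.len());
        selection.retain(|&row| (p.test)(col[row]));
    }
    (selection, values_read)
}

/// Applies the final selection to the output columns (the "top-level"
/// projection in this model), reading only the selected rows.
fn materialize(columns: &[Vec<i64>], selection: &[usize], output: &[usize]) -> Vec<Vec<i64>> {
    output
        .iter()
        .map(|&c| selection.iter().map(|&row| columns[c][row]).collect::<Vec<i64>>())
        .collect()
}

fn main() {
    let columns: Vec<Vec<i64>> = vec![
        vec![1, 50, 7, 80, 3, 90],          // column 0: tested by predicate 1
        vec![10, 20, 30, 40, 50, 60],       // column 1: tested by predicate 2
        vec![100, 200, 300, 400, 500, 600], // column 2: output only
    ];
    let predicates = [
        Predicate { column: 0, test: |x| x > 10 },
        Predicate { column: 1, test: |x| x > 25 },
    ];
    // An earlier row selection that skips row 5 is applied before any predicate.
    let (selection, read) = refine_selection(&columns, vec![0, 1, 2, 3, 4], &predicates);
    let output = materialize(&columns, &selection, &[0, 2]);
    println!("selection = {selection:?}, values read = {read:?}, output = {output:?}");
}
```

In this run the second predicate reads 2 values rather than 5 because the first one already removed rows, and column 2 is read only for the single surviving row (row 3). Starting from all six rows instead, row 5 would also survive both predicates; the prior selection removes it before any predicate runs.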
