Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add olap aggregation function #527

Merged
merged 5 commits into from
Sep 18, 2024
Merged

feat: add olap aggregation function #527

merged 5 commits into from
Sep 18, 2024

Conversation

nathanielc
Copy link
Collaborator

@nathanielc nathanielc commented Sep 12, 2024

The aggregation function compute the stream state for each new event into a stream. All stream states are persisted into a parquet database.

TODO:

  • Use schema version naming
  • CLI args for flight sql path

Closes AES-289
Closes AES-290

Copy link

linear bot commented Sep 12, 2024

Base automatically changed from feat/flight-sql-server to main September 12, 2024 17:06
@nathanielc nathanielc marked this pull request as ready for review September 12, 2024 17:14
@nathanielc nathanielc requested a review from a team as a code owner September 12, 2024 17:14
@nathanielc nathanielc requested review from AaronGoldman and samika98 and removed request for a team September 12, 2024 17:14
Copy link
Contributor

@samika98 samika98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions about the evaluate_all code.
I believe using topological sort would be a neater solution than the current iterative + hashmap approach.
Also concerned about termination during cycles in the current code. We might need to handle that and a return exec error

olap/src/aggregator/ceramic_patch.rs Show resolved Hide resolved
CODEOWNERS Show resolved Hide resolved
@nathanielc
Copy link
Collaborator Author

@samika98 I have updated the algo to do a single pass.

The aggregation function compute the stream state for each new event into a stream.
All stream states are persisted into a parquet database.
Copy link
Contributor

@dav1do dav1do left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the test. Some questions as it's a bit dense to those of us uninitiated with datafusion (their docs seem really good though so I just need to spend some time reading them).

olap/src/aggregator/ceramic_patch.rs Show resolved Hide resolved
olap/src/aggregator/ceramic_patch.rs Show resolved Hide resolved
olap/src/aggregator/ceramic_patch.rs Show resolved Hide resolved
}

impl PartitionEvaluator for CeramicPatchEvaluator {
fn evaluate_all(&mut self, values: &[ArrayRef], num_rows: usize) -> Result<ArrayRef> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I struggled with was understanding the where things are coming from and what we're actually working with. The datafusion docs are pretty good for how this trait is used but but including a doc comment about what the expected input looks like would be helpful. Then again, the assignments into the fields makes it fairly clear, so maybe it's not useful 🤷

(these assignments)

        let event_cids = as_binary_array(&values[0])?;
        let previous_cids = as_binary_array(&values[1])?;
        let previous_states = as_string_array(&values[2])?;
        let patches = as_string_array(&values[3])?;

@nathanielc nathanielc dismissed samika98’s stale review September 18, 2024 14:28

samika is OOO and we discussed changes already in person

@nathanielc nathanielc added this pull request to the merge queue Sep 18, 2024
Merged via the queue into main with commit 81f7e78 Sep 18, 2024
5 checks passed
@nathanielc nathanielc deleted the feat/olap-agg branch September 18, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants