Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt(torii-grpc): parallelize queries #2443

Merged
merged 12 commits into from
Sep 19, 2024
Merged

Conversation

Larkooo
Copy link
Collaborator

@Larkooo Larkooo commented Sep 18, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced performance for large dataset handling in the DojoWorld functionality through parallel processing of database queries.
    • Improved efficiency by grouping entities for batch processing, reducing database calls.
    • Integrated the itertools library for enhanced collection manipulation capabilities.
  • Bug Fixes

    • Optimized mapping of database rows to entities, ensuring faster execution times.

Copy link

coderabbitai bot commented Sep 18, 2024

Walkthrough

Ohayo, sensei! The recent changes enhance the DojoWorld implementation by incorporating parallel processing for database queries and entity mapping. The code now employs the Rayon library to parallelize the mapping of database rows to entities, improving performance with large datasets. Key updates include grouping entities by model IDs for efficient querying and utilizing batch processing with SQL IN clauses, significantly reducing database calls.

Changes

File Path Change Summary
crates/torii/grpc/src/server/mod.rs Introduced parallel processing for database queries and entity mapping using Rayon. Grouped entities by model IDs for efficient querying and refactored row mapping to leverage parallel iteration. Modified functions to accept parameters for table names and relation columns.
crates/torii/grpc/Cargo.toml Added itertools.workspace = true to include the itertools crate in the workspace, enhancing data handling capabilities.

Possibly related PRs

Suggested reviewers

  • glihm

Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 17dfab3 and 51c654b.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (1)
  • crates/torii/grpc/src/server/mod.rs (16 hunks)
Files skipped from review as they are similar to previous changes (1)
  • crates/torii/grpc/src/server/mod.rs

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8613c78 and aa43bb9.

Files selected for processing (1)
  • crates/torii/grpc/src/server/mod.rs (4 hunks)
Additional comments not posted (3)
crates/torii/grpc/src/server/mod.rs (3)

24-24: Ohayo sensei! Excellent addition of Rayon for parallel processing

Including rayon will enhance performance when handling large datasets through parallelization.


343-346: Ohayo sensei! Ensure thread safety when using par_iter() with map_row_to_entity

When applying par_iter(), please confirm that map_row_to_entity is thread-safe and does not access shared mutable state without proper synchronization.


613-623: Ohayo sensei! Ensure thread safety when parallelizing entity mapping

The use of par_iter() for db_entities is great for performance. Please verify that map_row_to_entity and any data accessed within are thread-safe to prevent race conditions.

crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 49.01961% with 78 lines in your changes missing coverage. Please review.

Project coverage is 68.38%. Comparing base (7c5a1f5) to head (51c654b).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
crates/torii/grpc/src/server/mod.rs 49.01% 78 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2443      +/-   ##
==========================================
+ Coverage   68.21%   68.38%   +0.16%     
==========================================
  Files         365      365              
  Lines       48018    47973      -45     
==========================================
+ Hits        32757    32805      +48     
+ Misses      15261    15168      -93     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 04025a1 and ad08bd1.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (1)
  • crates/torii/grpc/Cargo.toml (1 hunks)

@@ -13,6 +13,7 @@ futures.workspace = true
num-traits.workspace = true
parking_lot.workspace = true
rayon.workspace = true
itertools.workspace = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohayo, sensei! Potential issue with the itertools dependency configuration

The line itertools.workspace = true adds itertools as a workspace dependency. However, itertools is an external crate and should typically be specified with a version number. Using workspace = true implies that itertools is part of your local workspace, which might not be the case unless you have a local copy of the crate.

Recommend updating the dependency to include the appropriate version from crates.io:

-itertools.workspace = true
+itertools = "0.10"  # Replace with the desired version

This ensures that the correct version of itertools is fetched from crates.io.

Committable suggestion was skipped due to low confidence.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ad08bd1 and 827c945.

Files selected for processing (1)
  • crates/torii/grpc/src/server/mod.rs (14 hunks)
Additional comments not posted (4)
crates/torii/grpc/src/server/mod.rs (4)

218-305: Ohayo sensei! Efficient Batch Processing in fetch_entities

The new fetch_entities function (lines 218-305) efficiently batches database queries and handles large datasets by grouping entities and chunking inserts. Excellent work on addressing SQLite's parameter limit and optimizing query performance.


873-886: Ohayo sensei! Safe Regex Pattern Construction in build_keys_pattern

The build_keys_pattern function (lines 873-886) constructs regex patterns safely for matching keys. The logic properly handles empty inputs and ensures patterns are well-formed.


733-748: Ohayo sensei! Consistent Function Parameter Updates

The functions retrieve_entities, query_by_hashed_keys, query_by_keys, query_by_member, and query_by_composite have been updated to accept table, model_relation_table, and entity_relation_column parameters (lines 733-748, 756-763, 771-773, 782-784, 793-795). Ensure all calls to these functions pass the correct arguments to maintain consistency and prevent potential bugs.

Also applies to: 756-763, 771-773, 782-784, 793-795


1076-1084: Ohayo sensei! Updated Calls to retrieve_entities

The calls to retrieve_entities have been updated to include additional parameters for the table and relation columns (lines 1076-1084, 1127-1132). Verify that these changes are reflected wherever retrieve_entities is called to ensure correct functionality.

Run the following script to check for all retrieve_entities calls:

Also applies to: 1127-1132

Verification successful

Ohayo sensei! It seems we need to take a closer look at those retrieve_entities calls. Let's dig a bit deeper to see the full picture!


Ohayo sensei! I've taken a deep dive into the retrieve_entities calls, and I've got some good news for you!

retrieve_entities Changes Correctly Implemented

The changes to retrieve_entities have been correctly implemented in the server-side code (crates/torii/grpc/src/server/mod.rs). The function now accepts the new parameters (table, model_relation_table, entity_relation_column, query) and is called with the appropriate arguments.

However, it's important to note that the client-side calls (in crates/torii/client/src/client/mod.rs and crates/torii/grpc/src/client.rs) don't include these new parameters. This is actually correct because:

  1. The client-side calls should match the gRPC interface, which hasn't changed from the client's perspective.
  2. The new parameters are internal to the server implementation and don't affect the client-server communication.

So, no worries! The changes have been implemented where they should be, and the code is looking good, sensei!

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all calls to `retrieve_entities` have been updated with the new parameters.

# Test: Search for `retrieve_entities` calls.
# Expect: All calls should include the new parameters.

rg --type rust 'retrieve_entities\('

Length of output: 606


Script:

#!/bin/bash
# Description: Search for `retrieve_entities` calls with more context

# Test: Search for `retrieve_entities` calls with 5 lines of context before and after.
# Expect: Full function calls with all parameters visible.

rg --type rust -C 5 'retrieve_entities\('

Length of output: 5230

@@ -21,6 +21,7 @@ use proto::world::{
RetrieveEventsRequest, RetrieveEventsResponse, SubscribeModelsRequest, SubscribeModelsResponse,
UpdateEntitiesSubscriptionRequest,
};
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohayo sensei! Introducing Rayon Requires Caution in Async Context

The addition of Rayon imports (line 24) indicates the use of data parallelism. However, using Rayon within asynchronous functions can lead to unexpected behavior due to its blocking nature and potential thread pool conflicts with Tokio's async runtime. It's important to ensure that integrating Rayon doesn't cause deadlocks or hinder async performance.

Consider using async-aware concurrency patterns provided by Tokio or the futures crate to achieve parallelism that's compatible with the async runtime.

crates/torii/grpc/src/server/mod.rs Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Outside diff range and nitpick comments (1)
crates/torii/grpc/src/server/mod.rs (1)

218-305: Add unit tests for the new fetch_entities function

The newly added fetch_entities function significantly enhances data retrieval. To ensure its reliability and correctness, please consider adding unit tests that cover various scenarios, including edge cases and error handling.

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 827c945 and 17dfab3.

Files selected for processing (1)
  • crates/torii/grpc/src/server/mod.rs (16 hunks)
Additional comments not posted (4)
crates/torii/grpc/src/server/mod.rs (4)

24-24: Ohayo sensei! Reminder about using Rayon in async context

As previously noted, using Rayon within asynchronous functions can lead to unexpected behavior due to potential thread pool conflicts with Tokio's async runtime. Please ensure that integrating Rayon doesn't cause deadlocks or hinder async performance.


290-294: Ohayo sensei! Reminder about using Rayon in async context

The previous concerns regarding the use of par_iter() within an async function are still applicable. Leveraging Rayon can conflict with the async runtime's thread management.


Line range hint 345-357: Ensure safe construction of SQL queries to prevent injection risks

Interpolating variables like {table} and {model_relation_table} directly into SQL queries can introduce SQL injection vulnerabilities if these variables contain unexpected values.

[security]

To enhance security, validate that table and model_relation_table are from a predefined list of table names or sanitize them appropriately before interpolation. Alternatively, consider using parameterized queries or query builders that safely handle identifiers.


604-612: Ohayo sensei! Reminder about using Rayon in async context

As previously mentioned, using par_iter() inside an async context may lead to undesirable outcomes. Please revisit this section to ensure compatibility with the async runtime.

crates/torii/grpc/src/server/mod.rs Show resolved Hide resolved
Copy link
Collaborator

@glihm glihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup and refactoring, some minor comments.

crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
crates/torii/grpc/src/server/mod.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants