
feat: dbeaver mysql compatibility, use statement and information_schema.tables #4218

Merged
merged 12 commits into from
Jul 3, 2024

Conversation

sunng87
Member

@sunng87 sunng87 commented Jun 26, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?


This patch resolves compatibility issues with the DBeaver client:

  • it tries to access fields such as data_size in information_schema.tables
  • it issues a USE statement instead of the command-based database switch that most other clients use

This patch fills these two gaps and makes the DBeaver client work for most cases. There is another statement to be supported, SHOW TABLE STATUS FROM public LIKE 'system_metrics', but it's not a blocking issue for our scenario. SHOW TABLE STATUS is tracked in #3354.

This patch brings back USE statement support (#2022), as we finally found a real use-case for it. It also makes current_schema modifiable and propagates the change to Session.

The data fields of information_schema.TABLES in this patch are all placeholders; I haven't filled real data into these fields yet. A TODO item is left for a future enhancement.

Future work

I'm going to refactor Session and QueryContext to use Arc<RwLock<>> for all modifiable fields, such as time zone and schema. This makes changes propagate automatically from QueryContext to Session. The idea of QueryContext::update_session is smart, but it's going to be difficult to maintain in the future.
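The propagation idea above can be sketched with std types. This is a minimal illustration, not GreptimeDB's actual Session/QueryContext definitions: both sides hold the same Arc<RwLock<String>>, so a write through the query context is immediately visible to the session without an explicit update_session step.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, simplified stand-ins for the real Session and QueryContext.
struct Session {
    current_schema: Arc<RwLock<String>>,
}

struct QueryContext {
    current_schema: Arc<RwLock<String>>,
}

impl Session {
    fn new(schema: &str) -> Self {
        Session {
            current_schema: Arc::new(RwLock::new(schema.to_string())),
        }
    }

    // The context shares the same Arc, so writes through either side
    // are visible to both without any explicit synchronization step.
    fn new_query_context(&self) -> QueryContext {
        QueryContext {
            current_schema: Arc::clone(&self.current_schema),
        }
    }
}

fn main() {
    let session = Session::new("public");
    let ctx = session.new_query_context();

    // e.g. executing `USE my_db` updates the shared field...
    *ctx.current_schema.write().unwrap() = "my_db".to_string();

    // ...and the session observes the change automatically.
    assert_eq!(*session.current_schema.read().unwrap(), "my_db");
}
```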

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced new columns (data_length, max_data_length, index_length, avg_row_length) in the information_schema tables for enhanced data insight.
    • Enhanced query result structure with additional relevant columns for better data representation.
  • Improvements

    • Implemented custom display formatting for TableOptions to improve readability.
  • Tests

    • Added tests to verify the string conversion of TableOptions ensuring accurate display.
  • Bug Fixes

    • Corrected column arrangement in query results to ensure consistency and correctness.

@sunng87 sunng87 requested a review from a team as a code owner June 26, 2024 07:23
Contributor

coderabbitai bot commented Jun 26, 2024

Walkthrough

The changes introduce new columns (data_length, index_length, max_data_length, and avg_row_length) to the information_schema tables in a database system. These modifications also involve updating related data structures, implementing display formatting for TableOptions, and adjusting test outputs to reflect the new schema.

Changes

File Change Summary
src/.../tables.rs Added constants and fields for new columns in information_schema along with vector builders.
src/table/src/requests.rs Implemented fmt::Display for TableOptions and added tests for string conversion.
tests/.../information_schema.result Updated test results for information_schema tables to include new columns.
tests/.../create.result Updated output format in test results to include new columns in query results.
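The fmt::Display change noted for src/table/src/requests.rs could take roughly the following shape. The struct here is a simplified stand-in for the real TableOptions (the actual type has more fields), and rendering options as space-separated `key=value` pairs is an assumption for illustration.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Simplified, hypothetical stand-in for table::requests::TableOptions.
struct TableOptions {
    write_buffer_size: Option<u64>,
    ttl: Option<String>,
    extra_options: BTreeMap<String, String>,
}

impl fmt::Display for TableOptions {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut parts = Vec::new();
        if let Some(size) = self.write_buffer_size {
            parts.push(format!("write_buffer_size={size}"));
        }
        if let Some(ttl) = &self.ttl {
            parts.push(format!("ttl={ttl}"));
        }
        // BTreeMap keeps extra options in a deterministic order.
        for (k, v) in &self.extra_options {
            parts.push(format!("{k}={v}"));
        }
        write!(f, "{}", parts.join(" "))
    }
}

fn main() {
    let opts = TableOptions {
        write_buffer_size: Some(32),
        ttl: Some("7d".to_string()),
        extra_options: BTreeMap::new(),
    };
    // Display gives a free `to_string()` via the ToString blanket impl.
    assert_eq!(opts.to_string(), "write_buffer_size=32 ttl=7d");
}
```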

Sequence Diagram(s)

None generated.

Poem

In coders' realm, where data gleams,
New columns rise to fulfill dreams,
Length and index, data's tale,
With tests aligned, they set the sail,
Options display in string's embrace,
Schema's wonder, a data chase.



@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 26, 2024
@sunng87 sunng87 requested a review from evenyag as a code owner June 26, 2024 07:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Outside diff range and nitpick comments (1)
src/operator/src/insert.rs (1)

Line range hint 695-710: Batch creation of logical tables.

The batch creation process for logical tables is well-implemented, making efficient use of Rust's iterators and error handling. However, the error handling could be improved by providing more specific error messages related to the failed table creations.

- error!(err; "Failed to create logical tables {:?}", failed_tables);
+ error!(err; "Failed to create logical tables. Check the following table specifications for issues: {:?}", failed_tables);
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between df0fff2 and 63aa4ed.

Files selected for processing (26)
  • src/catalog/src/information_schema/tables.rs (7 hunks)
  • src/catalog/src/table_source.rs (1 hunks)
  • src/common/function/src/system/database.rs (1 hunks)
  • src/frontend/src/instance.rs (1 hunks)
  • src/frontend/src/instance/prom_store.rs (1 hunks)
  • src/frontend/src/script.rs (2 hunks)
  • src/operator/src/delete.rs (1 hunks)
  • src/operator/src/expr_factory.rs (2 hunks)
  • src/operator/src/insert.rs (6 hunks)
  • src/operator/src/req_convert/delete/row_to_region.rs (1 hunks)
  • src/operator/src/req_convert/insert/stmt_to_region.rs (1 hunks)
  • src/operator/src/statement.rs (4 hunks)
  • src/pipeline/src/manager/pipeline_operator.rs (2 hunks)
  • src/query/src/datafusion.rs (1 hunks)
  • src/query/src/sql.rs (3 hunks)
  • src/servers/src/grpc/greptime_handler.rs (1 hunks)
  • src/servers/src/http/handler.rs (1 hunks)
  • src/servers/src/http/prometheus.rs (3 hunks)
  • src/servers/src/mysql/federated.rs (1 hunks)
  • src/servers/src/mysql/handler.rs (1 hunks)
  • src/servers/tests/http/influxdb_test.rs (1 hunks)
  • src/servers/tests/http/prom_store_test.rs (1 hunks)
  • src/session/src/context.rs (9 hunks)
  • src/session/src/lib.rs (1 hunks)
  • src/sql/src/parser.rs (1 hunks)
  • src/sql/src/statements/statement.rs (2 hunks)
Files not reviewed due to errors (1)
  • src/sql/src/statements/statement.rs (no review received)
Files skipped from review due to trivial changes (3)
  • src/operator/src/req_convert/insert/stmt_to_region.rs
  • src/servers/tests/http/influxdb_test.rs
  • src/servers/tests/http/prom_store_test.rs
Additional comments not posted (29)
src/operator/src/req_convert/delete/row_to_region.rs (1)

67-73: Optimization by passing schema name by reference.

This change optimizes performance by avoiding unnecessary string copies when the schema name is passed by reference. The method continues to handle potential errors effectively.

src/common/function/src/system/database.rs (1)

47-47: Improved safety and consistency by passing reference.

Passing the database name by reference to StringVector::from_slice improves safety and ensures consistency. This should prevent potential issues with dangling references.

src/session/src/lib.rs (1)

112-119: Addition of catalog and schema accessors.

The addition of catalog and schema methods provides a clean and efficient way to access these properties. The use of ArcSwap ensures thread safety.

src/pipeline/src/manager/pipeline_operator.rs (1)

157-160: Optimization by passing schema by reference in pipeline operations.

Passing the schema by reference in the insert_and_compile and get_pipeline methods optimizes performance by reducing unnecessary string copies. Error handling remains robust with appropriate use of context.

Also applies to: 189-194

src/servers/src/grpc/greptime_handler.rs (1)

151-151: Pass schema by reference in authentication method.

This change aligns with the overall PR goal of reducing unnecessary data copying by passing references instead of values. This should help in reducing memory overhead and possibly improve performance.

src/operator/src/delete.rs (1)

182-182: Pass schema by reference in delete operations.

The change to pass the schema as a reference is consistent with the PR's objective of optimizing memory usage by avoiding unnecessary copying of strings. This is a good practice, especially in performance-critical operations like data deletion.

src/frontend/src/script.rs (1)

233-233: Pass schema by reference in script management functions.

Passing the schema by reference in insert_script and execute_script functions is consistent with the PR's overall goal of reducing memory usage by avoiding unnecessary data copying. This change should help in improving the performance of script management operations.

Also applies to: 269-269

src/frontend/src/instance/prom_store.rs (1)

149-149: Pass schema by reference in remote query handling.

This change to pass the schema by reference in the handle_remote_query function aligns with the PR's goal to optimize memory usage and improve performance by avoiding unnecessary copying of data.

src/catalog/src/information_schema/tables.rs (1)

29-29: Add new data attributes to the information schema for tables.

The addition of new constants and fields (DATA_LENGTH, INDEX_LENGTH, MAX_DATA_LENGTH, AVG_ROW_LENGTH) to the information schema for tables is a significant enhancement. This change will allow users to access more detailed information about their database tables, which can be crucial for performance tuning and capacity planning. However, the current implementation uses placeholder values, and real data should be added in future enhancements as noted in the PR objectives.

Also applies to: 46-49, 76-79, 142-145, 164-167, 234-252
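Since the new columns ship with placeholder values for now, their construction could look roughly like the sketch below. The real implementation uses GreptimeDB's typed vector builders; plain Vec<u64> is used here as a std-only approximation, and the helper name is hypothetical.

```rust
// Column name constants mirroring MySQL's information_schema.tables.
const DATA_LENGTH: &str = "data_length";
const MAX_DATA_LENGTH: &str = "max_data_length";
const INDEX_LENGTH: &str = "index_length";
const AVG_ROW_LENGTH: &str = "avg_row_length";

// Emit one zero-filled column per new field; real statistics are a
// TODO left for a future enhancement, per the PR description.
fn placeholder_columns(row_count: usize) -> Vec<(&'static str, Vec<u64>)> {
    [DATA_LENGTH, MAX_DATA_LENGTH, INDEX_LENGTH, AVG_ROW_LENGTH]
        .into_iter()
        .map(|name| (name, vec![0u64; row_count]))
        .collect()
}

fn main() {
    let cols = placeholder_columns(2);
    assert_eq!(cols.len(), 4);
    assert_eq!(cols[0], ("data_length", vec![0, 0]));
}
```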

src/catalog/src/table_source.rs (1)

60-60: Optimization: Use reference for default_schema.

Switching from cloning to using a reference for default_schema is a good practice to avoid unnecessary data duplication and improve performance.

src/servers/src/http/handler.rs (1)

335-335: Optimization: Use reference for schema validation.

Using a reference for schema validation in the validate_schema method helps in reducing memory overhead and potential performance bottlenecks.

src/session/src/context.rs (1)

41-42: Enhancement: Use ArcSwap for current_schema.

The introduction of ArcSwap for current_schema improves thread safety and efficiency in handling concurrent schema changes. This is particularly useful in a multi-threaded server environment where schema contexts might change frequently.

Also applies to: 60-63, 81-81, 165-167, 169-170, 215-229, 256-256, 281-281
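The ArcSwap pattern described here can be approximated with std types only. The real code uses the arc-swap crate, which provides lock-free reads; this sketch substitutes RwLock<Arc<String>> to show the same shape: readers get a cheap Arc clone instead of copying the string, and `USE <db>` swaps the whole Arc.

```rust
use std::sync::{Arc, RwLock};

// Std-only approximation of the ArcSwap-backed field; names are
// illustrative, not the actual src/session/src/context.rs definitions.
struct QueryContext {
    current_schema: RwLock<Arc<String>>,
}

impl QueryContext {
    fn new(schema: &str) -> Self {
        QueryContext {
            current_schema: RwLock::new(Arc::new(schema.to_string())),
        }
    }

    // Readers receive a cheap Arc clone rather than a String copy.
    fn current_schema(&self) -> Arc<String> {
        Arc::clone(&self.current_schema.read().unwrap())
    }

    // A schema switch replaces the Arc wholesale; existing readers
    // holding the old Arc are unaffected.
    fn set_current_schema(&self, schema: &str) {
        *self.current_schema.write().unwrap() = Arc::new(schema.to_string());
    }
}

fn main() {
    let ctx = QueryContext::new("public");
    ctx.set_current_schema("my_db");
    assert_eq!(*ctx.current_schema(), "my_db");
}
```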

src/sql/src/parser.rs (1)

169-180: Feature Addition: Support for USE Statement.

Adding support for the USE statement in the SQL parser is a crucial enhancement for DBeaver compatibility. This allows the database to correctly interpret and execute USE commands from DBeaver, aligning with SQL standards and improving user experience.
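Dispatching a `USE <db>` statement ahead of the general SQL grammar can be sketched as below. This is a deliberately simplified tokenizer, not the actual src/sql/src/parser.rs logic: identifier validation, trailing-token rejection, and error types are all omitted.

```rust
// Hypothetical, minimal statement dispatcher for illustration only.
#[derive(Debug, PartialEq)]
enum Statement {
    Use(String),
    Other(String),
}

fn parse_statement(sql: &str) -> Statement {
    let trimmed = sql.trim().trim_end_matches(';').trim();
    let mut tokens = trimmed.split_whitespace();
    match tokens.next() {
        // SQL keywords are case-insensitive, so match accordingly.
        Some(kw) if kw.eq_ignore_ascii_case("USE") => {
            // A real parser would validate the identifier; here we take
            // the next token and strip MySQL-style backquotes.
            let db = tokens.next().unwrap_or_default().trim_matches('`');
            Statement::Use(db.to_string())
        }
        _ => Statement::Other(trimmed.to_string()),
    }
}

fn main() {
    assert_eq!(parse_statement("USE my_db;"), Statement::Use("my_db".into()));
    assert_eq!(parse_statement("use `public`"), Statement::Use("public".into()));
    assert_eq!(parse_statement("SELECT 1"), Statement::Other("SELECT 1".into()));
}
```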

src/operator/src/statement.rs (1)

311-328: Well-implemented method for handling the USE statement.

The use_database method efficiently checks for the existence of the specified schema and sets it as the current schema in the query context. The use of ensure! for error handling is appropriate. However, consider adding more detailed logging for both successful schema changes and potential errors to aid in debugging and monitoring.
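The check-then-switch flow described for use_database can be sketched as follows. The snafu `ensure!` macro is replaced by a plain error return so the sketch stays std-only, and the catalog lookup is reduced to a slice membership test; all names are hypothetical.

```rust
// Illustrative error type standing in for the real snafu-generated one.
#[derive(Debug)]
struct SchemaNotFound(String);

// Verify the schema exists before making it the current schema,
// mirroring the ensure!-guarded logic described in the review comment.
fn use_database(
    known_schemas: &[&str],
    requested: &str,
    current_schema: &mut String,
) -> Result<(), SchemaNotFound> {
    if !known_schemas.contains(&requested) {
        return Err(SchemaNotFound(requested.to_string()));
    }
    *current_schema = requested.to_string();
    Ok(())
}

fn main() {
    let mut schema = String::from("public");
    use_database(&["public", "my_db"], "my_db", &mut schema).unwrap();
    assert_eq!(schema, "my_db");
    // Switching to an unknown schema fails without changing state.
    assert!(use_database(&["public", "my_db"], "missing", &mut schema).is_err());
    assert_eq!(schema, "my_db");
}
```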

src/query/src/datafusion.rs (1)

118-118: Optimized schema handling in DML operations.

The modification to pass the default_schema by reference instead of cloning enhances performance by avoiding unnecessary memory allocation. This change is aligned with Rust's best practices for efficient memory management.

src/servers/src/mysql/handler.rs (1)

528-528: Refactor to Use Method for Catalog Access

The change from direct field access to method invocation (self.session.catalog()) is a good practice as it encapsulates the session's catalog handling, which can include additional logic such as caching or validation internally. This makes the code more maintainable and adaptable to future changes in how the catalog is handled.

src/frontend/src/instance.rs (1)

516-517: Addition of USE Statement to Permission Checking

The addition of the USE statement to the permission checking logic is an essential update for supporting the newly reintroduced USE statement functionality. This ensures that permissions are appropriately checked when this statement is used, aligning with the security practices of the system.

src/operator/src/expr_factory.rs (1)

Line range hint 554-577: Review of dynamic schema handling in to_create_flow_task_expr.

The changes to dynamically handle the schema by mapping and cloning only when necessary are a good fit with the PR's goal of improving schema management. This approach avoids unnecessary cloning when the schema is already owned, which can enhance performance and memory usage.

src/operator/src/insert.rs (5)

446-446: Pass schema by reference in get_table call.

The change from passing schema by value to reference is consistent with the PR's objective to enhance performance by avoiding unnecessary data copying. This is a good practice in Rust to avoid the overhead associated with cloning large data structures when not necessary.


528-528: Check for table existence before creation.

This is a good preemptive check to ensure that a table is not recreated if it already exists, which can prevent potential data duplication or errors in table management. This aligns with the PR's focus on robustness and efficiency.


535-535: Physical table creation logic.

The creation of a physical table is well-handled with comprehensive logging and error handling. However, the use of info and error logging could be more consistent. Consider adding more detailed logging in the success path to aid in troubleshooting and operational monitoring.

- info!("Successfully created table {table_reference}",);
+ info!("Successfully created physical metric table `{table_reference}`.");

624-625: Use schema reference in create_table.

Continuing the theme of performance enhancement by avoiding unnecessary data copying, this change properly utilizes references for schema handling. It's a minor but effective optimization in handling string data within Rust.


655-656: Log table creation with enhanced logging.

The creation of log tables is handled similarly to physical tables, but with an additional option for append_mode. This is crucial for log tables where duplicate timestamps might be acceptable. The logging here is appropriate, providing clear feedback on the operation's outcome.

src/servers/src/http/prometheus.rs (3)

431-431: Passing schema by reference is a good practice.

This change helps in reducing unnecessary data copying and can improve performance in handling large data sets.


782-782: Efficient use of references for schema.

Passing the schema by reference is efficient, especially in a context where the schema object might be large or used frequently.


793-793: Consistent use of schema references enhances performance.

Continuing to pass the schema by reference throughout the function ensures consistency and optimizes performance.

src/query/src/sql.rs (3)

314-314: Optimized schema access in show_columns function.

The modification from cloning the schema to using a reference is a good practice in Rust for avoiding unnecessary data copying, which can improve performance. This is a positive change.


375-375: Optimized schema access in show_index function.

Using a reference here avoids unnecessary cloning, aligning with Rust best practices for efficient memory usage. This change supports the PR's objectives effectively.


457-457: Consistent optimization in show_tables function.

This change to use a reference instead of a clone for the schema name is consistent with other changes aimed at optimizing memory usage. It is a good practice in Rust programming.
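The pass-by-reference pattern these comments keep pointing at reduces to one Rust idiom: accept &str rather than an owned String, so the caller keeps ownership and no clone happens per call. A toy illustration (function name and query shape are made up):

```rust
// Borrowing the schema name: callers pass &s and retain ownership.
fn show_tables(schema: &str) -> String {
    format!("SHOW TABLES FROM {schema}")
}

fn main() {
    let current_schema = String::from("public");
    // Borrowed at each call site; no clone, and the String stays usable.
    let q1 = show_tables(&current_schema);
    let q2 = show_tables(&current_schema);
    assert_eq!(q1, "SHOW TABLES FROM public");
    assert_eq!(q1, q2);
}
```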

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 63aa4ed and 3c03520.

Files selected for processing (2)
  • tests/cases/standalone/common/system/information_schema.result (2 hunks)
  • tests/cases/standalone/common/view/create.result (1 hunks)
Files skipped from review due to trivial changes (1)
  • tests/cases/standalone/common/view/create.result
Additional comments not posted (5)
tests/cases/standalone/common/system/information_schema.result (5)

Line range hint 1-4: Correct handling of reserved schema name.

The SQL test correctly checks that creating a database with the name information_schema is not allowed, and provides the expected error message.


Line range hint 6-45: Correct retrieval from information_schema.tables.

The SQL query effectively retrieves the expected table information, excluding the 'scripts' table, and displays the new fields such as data_length and index_length which align with the PR changes.


Line range hint 46-343: Proper display of column details in information_schema.columns.

The query correctly fetches and organizes column metadata from information_schema.columns, including the new fields introduced in the PR.


Line range hint 344-424: Effective demonstration of the USE statement.

The SQL commands successfully demonstrate the use of the USE statement to switch database contexts and perform subsequent operations, aligning with the new features introduced in the PR.


Line range hint 425-500: Correct querying of constraint usage details in KEY_COLUMN_USAGE.

The SQL queries effectively filter and display details about constraints, particularly focusing on the 'TIME INDEX' constraint, which is crucial for validation of schema constraints.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3c03520 and 069979d.

Files selected for processing (1)
  • src/servers/src/mysql/federated.rs (2 hunks)
Files skipped from review due to trivial changes (1)
  • src/servers/src/mysql/federated.rs

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Outside diff range and nitpick comments (4)
tests-integration/src/tests/instance_test.rs (4)

Line range hint 82-97: Ensure proper setup and teardown in database tests.

This function creates a database and inserts data without explicitly dropping the database or cleaning up inserted data at the end of the test. This might lead to state pollution across tests which can cause unexpected failures in other tests that assume a clean state.

+ // Teardown code to clean up database
+ execute_sql(&instance, "DROP DATABASE test").await;

Line range hint 144-174: Handle dynamic content in expected results more robustly.

The test constructs an expected result string that includes a file path, which can vary based on the environment. Consider using regex or contains checks for parts of the string that are expected to vary, to make the test more robust and maintainable.

- assert_eq!(actual.to_string(), expect);
+ assert!(actual.to_string().contains("expected static part of the string"));
+ // Add more contains checks for other static parts

Line range hint 234-274: Improve error handling and coverage in 'INSERT INTO ... SELECT' tests.

The test covers various error scenarios for 'INSERT INTO ... SELECT', but it could be improved by adding more scenarios and ensuring that the error handling is robust. Consider adding tests for edge cases and more complex SELECT statements.

+ // Test with more complex SELECT statements and edge cases

Tip: Codebase Verification

Ensure accurate assertions for data insertion tests.

The test_execute_insert function currently checks the number of affected rows after data insertion. To improve the test, consider adding SELECT statements to confirm that the data was inserted correctly by querying the table.

  • tests-integration/src/tests/instance_test.rs:
    • test_execute_insert: Add SELECT statements to verify the inserted data.
Analysis chain

Line range hint 219-232: Ensure accurate assertions for data insertion tests.

The test checks the number of affected rows after data insertion. Ensure that these assertions accurately reflect the expected outcomes and consider adding checks to confirm the data was inserted correctly by querying the table.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the data was inserted correctly by querying the table.
echo "Add SELECT statements to verify the inserted data."

Length of output: 110



Script:

#!/bin/bash
# Locate the `test_execute_insert` function in `instance_test.rs` and display its implementation.
ast-grep --lang rust --pattern 'fn test_execute_insert($_) {
  $$$
}' > test_function.txt

# Display the content of the identified function.
cat test_function.txt

Length of output: 95



Script:

#!/bin/bash
# List all functions in `instance_test.rs` to identify the one related to data insertion.
ast-grep --lang rust --pattern 'fn $_($_) {
  $$$
}' --files-with-matches tests-integration/src/tests/instance_test.rs > functions_list.txt

# Display the content of the identified functions.
cat functions_list.txt

Length of output: 381



Script:

#!/bin/bash
# List all functions in `instance_test.rs` to identify the one related to data insertion.
ast-grep --lang rust --pattern 'fn $_($_) {
  $$$
}' tests-integration/src/tests/instance_test.rs

Length of output: 99



Script:

#!/bin/bash
# Search for function definitions in `instance_test.rs` to identify the one related to data insertion.
rg 'fn ' tests-integration/src/tests/instance_test.rs

Length of output: 3219



Script:

#!/bin/bash
# Extract the implementation of the `test_execute_insert` function from `instance_test.rs`.
rg -A 50 'async fn test_execute_insert' tests-integration/src/tests/instance_test.rs

Length of output: 2386

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 069979d and 70e2591.

Files selected for processing (2)
  • src/catalog/src/information_schema/tables.rs (7 hunks)
  • tests-integration/src/tests/instance_test.rs (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/catalog/src/information_schema/tables.rs
Additional comments not posted (3)
tests-integration/src/tests/instance_test.rs (3)

Line range hint 133-142: Clarify the intention behind expected SQL parse errors.

The test expects a SQL parse error, which is correctly triggered by the provided SQL statement. However, it would improve readability to explicitly comment on why this SQL is expected to fail, helping future maintainers understand the test's purpose.

+ // This SQL is intentionally malformed to test error handling in the parser.

Line range hint 176-217: Verify isolation between tables in different databases.

The test correctly sets up tables with the same name in different databases and checks for isolation. Ensure that the assertions not only check for the presence of data but also that no data leaks between the databases.


Line range hint 99-131: Ensure consistency in expected SQL output across different modes.

The expected SQL output strings in the distributed and standalone modes should accurately reflect the schema created by the SQL commands in the test. It's crucial to ensure that these strings are maintained correctly to prevent false negatives in the test results.

Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 78.97196% with 45 lines in your changes missing coverage. Please review.

Project coverage is 84.55%. Comparing base (df0fff2) to head (12157b4).
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4218      +/-   ##
==========================================
- Coverage   84.84%   84.55%   -0.29%     
==========================================
  Files        1041     1049       +8     
  Lines      183028   186483    +3455     
==========================================
+ Hits       155291   157683    +2392     
- Misses      27737    28800    +1063     

@sunng87 sunng87 force-pushed the fix/information_schema_tables branch from 70e2591 to d05ba59 Compare June 26, 2024 09:45
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 70e2591 and d05ba59.

Files selected for processing (3)
  • src/catalog/src/information_schema/tables.rs (7 hunks)
  • tests-integration/src/tests/instance_test.rs (2 hunks)
  • tests/cases/standalone/common/system/information_schema.result (2 hunks)
Files skipped from review as they are similar to previous changes (3)
  • src/catalog/src/information_schema/tables.rs
  • tests-integration/src/tests/instance_test.rs
  • tests/cases/standalone/common/system/information_schema.result

@killme2008
Contributor

> I'm going refactor Session and QueryContext to use Arc<RwLock<>> for all modifiable fields, like time-zone, schema. This makes the change automatically propagate from QueryContext to Session. The idea of QueryContext::update_session is smart, but it's going to be difficult to maintain in future.

It may hurt performance; we must be careful or do some benchmarking.

@killme2008
Contributor

There are many columns in tables, see https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html

I believe we should consider including all placeholders at this time.

sunng87 and others added 2 commits June 27, 2024 21:26
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d05ba59 and 5463ef9.

Files selected for processing (2)
  • src/operator/src/expr_factory.rs (2 hunks)
  • src/session/src/lib.rs (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/operator/src/expr_factory.rs
  • src/session/src/lib.rs

@sunng87
Member Author

sunng87 commented Jun 27, 2024

It may hurt performance, must be careful or do some benchmark.

Should be ok, because this data is all associated with a single database connection; in fact there is no concurrent access at all. Using locks is just to bypass the Rust compiler's checks.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 5463ef9 and 8c6775e.

Files selected for processing (5)
  • src/catalog/src/information_schema/tables.rs (10 hunks)
  • src/sql/src/parser.rs (1 hunks)
  • tests-integration/src/tests/instance_test.rs (1 hunks)
  • tests/cases/standalone/common/system/information_schema.result (2 hunks)
  • tests/cases/standalone/common/view/create.result (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • src/catalog/src/information_schema/tables.rs
  • src/sql/src/parser.rs
  • tests/cases/standalone/common/view/create.result
Additional comments not posted (6)
tests-integration/src/tests/instance_test.rs (2)

1900-1952: Review of information_schema.columns query results

The output of the information_schema.columns query correctly lists detailed information about various system tables and columns. The data types, semantic types, and other attributes are consistent with what is expected for an information schema in a SQL-compliant database.


1958-2010: Review of information_schema.columns in a custom schema context

The query output for information_schema.columns when executed in a custom schema context (another_catalog, another_schema) correctly reflects the columns of the newly created table another_table as well as the system columns. This ensures that the information schema dynamically reflects the context of the query, which is crucial for multi-tenant environments.

tests/cases/standalone/common/system/information_schema.result (4)

12-45: Ensure consistency in temporary table declarations.

The tables in the information_schema are consistently declared as LOCAL TEMPORARY, which is appropriate given their nature. However, ensure that this aligns with the intended usage patterns of these tables across different sessions.


343-366: Review added fields in the information_schema.tables for completeness and accuracy.

The newly added fields such as auto_increment, avg_row_length, check_time, etc., are consistent with the fields typically found in MySQL's information_schema.tables. This aligns with the PR's goal to enhance compatibility with MySQL standards. However, ensure that the data types and default values are correctly set according to the MySQL documentation.


12-45: Validate error handling for reserved schema names.

The initial block testing the creation of the information_schema database and handling it as a reserved name is correctly implemented. This is crucial for maintaining the integrity of system-level schemas.


12-45: Confirm sorting order in queries.

The queries involving sorting by table_schema and table_name appear to be correctly implemented. However, ensure that the sorting is performed as expected in the actual database responses, especially when handling cases with similar names or special characters.

Verification successful

To properly verify the sorting order by table_schema and table_name, we need to execute a query on the database and inspect the results. Let's create a script to do this.


Since sqlite3 is not available, we need to use a different approach to verify the sorting order. Let's try using grep and sort to check the sorting order directly from the data in the information_schema.result file.


Sorting order is correct.

The data in the information_schema.result file is correctly sorted by table_schema and table_name as expected. No issues were found with the sorting implementation.

  • Lines 12-45 in tests/cases/standalone/common/system/information_schema.result are correctly sorted.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that sorting by table_schema and table_name works correctly, especially for edge cases.

# Test: Execute queries and check order.
echo "Executing query to check sorting order..."

Length of output: 92



Script:

#!/bin/bash
# Description: Verify that sorting by table_schema and table_name works correctly, especially for edge cases.

# Execute the query and check the sorting order.
sqlite3 :memory: <<EOF
CREATE TABLE information_schema (
    table_catalog TEXT,
    table_schema TEXT,
    table_name TEXT,
    table_type TEXT,
    table_id INTEGER,
    data_length INTEGER,
    max_data_length INTEGER,
    index_length INTEGER,
    max_index_length INTEGER,
    avg_row_length INTEGER,
    engine TEXT,
    version INTEGER,
    row_format TEXT,
    table_rows INTEGER,
    data_free INTEGER,
    auto_increment INTEGER,
    create_time TEXT,
    update_time TEXT,
    check_time TEXT,
    table_collation TEXT,
    checksum INTEGER,
    create_options TEXT,
    table_comment TEXT,
    temporary TEXT
);

INSERT INTO information_schema (table_catalog, table_schema, table_name, table_type, table_id, create_time) VALUES
('greptime', 'information_schema', 'build_info', 'LOCAL TEMPORARY', 8, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'character_sets', 'LOCAL TEMPORARY', 9, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'check_constraints', 'LOCAL TEMPORARY', 12, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'cluster_info', 'LOCAL TEMPORARY', 31, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'collation_character_set_applicability', 'LOCAL TEMPORARY', 11, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'collations', 'LOCAL TEMPORARY', 10, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'column_privileges', 'LOCAL TEMPORARY', 6, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'column_statistics', 'LOCAL TEMPORARY', 7, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'columns', 'LOCAL TEMPORARY', 4, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'engines', 'LOCAL TEMPORARY', 5, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'events', 'LOCAL TEMPORARY', 13, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'files', 'LOCAL TEMPORARY', 14, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'global_status', 'LOCAL TEMPORARY', 25, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'key_column_usage', 'LOCAL TEMPORARY', 16, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'optimizer_trace', 'LOCAL TEMPORARY', 17, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'parameters', 'LOCAL TEMPORARY', 18, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'partitions', 'LOCAL TEMPORARY', 28, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'profiling', 'LOCAL TEMPORARY', 19, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'referential_constraints', 'LOCAL TEMPORARY', 20, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'region_peers', 'LOCAL TEMPORARY', 29, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'routines', 'LOCAL TEMPORARY', 21, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'runtime_metrics', 'LOCAL TEMPORARY', 27, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'schema_privileges', 'LOCAL TEMPORARY', 22, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'schemata', 'LOCAL TEMPORARY', 15, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'session_status', 'LOCAL TEMPORARY', 26, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'table_constraints', 'LOCAL TEMPORARY', 30, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'table_privileges', 'LOCAL TEMPORARY', 23, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'tables', 'LOCAL TEMPORARY', 3, '2024-06-27T14:40:43.064'),
('greptime', 'information_schema', 'triggers', 'LOCAL TEMPORARY', 24, '2024-06-27T14:40:43.064'),
('greptime', 'public', 'numbers', 'LOCAL TEMPORARY', 2, '2024-06-27T14:41:15.235');

SELECT table_schema, table_name
FROM information_schema
ORDER BY table_schema, table_name;
EOF

Length of output: 93



Script:

#!/bin/bash
# Description: Verify that sorting by table_schema and table_name works correctly, especially for edge cases.

# Extract the relevant lines from the result file and check the sorting order.
grep -E '^[|] greptime' tests/cases/standalone/common/system/information_schema.result | awk '{print $3, $4}' | sort -k1,1 -k2,2

Length of output: 7916

Contributor

@killme2008 killme2008 left a comment

LGTM

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8c6775e and bb1748e.

Files selected for processing (4)
  • tests/cases/standalone/common/system/information_schema.result (2 hunks)
  • tests/cases/standalone/common/system/information_schema.sql (1 hunks)
  • tests/cases/standalone/common/view/create.result (1 hunks)
  • tests/cases/standalone/common/view/create.sql (1 hunks)
Files skipped from review due to trivial changes (2)
  • tests/cases/standalone/common/system/information_schema.sql
  • tests/cases/standalone/common/view/create.sql
Files skipped from review as they are similar to previous changes (2)
  • tests/cases/standalone/common/system/information_schema.result
  • tests/cases/standalone/common/view/create.result

@killme2008
Contributor

@evenyag @fengjiachun Please take a look.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between bb1748e and 8e64ee7.

Files selected for processing (2)
  • src/catalog/src/information_schema/tables.rs (10 hunks)
  • src/table/src/requests.rs (3 hunks)
Additional comments not posted (3)
src/table/src/requests.rs (1)

368-392: LGTM!

The new unit tests for TableOptions are comprehensive and well-structured.

src/catalog/src/information_schema/tables.rs (2)

29-31: Consolidate Import Statements.

Consider consolidating the datatypes::vectors imports into a single line for better readability.

- use datatypes::vectors::{
-     DateTimeVectorBuilder, StringVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder,
- };
+ use datatypes::vectors::{DateTimeVectorBuilder, StringVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder};

Line range hint 172-226: LGTM!

The additions to the InformationSchemaTablesBuilder struct and methods are consistent with the objective of adding placeholder fields.

@sunng87 sunng87 force-pushed the fix/information_schema_tables branch from 8e64ee7 to 12157b4 on July 2, 2024 05:25
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8e64ee7 and 12157b4.

Files selected for processing (4)
  • src/catalog/src/information_schema/tables.rs (10 hunks)
  • src/table/src/requests.rs (3 hunks)
  • tests/cases/standalone/common/system/information_schema.result (2 hunks)
  • tests/cases/standalone/common/view/create.result (1 hunks)
Files skipped from review as they are similar to previous changes (4)
  • src/catalog/src/information_schema/tables.rs
  • src/table/src/requests.rs
  • tests/cases/standalone/common/system/information_schema.result
  • tests/cases/standalone/common/view/create.result

@sunng87
Member Author

sunng87 commented Jul 2, 2024

@evenyag just updated, please check again

@sunng87 sunng87 added this pull request to the merge queue Jul 3, 2024
Merged via the queue into GreptimeTeam:main with commit 11cf9c8 Jul 3, 2024
52 checks passed
@sunng87 sunng87 deleted the fix/information_schema_tables branch July 3, 2024 01:47
Labels
docs-not-required This change does not impact docs.
3 participants