feat: implement FlowTaskMetadataManager #3766

Merged
merged 26 commits into from
Apr 25, 2024

Conversation

WenyXu
Member

@WenyXu WenyXu commented Apr 22, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#3664

What's changed and what's your intention?

  1. Introduces FlowTaskInfoKey, layout: __flow_task/{catalog}/info/{flow_task_id}, which stores the metadata of the flow task.
  2. Introduces FlowTaskNameKey, layout: __flow_task/{catalog}/name/{task_name}, which maps task_name to task_id.
  3. Introduces FlownodeTaskKey, layout: __flow_task/{catalog}/flownode/{flownode_id}/{flow_task_id}/{partition_id}, which maps flownode_id to task_id.
  4. Introduces TableTaskKey, layout: __table_task/{catalog}/source_table/{table_id}/{flownode_id}/{flow_task_id}/{partition_id}, which maps table_id to flownode_id.
  5. Introduces FlowTaskMetadataManager and implements the create_flow_task_metadata method.

The whole picture will be like this:

__flow_task/
    {catalog}/
      info/
        {task_id}
    
      name/
        {task_name}
    
      flownode/
        {flownode_id}/
          {task_id}/
            {partition_id}

      source_table/
        {table_id}/
          {flownode_id}/
            {task_id}/
              {partition_id}
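
For readers who want to see the layouts as concrete strings, here is a minimal sketch. It is not the code merged in this PR: the function names, the constant, and the integer widths are all hypothetical. It also follows the tree above, which places source_table under __flow_task, whereas item 4 writes the prefix as __table_task; that choice is an assumption.

```rust
// Hypothetical helpers that only render the key layouts described above.
// Names and integer widths are assumptions, not the merged implementation.
const FLOW_TASK_PREFIX: &str = "__flow_task";

/// __flow_task/{catalog}/info/{flow_task_id}
fn flow_task_info_key(catalog: &str, flow_task_id: u32) -> String {
    format!("{}/{}/info/{}", FLOW_TASK_PREFIX, catalog, flow_task_id)
}

/// __flow_task/{catalog}/name/{task_name}
fn flow_task_name_key(catalog: &str, task_name: &str) -> String {
    format!("{}/{}/name/{}", FLOW_TASK_PREFIX, catalog, task_name)
}

/// __flow_task/{catalog}/flownode/{flownode_id}/{flow_task_id}/{partition_id}
fn flownode_task_key(
    catalog: &str,
    flownode_id: u64,
    flow_task_id: u32,
    partition_id: u32,
) -> String {
    format!(
        "{}/{}/flownode/{}/{}/{}",
        FLOW_TASK_PREFIX, catalog, flownode_id, flow_task_id, partition_id
    )
}

/// {prefix}/{catalog}/source_table/{table_id}/{flownode_id}/{flow_task_id}/{partition_id}
/// The prefix follows the tree (assumption); item 4 in the list uses __table_task.
fn table_task_key(
    catalog: &str,
    table_id: u32,
    flownode_id: u64,
    flow_task_id: u32,
    partition_id: u32,
) -> String {
    format!(
        "{}/{}/source_table/{}/{}/{}/{}",
        FLOW_TASK_PREFIX, catalog, table_id, flownode_id, flow_task_id, partition_id
    )
}
```

Because every key starts with the catalog segment, all four kinds of entries for one catalog share a single prefix.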

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 22, 2024
@WenyXu WenyXu changed the title Feat/flow task metadta feat: implement FlowMetadataManager Apr 22, 2024
@discord9 discord9 requested a review from zhongzc April 22, 2024 11:17

codecov bot commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 93.11552%, with 59 lines in your changes missing coverage. Please review.

Project coverage is 85.32%. Comparing base (20a933e) to head (726488e).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3766      +/-   ##
==========================================
- Coverage   85.55%   85.32%   -0.24%     
==========================================
  Files         947      952       +5     
  Lines      160481   161940    +1459     
==========================================
+ Hits       137307   138172     +865     
- Misses      23174    23768     +594     

@WenyXu WenyXu force-pushed the feat/flow-task-metadta branch 2 times, most recently from 032cd50 to 6973fb5 Compare April 23, 2024 09:22
@WenyXu
Member Author

WenyXu commented Apr 24, 2024

> Why is the flow task put directly under "catalog"? Do you mean a flow task can span databases but not catalogs?

Yes, I think we should treat a catalog as a tenant. cc @discord9

@WenyXu
Member Author

WenyXu commented Apr 24, 2024

PTAL @MichaelScofield @fengjiachun

@discord9
Contributor

> > Why is the flow task put directly under "catalog"? Do you mean a flow task can span databases but not catalogs?

> Yes, I think we should treat a catalog as a tenant. cc @discord9

It makes sense for one user to use multiple databases as input, so a task should be able to span databases while still belonging to a single catalog. So it's per user?

@WenyXu WenyXu marked this pull request as draft April 24, 2024 06:39
@WenyXu WenyXu marked this pull request as ready for review April 24, 2024 13:58
@WenyXu WenyXu changed the title feat: implement FlowMetadataManager feat: implement FlowTaskMetadataManager Apr 24, 2024
@MichaelScofield
Collaborator

Can we reduce the depth of the key? I think the key can end at "flownode_id", with the flow task id and partition id stored in the value.
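
One way to picture this suggestion is the sketch below. It is illustrative only: the type and function names are hypothetical, and an in-memory BTreeMap stands in for the real metadata store. The key stops at flownode_id, and the value carries the (flow task id, partition id) pairs.

```rust
use std::collections::BTreeMap;

/// Hypothetical value type: what would otherwise be the last two key segments.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FlownodeAssignment {
    flow_task_id: u32,
    partition_id: u32,
}

/// In-memory stand-in for the key-value backend (assumption).
type Store = BTreeMap<String, Vec<FlownodeAssignment>>;

/// Shallower key: __flow_task/{catalog}/flownode/{flownode_id}
fn flownode_key(catalog: &str, flownode_id: u64) -> String {
    format!("__flow_task/{}/flownode/{}", catalog, flownode_id)
}

/// Appends an assignment to the value stored under the flownode's key.
fn assign(store: &mut Store, catalog: &str, flownode_id: u64, a: FlownodeAssignment) {
    store
        .entry(flownode_key(catalog, flownode_id))
        .or_default()
        .push(a);
}
```

The trade-off in this sketch is that adding or removing one assignment becomes a read-modify-write of a shared value, whereas with the deeper layout each assignment is its own key.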

@MichaelScofield
Collaborator

MichaelScofield commented Apr 25, 2024

I don't think it's good to place the variable "catalog" between two static prefixes. For example, __flow_task/name/{catalog} is better than __flow_task/{catalog}/name: it makes listing, say, all flow task names much easier.

Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@fengjiachun
Collaborator

> I don't think it's good to place the variable "catalog" between two static prefixes. For example, __flow_task/name/{catalog} is better than __flow_task/{catalog}/name: it makes listing, say, all flow task names much easier.

We need to be able to see all keys under a catalog, and the catalog also serves as an isolation boundary. In other words, catalog1 cannot see catalog2.
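
The trade-off discussed here can be sketched with a prefix scan over an ordered map. This is a rough illustration, not the project's storage API: Store, scan_prefix, and names_across_catalogs are hypothetical names, and a BTreeMap stands in for the metadata backend.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for an ordered key-value backend (assumption).
type Store = BTreeMap<String, String>;

/// Returns every key that starts with `prefix`, in sorted order.
fn scan_prefix<'a>(store: &'a Store, prefix: &str) -> Vec<&'a str> {
    store
        .range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}

/// With the catalog-first layout, listing names across all catalogs cannot use
/// one tight prefix; this sketch scans the whole namespace and filters on the
/// third path segment.
fn names_across_catalogs(store: &Store) -> Vec<&str> {
    scan_prefix(store, "__flow_task/")
        .into_iter()
        .filter(|k| k.split('/').nth(2) == Some("name"))
        .collect()
}
```

With the catalog-first layout, scan_prefix(&store, "__flow_task/catalog1/") returns everything belonging to catalog1 and nothing from catalog2, which is the isolation property described above. Listing names across catalogs then needs the wider scan and filter, which is the cost the earlier comment points out. The trailing slash in the prefix matters: without it, catalog1 would also match catalog10.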

@MichaelScofield MichaelScofield added this pull request to the merge queue Apr 25, 2024
Merged via the queue into GreptimeTeam:main with commit 9206f60 Apr 25, 2024
18 checks passed