
describe parallel writes during an insert #2300

Closed
robert-s-lee opened this issue Dec 18, 2017 · 7 comments

Comments

@robert-s-lee
Contributor

@Amruta-Ranade the sequence diagrams (and accompanying .mmd files using mermaid) show how an insert is parallelized in detail. @bdarnell please check for accuracy again :)

  • Both 1.x and the upcoming 2.0 behavior are described below.
  • Connections from the east and the west are shown to illustrate the gateway concept and the case where the leaseholder is not the gateway (a rough sketch of the gateway routing follows this list).
  • The connection from the west shows the case where the gateway happens to be both the leaseholder and the Raft leader, which is the best case.
  • Not covered is the case where the leaseholder is not the Raft leader (which can happen, but only momentarily), as it would look much like the east client's connection to the west.
  • The mermaid sequence code can be copied and pasted into https://mermaidjs.github.io/mermaid-live-editor/.
  • The examples show an insert, but other DML statements (update and delete) would go through the same process.
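
To make the gateway/leaseholder distinction concrete, here is a minimal, purely illustrative Go sketch. The `routeWrite` function and node names are assumptions that mirror the diagrams below; this is not CockroachDB's actual code.

```go
package main

import "fmt"

// routeWrite sketches the gateway concept: the client can connect to any
// node (the gateway); if that node does not hold the range lease for the
// key, the put is forwarded to the leaseholder, which drives the write.
func routeWrite(gateway, leaseholder, key, val string) {
	if gateway == leaseholder {
		// West case in the diagrams: gateway == leaseholder == Raft leader.
		fmt.Printf("%s: gateway is leaseholder, writing %s = %s\n", gateway, key, val)
		return
	}
	// East case in the diagrams: the gateway forwards to the leaseholder.
	fmt.Printf("%s: forwarding %s = %s to leaseholder %s\n", gateway, key, val, leaseholder)
	fmt.Printf("%s: writing %s = %s\n", leaseholder, key, val)
}

func main() {
	routeWrite("Node1", "Node1", "/t/primary/1/val", "'A'") // Client West
	routeWrite("Node3", "Node1", "/t/primary/1/val", "'A'") // Client East
}
```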

Below is 1.x behavior.

insert_singleton_ideal_west_1.x

%% mmdc -i insert_singleton_ideal_west_1.x.mmd -o insert_singleton_ideal_west_1.x.png
sequenceDiagram
    participant Client West
    participant Node1
    participant Node2
    participant Node3
    participant Client East
    Client West->>+Node1: insert into x (1,'A')
Note Over Node1,Node3: Client connects to any node
Note Right of Node1: node1 is range leaseholder and raft leader
    Node1->>Node1:put /t/primary/1/val/'A' 
    Node1->>-Node1:sync to disk 
    par parallel to raft followers
        Node1->>+Node2: 1) put /t/primary/1/val/'A'
        Node2->>-Node2:sync to disk
    and parallel to raft followers
        Node1->>+Node3: 2) put /t/primary/1/val/'A'
        Node3->>-Node3:sync to disk
    end
Note Over Node1,Node3: done when MAJORITY of the followers and leader complete sync to disk
Node1->>Client West: commit
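
For readers who prefer code to diagrams, here is a small, self-contained Go sketch of the quorum rule in the 1.x flow above. It assumes a write is "done" once a majority of the three replicas (leader included) have synced to disk; the names and timings are made up, and this is not CockroachDB's replication code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncToDisk stands in for a durable write on one replica.
func syncToDisk(node string) {
	time.Sleep(10 * time.Millisecond)
	fmt.Println(node, "synced to disk")
}

func main() {
	followers := []string{"Node2", "Node3"}
	const quorum = 2 // majority of 3 replicas (leader + 2 followers)

	// 1.x: the leaseholder/Raft leader syncs its own write first...
	syncToDisk("Node1 (leader)")
	acks := 1 // the leader's own sync counts toward the quorum

	// ...then sends the put to the followers in parallel.
	ackCh := make(chan string, len(followers))
	var wg sync.WaitGroup
	for _, f := range followers {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			syncToDisk(node)
			ackCh <- node // follower replies "ok" after its sync
		}(f)
	}
	go func() { wg.Wait(); close(ackCh) }()

	// The commit is acknowledged to the client as soon as a MAJORITY of the
	// replicas have synced, not when all of them have.
	for node := range ackCh {
		acks++
		fmt.Println("ack from", node)
		if acks >= quorum {
			fmt.Println("majority reached; commit acknowledged to client")
			break
		}
	}
}
```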

insert_singleton_ideal_east_1.x

%% mmdc -i insert_singleton_ideal_east_1.x.mmd -o insert_singleton_ideal_east_1.x.png
sequenceDiagram
    participant Client West
    participant Node1
    participant Node2
    participant Node3
    participant Client East
    Client East->>+Node3: insert into x (1,'A')
Note Over Node1,Node3: Client connects to any node
Note Right of Node1: node1 is range leaseholder and raft leader
    Node3->>+Node1:put /t/primary/1/val/'A' 
    deactivate Node3
    Node1->>-Node1:sync to disk 
    par parallel to raft followers
        Node1->>+Node2: 1) put /t/primary/1/val/'A'
        Node2->>-Node2:sync to disk
    and parallel to raft followers
        Node1->>+Node3: 2) put /t/primary/1/val/'A'
        Node3->>-Node3:sync to disk
    end
Note Over Node1,Node3: done when MAJORITY of the followers and leader complete sync to disk
Node3->>Client East: commit

Below is 2.0 behavior.

In 2.0, the disk write on the leaseholder/Raft leader happens in parallel with the writes to the followers, which should improve performance.
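
Under the same assumptions as the 1.x sketch earlier, a minimal Go illustration of the 2.0 change: the leader's own disk sync becomes one of the parallel tasks rather than a step that runs before replication, so the commit can be acknowledged as soon as any two of the three syncs complete. Again, this is only an illustrative sketch, not CockroachDB code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncToDisk stands in for a durable write on one replica.
func syncToDisk(node string) {
	time.Sleep(10 * time.Millisecond)
	fmt.Println(node, "synced to disk")
}

func main() {
	// 2.0: the leader's sync runs in parallel with the follower writes.
	replicas := []string{"Node1 (leader)", "Node2", "Node3"}
	quorum := len(replicas)/2 + 1 // majority of 3 replicas = 2

	ackCh := make(chan string, len(replicas))
	var wg sync.WaitGroup
	for _, r := range replicas {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			syncToDisk(node) // leader sync is now just another parallel task
			ackCh <- node
		}(r)
	}
	go func() { wg.Wait(); close(ackCh) }()

	acks := 0
	for node := range ackCh {
		acks++
		fmt.Println("ack from", node)
		if acks >= quorum {
			fmt.Println("majority reached; commit acknowledged to client")
			break
		}
	}
}
```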

insert_singleton_ideal_west_2.x

%% mmdc -i insert_singleton_ideal_west_2.x.mmd -o insert_singleton_ideal_west_2.x.png
sequenceDiagram
    participant Client West
    participant Node1
    participant Node2
    participant Node3
    participant Client East
    Client West->>+Node1: insert into x (1,'A')
Note Over Node1,Node3: Client connects to any node
Note Right of Node1: node1 is range leaseholder and raft leader
    Node1->>Node1:put /t/primary/1/val/'A' 
    par parallel write to raft leader (New in 2.0)
        Node1->>-Node1:sync to disk 
    and parallel to raft followers
        Node1->>+Node2: 1) put /t/primary/1/val/'A'
        Node2->>-Node2:sync to disk
    and parallel to raft followers
        Node1->>+Node3: 2) put /t/primary/1/val/'A'
        Node3->>-Node3:sync to disk
    end
Note Over Node1,Node3: done when MAJORITY of the followers and leader complete sync to disk
Node1->>Client West: commit

insert_singleton_ideal_east_2.x

%% mmdc -i insert_singleton_ideal_east_2.x.mmd -o insert_singleton_ideal_east_2.x.png
sequenceDiagram
    participant Client West
    participant Node1
    participant Node2
    participant Node3
    participant Client East
    Client East->>+Node3: insert into x (1,'A')
Note Over Node1,Node3: Client connects to any node
Note Right of Node1: node1 is range leaseholder and raft leader
    Node3->>+Node1:put /t/primary/1/val/'A' 
    deactivate Node3
    par parallel write to raft leader (New in 2.0)
        Node1->>-Node1:sync to disk 
    and parallel to raft followers
        Node1->>+Node2: 1) put /t/primary/1/val/'A'
        Node2->>-Node2:sync to disk
    and parallel to raft followers
        Node1->>+Node3: 2) put /t/primary/1/val/'A'
        Node3->>-Node3:sync to disk
    end
Note Over Node1,Node3: done when MAJORITY of the followers and leader complete sync to disk
Node3->>Client East: commit
@bdarnell
Contributor

These are mostly accurate, but in the second and fourth cases, the fact that you've omitted the RPC responses creates the impression that the sync on node3 is responsible for sending the final OK to the client.

I've updated the second diagram with responses:

sequenceDiagram
    participant Client West
    participant Node1
    participant Node2
    participant Node3
    participant Client East
    Client East->>+Node3: insert into x (1,'A')
Note Over Node1,Node3: Client connects to any node
Note Right of Node1: node1 is range leaseholder and raft leader
    Node3->>+Node1:put /t/primary/1/val/'A' 
    deactivate Node3
    Node1->>Node1:sync to disk 
    par parallel to raft followers
        Node1->>+Node2: 1) put /t/primary/1/val/'A'
        Node2->>Node2:sync to disk
        Node2->>-Node1: ok
    and parallel to raft followers
        Node1->>+Node3: 2) put /t/primary/1/val/'A'
        Node3->>Node3:sync to disk
        Node3->>-Node1: ok
    end
Note Over Node1,Node3: done when MAJORITY of the followers and leader complete sync to disk
Node1->>-Node3: put ok
Node3->>Client East: ok

(rendered diagram image)

@robert-s-lee
Contributor Author

@bdarnell Good suggestions. I'll make the changes to the others. @Amruta-Ranade I can provide an outline for the page.

@robert-s-lee
Contributor Author

Completed the changes on the diagrams to show the return responses and corrected the typo in the insert DML.

Archive.zip

insert_singleton_ideal_east_1.x
insert_singleton_ideal_east_2.x
insert_singleton_ideal_west_1.x
insert_singleton_ideal_west_2.x

@robert-s-lee
Contributor Author

The "New in 2.0" label could be read as saying that the entire parallel write is new in 2.0. Refactor the graphs to clarify that the parallel write to the followers is existing 1.x functionality; what is new in 2.0 is the sync to disk at the leaseholder running in parallel with the followers.

@jseldess
Contributor

@robert-s-lee, is this essential for 2.0 or can we plan this work for post-release?

@robert-s-lee
Contributor Author

Having the doc in 2.0 would be beneficial. Two specific areas: 1) highlighting the potential performance improvements outlined in cockroachdb/cockroach#19229, and 2) the gateway connecting to the leaseholder for reading and writing. Would an outline, in addition to the sequence diagrams, help?

@jseldess
Contributor

Closing in favor of #3873.
