
Document Seq No powered optimistic concurrency control #37284

Merged
merged 5 commits into from
Jan 11, 2019


@bleskes (Contributor) commented Jan 10, 2019

Add documentation to describe the new sequence number powered optimistic concurrency control

Relates #36148
Relates #10708

@bleskes added the labels :Distributed Indexing/CRUD (a catch-all label for issues around indexing, updating and getting a doc by id, not search), v7.0.0 and v6.6.0 on Jan 10, 2019
@bleskes requested a review from @zuketo on January 10, 2019 01:16
@elasticmachine (Collaborator):

Pinging @elastic/es-distributed

@zuketo left a review:

Looks good, just a few minor suggestions

=== Optimistic Concurrency Control

Each index and delete bulk item can include the `if_seq_no` and `if_primary_term`
parameters in their respective action and meta data lines. These parameters

Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines.

@bleskes (Contributor Author), Jan 11, 2019:

thx. adapted

Each index and delete bulk item can include the `if_seq_no` and `if_primary_term`
parameters in their respective action and meta data lines. These parameters
allow controlling how these operations will be performed based on the last
modification to existing documents. See <<optimistic-concurrency-control>> for more details.

The if_seq_no and if_primary_term parameters control how operations are executed, based on the most recently modified version of an existing document.

@bleskes (Contributor Author):

Adopted the first part. I left out the "modified version" suggestion because it's not about the document version but rather the seq# and primary term of the modification.

allow controlling how these operations will be performed based on the last
modification to existing documents. See <<optimistic-concurrency-control>> for more details.



Can we include a very short example for passing the if_seq_no and if_primary_term parameters to the bulk API?

@bleskes (Contributor Author):

Good question. We don't do it for other parameters, like routing, etc. I think we should be consistent, and adding it for all of them may be very verbose.


+1, good point
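For illustration only, here is a minimal Python sketch of bulk action and metadata lines that carry `if_seq_no` and `if_primary_term`. The index name `products`, the document IDs, and the sequence number and primary term values are hypothetical, and the sketch only builds the newline-delimited body; it does not send it to a cluster.

```python
import json


def bulk_body(index_name, ops):
    """Build a newline-delimited _bulk request body.

    ops is a list of (action, doc_id, seq_no, primary_term, source) tuples,
    where action is "index" or "delete"; source is ignored for deletes.
    """
    lines = []
    for action, doc_id, seq_no, primary_term, source in ops:
        # The conditional-write parameters go into the action/metadata line.
        meta = {
            "_index": index_name,
            "_id": doc_id,
            "if_seq_no": seq_no,
            "if_primary_term": primary_term,
        }
        lines.append(json.dumps({action: meta}))
        if action == "index":
            # Index actions are followed by the document source line.
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("products", [
    ("index", "1567", 362, 2, {"product": "r2d2", "tags": ["droid"]}),
    ("delete", "1568", 17, 2, None),
])
print(body, end="")
```

Each item is checked independently, so in this model a stale `if_seq_no` would fail only that item, not the whole request.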

Elasticsearch needs a way of ensuring that an older version of a document never
overwrites a newer version.

To ensure this, every operation performed to a document is assigned a sequence number

To ensure an older version of a document doesn't overwrite a newer version, every operation performed is assigned a sequence number ...

To ensure this, every operation performed to a document is assigned a sequence number
by the primary shard that coordinates that change. The sequence number is increased
with each operation and thus newer operations are guaranteed to have a higher sequence
number. Elasticsearch can then use the sequence number of operations to make sure they

than older operations

@bleskes (Contributor Author):

+1
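The ordering guarantee discussed in this thread can be illustrated with a toy Python model. A hypothetical `PrimaryShard` hands out strictly increasing sequence numbers, and a hypothetical `Replica` refuses to let an operation with a lower sequence number overwrite a document changed by a newer one. This is a simplified sketch, not how Elasticsearch implements replication; it ignores primary terms, persistence, and failures.

```python
class PrimaryShard:
    """Toy primary: assigns a strictly increasing sequence number per operation."""

    def __init__(self):
        self.next_seq_no = 0

    def assign(self, doc_id, source):
        op = {"seq_no": self.next_seq_no, "id": doc_id, "source": source}
        self.next_seq_no += 1
        return op


class Replica:
    """Toy copy that applies operations possibly delivered out of order."""

    def __init__(self):
        self.docs = {}  # doc_id -> (seq_no of last applied op, source)

    def apply(self, op):
        current = self.docs.get(op["id"])
        if current is not None and current[0] >= op["seq_no"]:
            # Already applied, or older than the last change: never overwrite.
            return False
        self.docs[op["id"]] = (op["seq_no"], op["source"])
        return True


primary = PrimaryShard()
first = primary.assign("1", {"v": "old"})
second = primary.assign("1", {"v": "new"})

replica = Replica()
replica.apply(second)  # the newer operation arrives first
replica.apply(first)   # the older one arrives late and is ignored
print(replica.docs["1"])  # (1, {'v': 'new'})
```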

never override a newer document version is never overridden by a change that has a
smaller sequence number assigned to it.

For example, the following indexing command will create a document and assign it its

"it its" → "an"



Elasticsearch keeps tracks of the sequence number and primary of the last
operation to have changed each of the document it stores. These are returned

Elasticsearch keeps track of the sequence number and primary term for every document it stores; this value will change based on the last operation to modify the document. The `_seq_no` and `_primary_term` fields are returned

@bleskes (Contributor Author):

I will adopt the second part of the suggestion, but I feel the first part is not accurate - there's no sequence number of a document but rather a seq of the last operation to have changed the doc.

for each search hit by requesting the `_seq_no` and `_primary_term` <<search-request-docvalue-fields,Doc Value Fields>>.

The sequence number and the primary term uniquely identify a change. By noting down
the sequence number and primary term return, you can make sure to only change the

returned

@bleskes (Contributor Author):

fixed
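As a sketch of the read side described in the snippet above, the Python below builds a search body that requests `_seq_no` and `_primary_term` as doc value fields. It then turns a returned hit into a conditional index path. The hit shown (index `products`, ID `1567`, values 362 and 2) is hypothetical sample data, and the typeless `/_doc/` path is an assumption that matches 7.x-style URLs.

```python
from urllib.parse import urlencode

# Search body asking for the seq_no/primary_term of each hit as doc value fields.
search_body = {
    "query": {"match_all": {}},
    "docvalue_fields": ["_seq_no", "_primary_term"],
}


def conditional_index_path(hit):
    """Build an index path that only succeeds if the doc is unchanged since it was read."""
    fields = hit["fields"]  # doc value fields come back as arrays
    params = {
        "if_seq_no": fields["_seq_no"][0],
        "if_primary_term": fields["_primary_term"][0],
    }
    return f"/{hit['_index']}/_doc/{hit['_id']}?{urlencode(params)}"


# Hypothetical hit, shaped like a search response entry.
hit = {
    "_index": "products",
    "_id": "1567",
    "fields": {"_seq_no": [362], "_primary_term": [2]},
}
print(conditional_index_path(hit))
# /products/_doc/1567?if_seq_no=362&if_primary_term=2
```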

is done by setting the `if_seq_no` and `if_primary_term` parameters of either the
<<docs-index_,Index API>> or the <<docs-delete,Delete API>>.

For example, the following indexing call, will make sure to add a tag to the

no comma needed:

For example, the following indexing call will make sure to add a tag to the
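The read-modify-write pattern in this example (read the document, add a tag, write it back only if nobody changed it meanwhile) can be modelled with a small in-memory store. `DocStore` and `VersionConflict` are hypothetical names; the store rejects a conditional write when the supplied `if_seq_no`/`if_primary_term` pair does not match the last modification. This mirrors the version-conflict error a real cluster would return, but it is only a sketch.

```python
class VersionConflict(Exception):
    """Raised when if_seq_no/if_primary_term do not match the last modification."""


class DocStore:
    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> {"source", "seq_no", "primary_term"}

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("if_seq_no and if_primary_term must be used together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if (current is None or current["seq_no"] != if_seq_no
                    or current["primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        # Every successful write is a new operation with a new sequence number.
        entry = {"source": source, "seq_no": self.next_seq_no,
                 "primary_term": self.primary_term}
        self.next_seq_no += 1
        self.docs[doc_id] = entry
        return entry


store = DocStore()
store.index("1567", {"product": "r2d2", "tags": []})

doc = store.get("1567")
seq_no, term = doc["seq_no"], doc["primary_term"]
store.index("1567", {"product": "r2d2", "tags": ["droid"]},
            if_seq_no=seq_no, if_primary_term=term)

# A second writer still holding the old seq_no/primary_term is rejected.
conflicted = False
try:
    store.index("1567", {"product": "r2d2", "tags": ["toy"]},
                if_seq_no=seq_no, if_primary_term=term)
except VersionConflict:
    conflicted = True  # re-read the document and retry
```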

@bleskes requested a review from @zuketo on January 11, 2019 15:09
@bleskes (Contributor Author) commented Jan 11, 2019:

Thanks @zuketo. I addressed your comments. Can you please take another look?

@zuketo left a review:

lgtm


operation and thus newer operations are guaranteed to have a higher sequence
number than older operations. Elasticsearch can then use the sequence number of
operations to make sure they never override a newer document version is never
overridden by a change that has a smaller sequence number assigned to it.
Contributor:

something is wrong with this sentence.


The above will succeed since the the supplied version of 2 is higher than
the current document version of 1. If the document was already updated
and it's version was set to 2 or higher, the indexing command will fail
Contributor:

its

WARNING: External versioning supports the value 0 as a valid version number.
This allows the version to be in sync with an external versioning system
where version numbers start from zero instead of one. It has the side effect
that documents with version number equal to zero cannot neither be updated
Contributor:

can neither

database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from
a database is simplified if external versioning is used, as only the
latest version will be used if the index operations are out of order for
Contributor:

arrive out of order
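The external versioning rules quoted in these threads can be sketched in Python as follows. A write succeeds only if the supplied version is strictly higher than the stored one, or if the document does not exist yet. The version 2 over version 1 case from the snippet succeeds. Rows from a source database that arrive out of order are rejected when stale, so the index converges on the latest version. `index_external` and `VersionConflict` are hypothetical names for this model; a real client would send `version` and `version_type=external` with the index request instead.

```python
class VersionConflict(Exception):
    """Raised when the supplied external version is not higher than the stored one."""


def index_external(docs, doc_id, source, version):
    # Model of version_type=external: accept only strictly higher versions.
    current = docs.get(doc_id)
    if current is not None and version <= current["version"]:
        raise VersionConflict(f"{doc_id}: {version} <= {current['version']}")
    docs[doc_id] = {"source": source, "version": version}


docs = {}
index_external(docs, "1", {"name": "v1"}, 1)
index_external(docs, "1", {"name": "v2"}, 2)  # succeeds: 2 is higher than 1

# Rows from the source database arriving out of order.
arrivals = [({"name": "v5"}, 5), ({"name": "v3"}, 3), ({"name": "v4"}, 4)]
for source, version in arrivals:
    try:
        index_external(docs, "1", source, version)
    except VersionConflict:
        pass  # an older row arrived after a newer one; safe to drop

print(docs["1"])  # {'source': {'name': 'v5'}, 'version': 5}
```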

the different version types and their semantics.

`internal`:: only index the document if the given version is identical to the version
of the stored document.
Contributor:

Should we point to optimistic concurrency control here and make it clear that using `if_seq_no` is the preferred method for this?

@bleskes (Contributor Author):

I decided to only do this once internal versioning is deprecated. I can add a note if you prefer.

Contributor:

I decided to only do this once internal versioning is deprecated

ok, if the deprecation is being added to 6.6, I'm fine with that.

@bleskes (Contributor Author):

I don't think we'll make the 6.6 time frame as we need to first remove all usages of it (only reindex is left)
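For comparison with the `if_seq_no` approach, the `internal` version type discussed here (index only if the given version is identical to the stored document's version) can be modelled as below. `index_internal` and `VersionConflict` are hypothetical. Treating a missing document as a conflict and incrementing the version by one on success are assumptions of this sketch, not statements from the thread.

```python
class VersionConflict(Exception):
    """Raised when the supplied version differs from the stored one."""


def index_internal(docs, doc_id, source, version):
    # Model of the `internal` version type: the supplied version must be
    # identical to the stored document's version.
    current = docs.get(doc_id)
    if current is None or current["version"] != version:
        # Assumption: a missing document is also treated as a conflict here.
        raise VersionConflict(doc_id)
    # Assumption: each successful write increments the version by one.
    docs[doc_id] = {"source": source, "version": version + 1}


docs = {"1": {"source": {"tags": []}, "version": 1}}
index_internal(docs, "1", {"tags": ["a"]}, 1)  # matches, version becomes 2

try:
    index_internal(docs, "1", {"tags": ["b"]}, 1)  # stale version is rejected
except VersionConflict:
    print("conflict")
```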

@bleskes added the v6.7.0 label on Jan 11, 2019
@bleskes merged commit cae71cd into elastic:master on Jan 11, 2019
@bleskes (Contributor Author) commented Jan 11, 2019:

@ywelsch sorry, I didn't see your comments on the page I hit the merge button on. I'll address them in a follow-up commit.

@bleskes deleted the cas_doc branch on January 11, 2019 16:08
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement v6.6.0 v6.7.0 v7.0.0-beta1
6 participants