Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce soft-deletes retention policy based on global checkpoint #30335

Merged
merged 4 commits into from
May 5, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented May 2, 2018

This commit introduces a soft-deletes retention merge policy based on
the global checkpoint. Some notes on this simple retention policy:

  • This policy keeps all operations whose seq# is greater than the
    persisted global checkpoint and configurable extra operations prior to
    the global checkpoint. This is good enough for querying history changes.
  • This policy is not watertight for peer-recovery. I will make it in a
    follow-up.
  • This policy is too simple to support rollback.

Relates #29530

This commit introduces a soft-deletes retention merge policy based on
the global checkpoint. Some notes on this simple retention policy:

- This policy keeps all operations whose seq# is greater than the
persisted global checkpoint and configurable extra operations prior to
the global checkpoint. This is good enough for querying history changes.

- This policy is not watertight for peer-recovery. I will make it in a
  follow-up.

- This policy is too simple to support rollback.
@dnhatn dnhatn added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label May 2, 2018
@dnhatn dnhatn requested review from s1monw and bleskes May 2, 2018 13:21
@dnhatn
Copy link
Member Author

dnhatn commented May 2, 2018

/cc @jasontedor and @martijnvg

@bleskes
Copy link
Contributor

bleskes commented May 2, 2018

Thanks Nhat. I will look as soon as I can.

This policy is not watertight for peer-recovery. I will make it in a
follow-up.
This policy is too simple to support rollback.

Can you elaborate on these?

@dnhatn
Copy link
Member Author

dnhatn commented May 2, 2018

This policy is not watertight for peer-recovery. I will make it in a follow-up.

We send the safe-commit in peer-recovery, thus we need to also send all operations after the local checkpoint of that commit. This is analog to the min translog gen for recovery.

This policy is too simple to support rollback.

This policy considers multiple operations of a single document individually instead of a chain. It might not work in this case.

  1. Index doc1 then flush
  2. Index a large num of docs then flush (i.e. greater than the retained ops)
  3. Delete doc1 then flush
  4. At step 3, this retention policy may not keep the actual doc1 but only the tombstone, thus we may not be able to un-delete doc1.

@hub-cap hub-cap added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels May 3, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM but I think we should wait until https://issues.apache.org/jira/browse/LUCENE-8290 is in and we have a new snapshot. Otherwise the retention policy is broken :(

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for clarifying your first remarks. Makes sense.

@@ -2016,6 +2019,17 @@ private IndexWriterConfig getIndexWriterConfig() {
return iwc;
}

private Query softDeletesRetentionQuery() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: java docs please - something like "documents that are soft deleted and match this query should be retained and not cleaned up by merges"

engine.index(indexForDoc(doc));
liveDocs.add(doc.id());
}
for (int i = 0; i < numDocs; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question - do we also want to override documents, rather than just deleting them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question - do we also want to override documents, rather than just deleting them?

+1. I added updates.

@s1monw
Copy link
Contributor

s1monw commented May 3, 2018

@dnhatn https://issues.apache.org/jira/browse/LUCENE-8290 is in, you can pull a new snap if you want.

@s1monw
Copy link
Contributor

s1monw commented May 3, 2018

also @dnhatn you might wanna follow this #30357

@dnhatn
Copy link
Member Author

dnhatn commented May 4, 2018

CI failed because of WatchMetadataTests.
@elasticmachine test this please.

@dnhatn
Copy link
Member Author

dnhatn commented May 4, 2018

@elasticmachine test this please

@dnhatn
Copy link
Member Author

dnhatn commented May 4, 2018

Thanks @simonw and @bleskes for the review.

@dnhatn dnhatn merged commit 2c73969 into elastic:ccr May 5, 2018
@dnhatn dnhatn deleted the soft-deletes-mergepolicy branch May 5, 2018 03:19
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request May 6, 2018
* origin/ccr: (166 commits)
  Introduce soft-deletes retention policy based on global checkpoint (elastic#30335)
  Enable MockHttpTransport in ShardChangsIT
  Remove old sha files from dated Lucene snapshot
  Update InternalEngine tests on ccr side for elastic#30121
  Set the new lucene version for 6.4.0
  [ML][TEST] Clean up jobs in ModelPlotIT
  Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357)
  Watcher: Ensure trigger service pauses execution (elastic#30363)
  [DOCS] Added coming qualifiers in changelog
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372)
  Security: reduce garbage during index resolution (elastic#30180)
  Make RepositoriesMetaData contents unmodifiable (elastic#30361)
  Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (elastic#30359)
  SQL: Fix bug caused by empty composites (elastic#30343)
  [ML] Account for gaps in data counts after job is reopened (elastic#30294)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121)
  Change signature of Get Repositories Response (elastic#30333)
  Tests: Use different watch ids per test in smoke test (elastic#30331)
  ...
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request May 6, 2018
* origin/ccr: (127 commits)
  Introduce soft-deletes retention policy based on global checkpoint (elastic#30335)
  Enable MockHttpTransport in ShardChangsIT
  Remove old sha files from dated Lucene snapshot
  Update InternalEngine tests on ccr side for elastic#30121
  Set the new lucene version for 6.4.0
  [ML][TEST] Clean up jobs in ModelPlotIT
  Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357)
  Watcher: Ensure trigger service pauses execution (elastic#30363)
  [DOCS] Added coming qualifiers in changelog
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372)
  Security: reduce garbage during index resolution (elastic#30180)
  Make RepositoriesMetaData contents unmodifiable (elastic#30361)
  Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (elastic#30359)
  SQL: Fix bug caused by empty composites (elastic#30343)
  [ML] Account for gaps in data counts after job is reopened (elastic#30294)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121)
  Change signature of Get Repositories Response (elastic#30333)
  Tests: Use different watch ids per test in smoke test (elastic#30331)
  ...
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request May 6, 2018
* origin/ccr: (166 commits)
  Introduce soft-deletes retention policy based on global checkpoint (elastic#30335)
  Enable MockHttpTransport in ShardChangsIT
  Remove old sha files from dated Lucene snapshot
  Update InternalEngine tests on ccr side for elastic#30121
  Set the new lucene version for 6.4.0
  [ML][TEST] Clean up jobs in ModelPlotIT
  Upgrade to 7.4.0-snapshot-1ed95c097b (elastic#30357)
  Watcher: Ensure trigger service pauses execution (elastic#30363)
  [DOCS] Added coming qualifiers in changelog
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (elastic#30372)
  Security: reduce garbage during index resolution (elastic#30180)
  Make RepositoriesMetaData contents unmodifiable (elastic#30361)
  Change quad tree max levels to 29. Closes elastic#21191 (elastic#29663)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (elastic#30359)
  SQL: Fix bug caused by empty composites (elastic#30343)
  [ML] Account for gaps in data counts after job is reopened (elastic#30294)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (elastic#30121)
  Change signature of Get Repositories Response (elastic#30333)
  Tests: Use different watch ids per test in smoke test (elastic#30331)
  ...
dnhatn added a commit that referenced this pull request May 10, 2018
…30335)

This commit introduces a soft-deletes retention merge policy based on
the global checkpoint. Some notes on this simple retention policy:

- This policy keeps all operations whose seq# is greater than the
persisted global checkpoint and configurable extra operations prior to
the global checkpoint. This is good enough for querying history changes.

- This policy is not watertight for peer-recovery. We send the
safe-commit in peer-recovery, thus we need to also send all operations
after the local checkpoint of that commit. This is analog to the min
translog generation for recovery.

- This policy is too simple to support rollback.

Relates #29530
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 29, 2018
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:

1. Replace hard-deletes by soft-deletes in InternalEngine
2. Use _recovery_source if _source is disabled or modified (elastic#31106)
3. Soft-deletes retention policy based on the global checkpoint (elastic#30335)
4. Read operation history from Lucene instead of translog (elastic#30120)
5. Use Lucene history in peer-recovery (elastic#30522)

These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>
dnhatn added a commit that referenced this pull request Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

Relates #30086
Closes #29530

---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand jpountz@gmail.com
Co-authored-by: Boaz Leskes b.leskes@gmail.com
Co-authored-by: Jason Tedor jason@tedor.me
Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com
Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co
Co-authored-by: Simon Willnauer simonw@apache.org
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (elastic#31106)
- Soft-deletes retention policy based on the global checkpoint (elastic#30335)
- Read operation history from Lucene instead of translog (elastic#30120)
- Use Lucene history in peer-recovery (elastic#30522)

Relates elastic#30086
Closes elastic#29530

---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>
dnhatn added a commit that referenced this pull request Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

Relates #30086
Closes #29530

---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants