-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use soft-update to maintain document history #29458
Conversation
Today we can the soft-update feature from Lucene to maintain a history of document. This change simply replaces hard-update in the Engine by soft-update methods. Stale operations, delete, and no-ops will be handled in subsequent changes.
Pinging @elastic/es-distributed |
test this please. |
please test this. |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, I left some comments
/** | ||
* Returns a numeric docvalues which can be used to soft-delete documents. | ||
*/ | ||
public static NumericDocValuesField getSoftDeleteDVMarker() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this newSoftDeletesField
?
* Returns <code>true</code> if soft-delete is enabled. | ||
*/ | ||
public boolean isSoftDeleteEnabled() { | ||
return getIndexVersionCreated().onOrAfter(Version.V_7_0_0_alpha1) && softDeleteEnabled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe resolve this in the ctor?
if (softDeleteEnabled) { | ||
iwc.setSoftDeletesField(Lucene.SOFT_DELETE_FIELD); | ||
// TODO: soft-delete retention policy | ||
mergePolicy = new SoftDeletesRetentionMergePolicy(Lucene.SOFT_DELETE_FIELD, () -> new MatchAllDocsQuery(), mergePolicy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this here since it's match all. Lets just omit it until we use it
@@ -144,6 +145,11 @@ public boolean useCompoundFile(SegmentInfos segments, SegmentCommitInfo newSegme | |||
return delegate.useCompoundFile(segments, newSegment, writer); | |||
} | |||
|
|||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of doing this I think we should extend MergePolicyWrapper
in master and 6.x and then this is handled upstream
@@ -504,7 +507,7 @@ public void testSegmentsWithMergeFlag() throws Exception { | |||
|
|||
if (flush) { | |||
// we should have had just 1 merge, so last generation should be exact | |||
assertEquals(gen2, store.readLastCommittedSegmentsInfo().getLastGeneration()); | |||
assertEquals(gen2 + 1, store.readLastCommittedSegmentsInfo().getLastGeneration()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm why does this need to change?
@Override | ||
public long softUpdateDocument(Term term, Iterable<? extends IndexableField> doc, Field... softDeletes) throws IOException { | ||
maybeThrowFailure(); | ||
return super.softUpdateDocument(term, doc, softDeletes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
public LeafReader wrap(LeafReader in) { | ||
return new FilterLeafReader(in) { | ||
@Override | ||
public CacheHelper getCoreCacheHelper() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can return in.getCoreCacheHelper();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just move this entire thing to a util method in Lucene.java then we can reuse it in other places WDYT? @martijnvg you need this too somewhere right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the test and this wrapper in this PR, and will add it back to Lucene.java later. I prefer to keep this change as a cut-over.
@@ -1070,10 +1074,18 @@ private boolean assertDocDoesNotExist(final Index index, final boolean allowDele | |||
} | |||
|
|||
private void updateDocs(final Term uid, final List<ParseContext.Document> docs, final IndexWriter indexWriter) throws IOException { | |||
if (docs.size() > 1) { | |||
indexWriter.updateDocuments(uid, docs); | |||
if (softDeleteEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the delete case we should also use indexWriter.updateDocValue(uid, Lucene.getSoftDeleteDVMarker())
; instead of IndexWriter#deleteDocuments
and then override this in the test to maybe throw an exception.
@s1monw I have addressed your comments. Can you please take a look? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1073,9 +1070,9 @@ private boolean assertDocDoesNotExist(final Index index, final boolean allowDele | |||
private void updateDocs(final Term uid, final List<ParseContext.Document> docs, final IndexWriter indexWriter) throws IOException { | |||
if (softDeleteEnabled) { | |||
if (docs.size() > 1) { | |||
indexWriter.softUpdateDocuments(uid, docs, Lucene.getSoftDeleteDVMarker()); | |||
indexWriter.softUpdateDocuments(uid, docs, Lucene.newSoftDeleteField()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can cache this in here in a constant so we don't need to re-create a new one every time.
Thanks @s1monw for reviewing. |
Since elastic#29458, we use a searcher to calculate the number of documents for a commit stats. Sadly, that approach is flawed. The searcher might no longer point to the last commit if it's refreshed. This commit uses SoftDeletesDirectoryReaderWrapper to exclude the soft-deleted documents from numDocs in a SegmentInfos. I chose to modify the method Luence#getNumDocs so that we can read a store metadata snapshot correctly without opening an engine. Relates elastic#29458
Today we can use the soft-update feature from Lucene to maintain a history of document. This change simply replaces hard-update in the Engine by soft-update methods. Stale operations, delete, and no-ops will be handled in subsequent changes. This change is just a cut-over from hard-update to soft-update, no new functionality has been introduced. Relates #29530
Since #29458, we use a searcher to calculate the number of documents for a commit stats. Sadly, that approach is flawed. The searcher might no longer point to the last commit if it's refreshed. As synced-flush requires an exact numDocs to work correctly, we have to exclude all soft-deleted docs. This commit makes synced-flush stop using CommitStats but read an exact numDocs directly from an index commit. Relates #29458 Relates #29530
Since #29458, we use a searcher to calculate the number of documents for a commit stats. Sadly, that approach is flawed. The searcher might no longer point to the last commit if it's refreshed. As synced-flush requires an exact numDocs to work correctly, we have to exclude all soft-deleted docs. This commit makes synced-flush stop using CommitStats but read an exact numDocs directly from an index commit. Relates #29458 Relates #29530
Today we can use the soft-update feature from Lucene to maintain a history of document. This change simply replaces hard-update in the Engine by soft-update methods. Stale operations, delete, and no-ops will be handled in subsequent changes. This change is just a cut-over from hard-update to soft-update, no new functionality has been introduced.