-
Notifications
You must be signed in to change notification settings - Fork 684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce online Copy for scorch indexes. #1605
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm fine with the core logic, I just think the end-user API should be more user-friendly, and it seems like that shouldn't be hard to change by rearranging things.
index_test.go
Outdated
backupIndexPath := createTmpIndexPath(t) | ||
defer cleanupTmpIndexPath(t, backupIndexPath) | ||
|
||
getWriter := func(filePath string) io.WriteCloser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I still strongly dislike the design that requires the caller to pass this function to CopyTo().
Let me start by asking, what would a user expect the API to look like? I feel pretty confident it would be something like:
func (b *bleveIndexImplStruct) CopyTo(destinationPath string) error
The end-user doesn't expect to pass a function that creates writers inside that directory, we should do that part on their behalf.
What if all of getWriter()
here were internal to bleve. And then when the user calls CopyTo(path), we internally use this getWriter implementation? It seems like the only logic inside getWriter has to do with internal implemenetation choices made by bleve about file/dir naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mschoch, I completely agree that the current API is CB-specific and can't be used by other users due to its ugliness.
The reason I ended up here is that -
If it accepts a path, then cbft has to copy/duplicate all the disk segment files into a temp location. And then perform a stream write. Wanted to save the additional disk requirements as well as the extra stream write/copy.
With the current api, we can directly write to stream without demanding additional disk resources.
Not sure whether we can achieve that if we accept a CopyTo(destinationPath string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I'm going to try to make it work, because I think that getting the API right is the most important thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sreekanth-cb why can't this work? https://play.golang.org/p/vqwsUB7h487
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you meant now.
CBFT can directly call the scorch level implementation which takes a getWriter and bleve users can invoke the Index level API which takes a path.
Sounds good! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sreekanth-cb I'm still not clear what cbft does differently, such that it needs to do that? Can you link me to the right part of the cbft review? I did try to find, but did not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, Looks like I am circling around the same problems here.
The above approach won't help as the index impl's meta data is associated with the bleve.Index implementation and my use case can't get that copied by directly calling any APIs from the index implementations like scorch which takes a getWriter() callback.
Then the only way would be to directly call index level CopyTo(path) which compromises on the additional disk space requirements. Let's sync on this in a call.
index_test.go
Outdated
|
||
getWriter := func(filePath string) io.WriteCloser { | ||
var path string | ||
if strings.Contains(filePath, "store") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little bit brittle, for example, what if the destination of the copy contains the word "store", then all attempts to getWriter will mistakenly use this logic. It would seem more correct to check for suffice "/store".
@mschoch , changed the signature to accept a path here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok, but my review of bleve_index_api includes comments to move one thing into this PR...
@mschoch, moved the FileSystemDirectory here. |
Once the bleve_index_api and zapx changes are merged - we'll need to set up new versions and update the go.mod here absorbing the right version of the two before committing this. |
@sreekanth-cb Rebasing your change on master branch (which now includes #1612) should allow for the checks to pass. |
Approach used is to acquire an index snapshot and reuse existing code for persisting snapshot segments and recording snapshot information into the root bolt.
index.Directory interface implementation for the destination. Add a default FileSystemDirectory implementation for the index.Directory interface.
Approach used is to acquire an index snapshot and reuse existing code for persisting snapshot segments and recording snapshot information into the root bolt. Ref: blevesearch#1605
… interface implementation for the destination. Add a default FileSystemDirectory implementation for the index.Directory interface. Ref: blevesearch#1605
Approach used is to acquire an index snapshot
and reuse existing code for persisting
snapshot segments and recording snapshot
information into the root bolt.