Improve the search service index management #5515

Open
mmattel opened this issue Feb 6, 2023 · 4 comments
Labels
Category:Enhancement Add new functionality Priority:p3-medium Normal priority

Comments

@mmattel
Contributor

mmattel commented Feb 6, 2023

Referencing: #5503 ([docs-only] search service readme)

While documenting the search service, some points came up that need improvement.
They all concern handling the index, from different points of view:

  1. After the discussion about a service user, we also need to update the command line, which means removing
    the --user argument:
    ocis search index --space $SPACE_ID --user $USER_ID
    becomes
    ocis search index --space $SPACE_ID

  2. We have no mechanism to fully recreate an index from scratch, for example when it is corrupt or missing (such as after a missed restore from backup). Note that the current update-on-change mechanism is not an option, as it updates items ONLY when the path has changed, e.g. when renaming or moving a path component:
    ocis search index --purge
    ocis search index --recreate (recreate is an automatic iteration over all spaces)

  3. Content extraction may have been added at a later time, or the extractor may not have done its job. Because this leaves a gap in the index, a manual trigger for "reindexing" the missing items is necessary. The gap can be identified via an internal query and then filled:
    ocis search index --update
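A minimal sketch of how items 2 and 3 could fit together, written in Python for brevity. It assumes a toy in-memory model: the names SearchIndex, IndexEntry, extract, recreate, find_gaps and update are hypothetical and are not the oCIS search service or bleve API (both are Go). Treating a newer mtime as "stale" is also an assumption on top of the missing-content check described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, NOT the oCIS/bleve API:
# a space maps file IDs to (mtime, raw text); the index stores, per file ID,
# the mtime it was built from and the extracted content (None = nothing extracted).
Spaces = Dict[str, Dict[str, Tuple[int, str]]]


@dataclass
class IndexEntry:
    mtime: int
    content: Optional[str]


@dataclass
class SearchIndex:
    entries: Dict[str, IndexEntry] = field(default_factory=dict)

    def purge(self) -> None:
        """Proposed --purge: drop every index entry."""
        self.entries.clear()


def extract(text: str) -> Optional[str]:
    """Stand-in for a content extractor; returns None when there is no content."""
    return text or None


def recreate(index: SearchIndex, spaces: Spaces) -> int:
    """Proposed --recreate: purge, then iterate over ALL spaces and index every file."""
    index.purge()
    for files in spaces.values():
        for file_id, (mtime, text) in files.items():
            index.entries[file_id] = IndexEntry(mtime, extract(text))
    return len(index.entries)


def find_gaps(index: SearchIndex, spaces: Spaces) -> List[str]:
    """The 'internal query' for --update: files that are unindexed, stale,
    or have content the index lacks (extraction failed or was added later)."""
    gaps = []
    for files in spaces.values():
        for file_id, (mtime, text) in files.items():
            entry = index.entries.get(file_id)
            if entry is None or entry.mtime < mtime or (entry.content is None and text):
                gaps.append(file_id)
    return sorted(gaps)


def update(index: SearchIndex, spaces: Spaces) -> List[str]:
    """Proposed --update: re-index only the gaps instead of the whole index."""
    gaps = set(find_gaps(index, spaces))
    for files in spaces.values():
        for file_id, (mtime, text) in files.items():
            if file_id in gaps:
                index.entries[file_id] = IndexEntry(mtime, extract(text))
    return sorted(gaps)
```

The point of separating find_gaps from update is that the gap query can be run on its own as a dry run, and --update stays cheap compared to --recreate because it only touches the files the query returns.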

Item 4 is solved via #5503

  4. From the discussion, there is a state named Resource Trashed, meaning the index entry is not purged even though the file was trashed. IMHO there is a difference between Resource Trashed (in the trashbin) and Resource Purged (finally gone):
    a. Either implement a Resource Purged state that removes the index entry, or
    b. Implement ocis search index --cleanup to remove orphaned index entries

  5. As the space needed for the index keeps growing, a command is needed that returns the current index size and, if feasible, how much space remains on the filesystem it uses:
    ocis search index --size

  6. We need a mechanism to check whether the index matches the files actually present, plus an option to purge index entries that have no match:
    ocis search index --check --purge
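Items 4 to 6 boil down to comparing the index against storage and measuring the index on disk. A minimal Python sketch, assuming the index can list its document IDs and storage can list the IDs of files that still exist (trashed items would count as existing until finally purged). find_orphans, check and index_size are hypothetical names, not existing ocis commands or bleve calls; the size part only uses the Python standard library.

```python
import os
import shutil
from typing import Dict, Iterable, Set

# Hypothetical sketch, NOT the oCIS/bleve API: the index is modelled as a dict
# keyed by file ID, and live_ids are the IDs storage currently reports.


def find_orphans(index_ids: Iterable[str], live_ids: Iterable[str]) -> Set[str]:
    """Proposed --check: index entries that have no matching file in storage."""
    return set(index_ids) - set(live_ids)


def check(index: Dict[str, object], live_ids: Iterable[str], purge: bool = False) -> Set[str]:
    """Proposed --check [--purge]: report orphans and optionally delete them."""
    orphans = find_orphans(index.keys(), live_ids)
    if purge:
        for file_id in orphans:
            del index[file_id]
    return orphans


def index_size(index_dir: str) -> Dict[str, int]:
    """Proposed --size: bytes used by the index directory (symlinks skipped)
    and bytes still free on the filesystem that holds it."""
    used = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                used += os.path.getsize(path)
    free = shutil.disk_usage(index_dir).free
    return {"used_bytes": used, "free_bytes": free}
```

Under this model, option b. of item 4 (--cleanup) and --check --purge are the same operation, so a single code path could serve both.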

@fschade @aduffeck @micbar @butonic @dragotin

@mmattel
Contributor Author

mmattel commented Feb 7, 2023

Note: with PR #5503 (many thanks for the info @aduffeck), item number 4 is fixed.
The remaining items stay valid.

@mmattel
Contributor Author

mmattel commented Feb 8, 2023

@butonic as discussed, this should be added to the sprint. Tip: I sent @fschade a URL showing how another site uses bleve, which might help to see how they handle it.

@stale

stale bot commented Apr 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status:Stale label Apr 11, 2023
@mmattel
Contributor Author

mmattel commented Apr 11, 2023

unstale

@stale stale bot removed the Status:Stale label Apr 11, 2023
@micbar micbar added the Category:Enhancement Add new functionality label Apr 20, 2023
@micbar micbar added Priority:p3-medium Normal priority and removed p3-medium labels Oct 5, 2023