Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow additional settings for hybridfs from plugin #5735

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Jan 6, 2023

Signed-off-by: Martin Gaievski gaievski@amazon.com

Description

With this change it's possible for plugins to register additional file extensions for core store types. One use case is for HybridFS, it's needed to use MMap with additional file types. Extensions are appended to the existing list of extensions in core to avoid inconsistency and conflicts.

Plugin can register additional extensions via onIndexModule method from Plugin class, similarly to how it's done in k-NN.

Issues Resolved

#5609

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testPrimaryStopped_ReplicaPromoted

@martin-gaievski
Copy link
Member Author

martin-gaievski commented Jan 6, 2023

failed gradle-check

TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testPrimaryStopped_ReplicaPromoted

found known issue for this flaky test: #5669

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2023

Gradle Check (Jenkins) Run Completed with:

@martin-gaievski
Copy link
Member Author

@dbwiddis are you able to help with reviews? It's for the previously discussed issue #5609

@@ -518,6 +519,15 @@ public IndexService newIndexService(
if (IndexService.needsMapperService(indexSettings, indexCreationContext)) {
indexAnalyzers = analysisRegistry.build(indexSettings);
}
final List<String> hybridMMapExtensions = new ArrayList<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski I am really curious why we need this additional bag of providers or/and settings vs using the IndexSettings directly?

Fe, the idea roughly:

"settings": {
    "index.store.type": "hybridfs",
    "index.store.hybridfs.extensions": "...."
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta index setting approach requires manual work for existing indexes, and the way it's implemented is override vs append. Another dimension is future additions, my view is that it's easy to allow plugins maintaining list of extensions based on a plugin's need, and that doesn't always mean a single setting of file extensions. Say list for hybridfs, it has some tradeoffs and will not work for everyone. In kNN we want to provide two different modes for user to choose from, and that list setting is only part of one mode. With this PR we can encapsulate hybridfs setting as it's internal detail of implementation and also minimize the chance of configuration error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept of index store is generic in OpenSearch and I think it should stay generic: the extensions or any additional settings should be able to be configured for any index store (including hybridfs for example).

If IndexSettings are not sufficient, it probably make sense to extend the IndexStorePlugin with additional factory methods / extensions. In this way the KNN (or any other plugin) could contribution additional modes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of implementing IndexStorePlugin on kNN side, it makes more sense and using existing mechanism for extension. I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.

For the new type how should we register and enable it from plugin? And the second question - should we add existing core extension list from hydridfs to our type or both storage type exist independently?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.

I think this is fine, we could introduce the concept of customizer or similar to apply to built-in factories

For the new type how should we register and enable it from plugin?

Yes

should we add existing core extension list from hydridfs to our type or both storage type exist independently?

This part I not sure, you mentioned that KNN needs to add hydridfs extensions which may not be suitable to everyone, probably we could limit to KNN only

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.

I think this is fine, we could introduce the concept of customizer or similar to apply to built-in factories

If this case be provided I think it resolves the rest of concerns with a new store type. However I do have some doubts we should use mechanism of IndexStorePlugin. At the end directory factory return one Directory object, so any type of customizer on the store type level should override for core type, and I'm not sure if we want to allow it. Something on a lower level will look similar to what is currently in PR, with idea - core will apply their logic in factories and return proper Directory (hybridfs or niofs or other), but customizer (from plugin side) just provides list of additional extensions.

For the new type how should we register and enable it from plugin?

Yes

should we add existing core extension list from hydridfs to our type or both storage type exist independently?

This part I not sure, you mentioned that KNN needs to add hydridfs extensions which may not be suitable to everyone, probably we could limit to KNN only

After checking code I think different types exist independently, we can have only "new" extensions in case of new type. In case of same type with customization it's up to implementation of the customization mechanism, say with current PR it's append only, no overrides

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with idea - core will apply their logic in factories and return proper Directory (hybridfs or niofs or other), but customizer (from plugin side) just provides list of additional extensions.

👍 Sounds good to me

In case of same type with customization it's up to implementation of the customization mechanism, say with current PR it's append only, no overrides

👍

@martin-gaievski martin-gaievski requested a review from reta January 10, 2023 05:10
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

* Listeners can't be unregistered they will stay alive for the entire time the index is allocated on a node.
* </p>
*/
public void addFSTypeFileExtensions(final Map<Type, Set<String>> extensionsByType) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The propagation of the FS extensions is not clear to me, who will call this method and when? I see it only being used in tests ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the plugin who needs to register additional extensions, it has IndexModule instance which is available via Plugin class and this will be part of initialization. It's similar to what plugin do today to register event listeners, e.g. k-NN has custom listener for SettingsUpdate event.

So method call in plugin will be something like:

public void onIndexModule(IndexModule indexModule) {
        indexModule. addFSTypeFileExtensions(Map.of(IndexModule.Type.HYBRIDFS, Set.of("ext1", "ext2")));
 }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, so what we could try here is to leverage IndexSettings, something along these lines (the functionality does not exist yet):

public void onIndexModule(IndexModule indexModule) {
        indexModule.updateIndexSetting (Module.INDEX_STORE_HYBRID_MMAP_EXTENSIONS,  Set.of("ext1", "ext2")),
          ... merge function);
 }

In this case you could get rid of all FS specific methods, additionlaExtensions arguments and purely rely on existing IndexSettings mechanism.

Copy link
Member Author

@martin-gaievski martin-gaievski Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you idea, basically update the setting for hybridFs extensions and all the values will flow to FsFactory class without any additional changes. One concern that I have is that we can freely update any setting, I may need to test it (with this public method) but I'm not sure it's possible with non-dynamic settings. Currently that hybridFsExtensions is not dynamic.
Also such calls will trigger any registered onSettingUpdate listeners, and maybe will expose setting for API's update (this is in case we need to make it dynamic). Just want to make sure this is acceptable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern that I have is that we can freely update any setting, I may need to test it (with this public method) but I'm not sure it's possible with non-dynamic settings.

Correct, the hybridFsExtensions is not dynamic and probably should not be (since once FS is initialized, it could not be changed on the fly).

Also such calls will trigger any registered onSettingUpdate listeners

That could happen but I don't see update listener for hybridFsExtensions (since this is not dynamic). I have just this general idea which we could work through but the reasoning behind is - FS code should not leak anywhere (like it does now), IndexSettings is the only source for index configuration.

Copy link
Member Author

@martin-gaievski martin-gaievski Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta I see a problem with a Settings approach. If setting is not dynamic then I can't register customer update handler (like it's done for other existing settings) to perform "append"-like update and combine existing core and custom extensions into one list. With default updater new value overwrite existing ones, which normally should be enough but not in this case. The problem is in this validation that checks if setting is dynamic.

Copy link
Collaborator

@reta reta Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski I think I found exactly what we need, each Plugin has the way to contribute additional IndexSettings, it may need some minor tweaks (I can help with that), but I believe it should address all the concerns raised:

    /**
     * Get the setting upgraders provided by this plugin.
     *
     * @return the settings upgraders
     */
    public List<SettingUpgrader<?>> getSettingUpgraders() {
        return Collections.emptyList();
    }

    /**
     * An {@link IndexSettingProvider} allows hooking in to parts of an index
     * lifecycle to provide explicit default settings for newly created indices. Rather than changing
     * the default values for an index-level setting, these act as though the setting has been set
     * explicitly, but still allow the setting to be overridden by a template or creation request body.
     */
    public Collection<IndexSettingProvider> getAdditionalIndexSettingProviders() {
        return Collections.emptyList();
    }

Copy link
Member Author

@martin-gaievski martin-gaievski Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inspired by your finding, I tested even simpler method

    /**
     * Additional node settings loaded by the plugin. Note that settings that are explicit in the nodes settings can't be
     * overwritten with the additional settings. These settings added if they don't exist.
     */
    public Settings additionalSettings() {
        return Settings.Builder.EMPTY_SETTINGS;
    }

It works for our setting, just only one thing is that it overrides value in core, so I had to do read-update.

public Settings additionalSettings() {
        final List<String> knnSpecificSettings = List.of("vem", "vec");
        final List<String> combinedSettings = Stream.concat(
            IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS.getDefault(Settings.EMPTY).stream(),
            knnSpecificSettings.stream()
        ).collect(Collectors.toList());
        return Settings.builder().putList(IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS.getKey(), combinedSettings).build();
    }

I think that's ok as core allowing it in the first place. With this all changes are on plugin side, I'll be closing this PR. Thank you @reta for suggesting this approach, it makes more sense than initial one.

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the allow-adding-to-hybrid-mmap-extensions-list branch from 328a07b to 03da782 Compare January 10, 2023 17:38
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@martin-gaievski
Copy link
Member Author

Closing PR as required functionality is achievable with existing mechanisms for extensions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants