-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow additional settings for hybridfs from plugin #5735
Allow additional settings for hybridfs from plugin #5735
Conversation
862b3ed
to
7b3d915
Compare
Gradle Check (Jenkins) Run Completed with:
|
failed gradle-check
found known issue for this flaky test: #5669 |
Gradle Check (Jenkins) Run Completed with:
|
@@ -518,6 +519,15 @@ public IndexService newIndexService( | |||
if (IndexService.needsMapperService(indexSettings, indexCreationContext)) { | |||
indexAnalyzers = analysisRegistry.build(indexSettings); | |||
} | |||
final List<String> hybridMMapExtensions = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-gaievski I am really curious why we need this additional bag of providers or/and settings vs using the IndexSettings
directly?
Fe, the idea roughly:
"settings": {
"index.store.type": "hybridfs",
"index.store.hybridfs.extensions": "...."
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta index setting approach requires manual work for existing indexes, and the way it's implemented is override vs append. Another dimension is future additions, my view is that it's easy to allow plugins maintaining list of extensions based on a plugin's need, and that doesn't always mean a single setting of file extensions. Say list for hybridfs, it has some tradeoffs and will not work for everyone. In kNN we want to provide two different modes for user to choose from, and that list setting is only part of one mode. With this PR we can encapsulate hybridfs setting as it's internal detail of implementation and also minimize the chance of configuration error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept of index store is generic in OpenSearch and I think it should stay generic: the extensions or any additional settings should be able to be configured for any index store (including hybridfs
for example).
If IndexSettings
are not sufficient, it probably make sense to extend the IndexStorePlugin
with additional factory methods / extensions. In this way the KNN
(or any other plugin) could contribution additional modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of implementing IndexStorePlugin
on kNN side, it makes more sense and using existing mechanism for extension. I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.
For the new type how should we register and enable it from plugin? And the second question - should we add existing core extension list from hydridfs
to our type or both storage type exist independently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.
I think this is fine, we could introduce the concept of customizer or similar to apply to built-in factories
For the new type how should we register and enable it from plugin?
Yes
should we add existing core extension list from hydridfs to our type or both storage type exist independently?
This part I not sure, you mentioned that KNN needs to add hydridfs
extensions which may not be suitable to everyone, probably we could limit to KNN only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one doubt related to the type, mainly it's due to this check on core side. Seems we cannot override factory for existing storage type, only add new ones.
I think this is fine, we could introduce the concept of customizer or similar to apply to built-in factories
If this case be provided I think it resolves the rest of concerns with a new store type. However I do have some doubts we should use mechanism of IndexStorePlugin
. At the end directory factory return one Directory object, so any type of customizer on the store type level should override for core type, and I'm not sure if we want to allow it. Something on a lower level will look similar to what is currently in PR, with idea - core will apply their logic in factories and return proper Directory (hybridfs or niofs or other), but customizer (from plugin side) just provides list of additional extensions.
For the new type how should we register and enable it from plugin?
Yes
should we add existing core extension list from hydridfs to our type or both storage type exist independently?
This part I not sure, you mentioned that KNN needs to add
hydridfs
extensions which may not be suitable to everyone, probably we could limit to KNN only
After checking code I think different types exist independently, we can have only "new" extensions in case of new type. In case of same type with customization it's up to implementation of the customization mechanism, say with current PR it's append only, no overrides
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with idea - core will apply their logic in factories and return proper Directory (hybridfs or niofs or other), but customizer (from plugin side) just provides list of additional extensions.
👍 Sounds good to me
In case of same type with customization it's up to implementation of the customization mechanism, say with current PR it's append only, no overrides
👍
Gradle Check (Jenkins) Run Completed with:
|
* Listeners can't be unregistered they will stay alive for the entire time the index is allocated on a node. | ||
* </p> | ||
*/ | ||
public void addFSTypeFileExtensions(final Map<Type, Set<String>> extensionsByType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The propagation of the FS extensions is not clear to me, who will call this method and when? I see it only being used in tests ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the plugin who needs to register additional extensions, it has IndexModule instance which is available via Plugin class and this will be part of initialization. It's similar to what plugin do today to register event listeners, e.g. k-NN has custom listener for SettingsUpdate event.
So method call in plugin will be something like:
public void onIndexModule(IndexModule indexModule) {
indexModule. addFSTypeFileExtensions(Map.of(IndexModule.Type.HYBRIDFS, Set.of("ext1", "ext2")));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, so what we could try here is to leverage IndexSettings
, something along these lines (the functionality does not exist yet):
public void onIndexModule(IndexModule indexModule) {
indexModule.updateIndexSetting (Module.INDEX_STORE_HYBRID_MMAP_EXTENSIONS, Set.of("ext1", "ext2")),
... merge function);
}
In this case you could get rid of all FS specific methods, additionlaExtensions
arguments and purely rely on existing IndexSettings
mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you idea, basically update the setting for hybridFs extensions and all the values will flow to FsFactory class without any additional changes. One concern that I have is that we can freely update any setting, I may need to test it (with this public method) but I'm not sure it's possible with non-dynamic settings. Currently that hybridFsExtensions is not dynamic.
Also such calls will trigger any registered onSettingUpdate listeners, and maybe will expose setting for API's update (this is in case we need to make it dynamic). Just want to make sure this is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern that I have is that we can freely update any setting, I may need to test it (with this public method) but I'm not sure it's possible with non-dynamic settings.
Correct, the hybridFsExtensions is not dynamic and probably should not be (since once FS is initialized, it could not be changed on the fly).
Also such calls will trigger any registered onSettingUpdate listeners
That could happen but I don't see update listener for hybridFsExtensions
(since this is not dynamic). I have just this general idea which we could work through but the reasoning behind is - FS code should not leak anywhere (like it does now), IndexSettings
is the only source for index configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta I see a problem with a Settings approach. If setting is not dynamic then I can't register customer update handler (like it's done for other existing settings) to perform "append"-like update and combine existing core and custom extensions into one list. With default updater new value overwrite existing ones, which normally should be enough but not in this case. The problem is in this validation that checks if setting is dynamic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-gaievski I think I found exactly what we need, each Plugin
has the way to contribute additional IndexSettings
, it may need some minor tweaks (I can help with that), but I believe it should address all the concerns raised:
/**
* Get the setting upgraders provided by this plugin.
*
* @return the settings upgraders
*/
public List<SettingUpgrader<?>> getSettingUpgraders() {
return Collections.emptyList();
}
/**
* An {@link IndexSettingProvider} allows hooking in to parts of an index
* lifecycle to provide explicit default settings for newly created indices. Rather than changing
* the default values for an index-level setting, these act as though the setting has been set
* explicitly, but still allow the setting to be overridden by a template or creation request body.
*/
public Collection<IndexSettingProvider> getAdditionalIndexSettingProviders() {
return Collections.emptyList();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inspired by your finding, I tested even simpler method
/**
* Additional node settings loaded by the plugin. Note that settings that are explicit in the nodes settings can't be
* overwritten with the additional settings. These settings added if they don't exist.
*/
public Settings additionalSettings() {
return Settings.Builder.EMPTY_SETTINGS;
}
It works for our setting, just only one thing is that it overrides value in core, so I had to do read-update.
public Settings additionalSettings() {
final List<String> knnSpecificSettings = List.of("vem", "vec");
final List<String> combinedSettings = Stream.concat(
IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS.getDefault(Settings.EMPTY).stream(),
knnSpecificSettings.stream()
).collect(Collectors.toList());
return Settings.builder().putList(IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS.getKey(), combinedSettings).build();
}
I think that's ok as core allowing it in the first place. With this all changes are on plugin side, I'll be closing this PR. Thank you @reta for suggesting this approach, it makes more sense than initial one.
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
328a07b
to
03da782
Compare
Gradle Check (Jenkins) Run Completed with:
|
Closing PR as required functionality is achievable with existing mechanisms for extensions |
Signed-off-by: Martin Gaievski gaievski@amazon.com
Description
With this change it's possible for plugins to register additional file extensions for core store types. One use case is for HybridFS, it's needed to use MMap with additional file types. Extensions are appended to the existing list of extensions in core to avoid inconsistency and conflicts.
Plugin can register additional extensions via onIndexModule method from Plugin class, similarly to how it's done in k-NN.
Issues Resolved
#5609
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.