Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metadata Immutability] Change different indices lookup objects from array type to lists #8647

Open
sandeshkr419 opened this issue Jul 11, 2023 · 12 comments · May be fixed by #14723
Open

[Metadata Immutability] Change different indices lookup objects from array type to lists #8647

sandeshkr419 opened this issue Jul 11, 2023 · 12 comments · May be fixed by #14723
Assignees
Labels
Cluster Manager enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers

Comments

@sandeshkr419
Copy link
Contributor

sandeshkr419 commented Jul 11, 2023

Is your feature request related to a problem? Please describe.
Metadata object as part of ClusterState is supposed to be immutable by design. However, certain objects within Metadata do not reflect this design.

The following arrays should be replaced with lists and should implement Collections.unmodifiableList() or any other immutable implementations. Assigning them as arrays leave them open for mutability.
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/cluster/metadata/Metadata.java#L268

 String[] allIndices,
 String[] visibleIndices,
 String[] allOpenIndices,
 String[] visibleOpenIndices,
 String[] allClosedIndices,
 String[] visibleClosedIndices,

Also, the usage of these arrays is prominently in IndexNameExpressionResolver where before these arrays are utilized, they are converted to lists before they are consumed.
It is thus unreasonable to keep these objects as arrays as opposed to lists.

Describe the solution you'd like
Store the above objects as lists, and implement them as Collections.unmodifiableList() in the Metadata constructor.

Describe alternatives you've considered
Briefly discussed this change here, but this change was out of scope of #7853

Additional context
this change will require changes in how the metadata or its diff is parsed as XContent as well.

@sandeshkr419 sandeshkr419 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 11, 2023
@sandeshkr419 sandeshkr419 changed the title [Metadata Immutability] [Metadata Immutability] Change different indices lookup objects from array type to lists Jul 11, 2023
@andrross
Copy link
Member

What do you think about this comment:

// TODO: I think we can remove these arrays. it isn't worth the effort, for operations on all indices.
// When doing an operation across all indices, most of the time is spent on actually going to all shards and
// do the required operations, the bottleneck isn't resolving expressions into concrete indices.

While I completely agree with you that immutable lists are better than mutable arrays, my first thought when digging into this code was to remove the arrays all together, then I found the above comment. If I understand correctly, then the existing getters would change to compute the results on demand. For example, getConcreteVisibleIndices() would change to:

/**
 * Returns all the concrete indices that are not hidden.
 */
public List<String> getConcreteVisibleIndices() {
    return indices.entrySet().stream()
        .filter(e -> IndexMetadata.INDEX_HIDDEN_SETTING.get(e.getValue().getSettings()) == false)
        .map(Map.Entry::getKey)
        .collect(Collectors.toUnmodifiableList());
}

(This also changes the return type from String[] to List, which isn't strictly necessary since mutability is no longer a concern given that a new array would be computed each invocation. However, if this is always being used as a list then it should probably just return a list.)

Pre-computing the filtered results strike me as an unnecessary optimization and I would suggest doing the simpler thing unless we know for sure there are performance benefits to pre-computing.

@sandeshkr419
Copy link
Contributor Author

@andrross Thanks for taking a look. I had visited the comment you mentioned in my previous PR #7853 which touched the same code base.

I don't support the idea of computing different indices lookups (arrays in concern) on demand is that these different different arrays are used for expression resolving which is invoked in each almost all search and index data modification related calls. Calculating these on demand will surely increase the latency of all search and indexing related operations, which we should avoid.
Computing these arrays with each cluster state update does not even takes time of the order of miliseconds. Plus trading off some memory/time in ClusterState update activity seems a better choice than to put it on indexing/search tasks where latency is required at minimal.

Also, I would like to point out that except the getConcreteAllIndices(), all the other arrays (5 of them) are only invoked at this resolveEmptyOrTrivialWildcard() in the entire code base, where they are used as lists directly.

@andrross
Copy link
Member

Thanks @sandeshkr419! It makes total sense if these data structures are used in hot path indexing and search code then precomputing can be a good tradeoff. However, these TODO comments should be resolved or removed to avoid future confusion. The TODO comment referenced in #7853 actually does make a lot of sense (if I understand it correctly) as it is suggests to move the pre-computed structures into IndexNameExpressionResolver. That would likely make the code and intent more clear as the structures would more clearly be in hot path code (index name resolution). I haven't looked into what it would take to do that refactoring. Either way, I suggest addressing or removing the TODOs wherever possible.

@sandeshkr419
Copy link
Contributor Author

@andrross Yes, we should remove these comments altogether. We did not removed it previously just to be sure that we have not misunderstood the correct context of these comments. Plus, there is also discussion which seems to be initiated in future by @shwetathareja on remote cluster state as in #7002 (comment)

For this issue, I would like to propose a minor enhancement only in terms of Metadata immutability.

@andrross
Copy link
Member

@sandeshkr419 I'm all for incremental improvements. However, I do think you need to do something with the comment that says "TODO: I think we can remove these arrays..." as a part of this change since you're actually working with the arrays in question here.

@sandeshkr419
Copy link
Contributor Author

@andrross Seems reasonable to remove the comment since we have valid reasons (potential performance regressions) of not implementing this.

@shwetathareja What are your thoughts on this (removing the TODOs - since we have reason of not implementing them).

@shwetathareja
Copy link
Member

Like the comment said, these arrays are used during index resolution. We can benchmark to see how many cpu cycles are spent to generate these arrays during ingestion and search traffic. Personally I think even if there is small performance gain, we should keep them. @sandeshkr419 @andrross

@andrross
Copy link
Member

@sandeshkr419 @shwetathareja Just to summarize our discussion, I think we're in agreement we should keep the pre-computation. So that means we should remove the TODO comment. I'm also on board with changing the arrays to immutable List instances. And finally, I think we should probably add additional versions of the getters (we need to keep the existing signatures for compatibility) that return List (or Collection) types because a quick glance around the code base shows that many times these arrays are just immediately converted to lists.

I'm also going to tag this with the "good first issue" as this looks like it should be a fairly straightforward refactoring. Let me know if you think this is wrong or have anything else to add. Thanks!

@andrross andrross added the good first issue Good for newcomers label May 24, 2024
@sandeshkr419
Copy link
Contributor Author

@andrross Yes, this now makes a nice good first issue.

Just to summarize our discussion, I think we're in agreement we should keep the pre-computation. So that means we should remove the TODO comment.

Yes. We should keep these pre-computation as of now. Since we are deciding against the TODO, I think yes, we should remove the TODO to conclude this discussion for future.

a quick glance around the code base shows that many times these arrays are just immediately converted to lists

Yes, as I mentioned in the original issue description, they are always used as lists so no point in storing them as arrays. There might be some changes on how the metadata diff is computed or how different nodes send/interpret the changed data structures but should be minimal.

@akolarkunnu
Copy link
Contributor

@sandeshkr419 @andrross
Can you please assign this bug to me, if no one is working on this?

@akolarkunnu
Copy link
Contributor

@sandeshkr419 @andrross
Why do we need to keep two versions of all get APIs(Eg: getConcreteVisibleIndices()) , one returns array of Strings and new one returns List ? We can change all call places within the OpenSearch code base along with this change. Are you expecting that these APIs might have used from out of OpenSerach code base?

If yes, shall we deprecate old API(which returns String []) along with this change and introduce new API.
A note, the new API name will be different from old one, since we can't have overridden APIs which have difference only in return types.

@andrross
Copy link
Member

Are you expecting that these APIs might have used from out of OpenSerach code base?

@akolarkunnu Yes. This class is exposed to all OpenSearch plugins.

akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 26, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 26, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 27, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 27, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 27, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jun 27, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jul 9, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jul 9, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jul 11, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jul 11, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Jul 11, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Aug 6, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Aug 6, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Sep 10, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
akolarkunnu added a commit to akolarkunnu/OpenSearch that referenced this issue Sep 10, 2024
…array type to lists

Changed the arrays to immutable List instances, added new versions of the getters which returns List instances.

Resolves opensearch-project#8647

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cluster Manager enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers
Projects
Status: 🆕 New
6 participants