[RFC] New Labels for OpenSearch Core Repo #10566

Closed
macohen opened this issue Oct 11, 2023 · 21 comments
Labels
feedback needed (Issue or PR needs feedback), RFC (Issues requesting major changes)

Comments

@macohen
Contributor

macohen commented Oct 11, 2023

OpenSearch Core (https://github.com/opensearch-project/OpenSearch) is a large repository with many components. We want to make it more efficient and helpful to identify issues belonging to specific components, so we’re proposing some new, more specific labels. This RFC is a place for people to comment on these labels. Are they effective? Do they mean something useful to the community? Are there missing labels? Too many labels?

What Do We Do With These Labels Anyway?

Across all repositories in opensearch-project, we have a process called “triage.” The goal of this process is to make sure repository owners and maintainers check new issues to understand them. First, we learn whether the issue is security related (e.g. a CVE); second, we check whether we have enough information to act on the issue; third, we re-label or respond on the issue. During triage we do not work on these issues, but instead just try to get each issue to a next step. Because the OpenSearch core repository (https://github.com/opensearch-project/OpenSearch) has so many different functions with different sets of contributors working on these functions, repo owners decided to treat these functions like separate repositories when triaging, to get through the backlog faster and with more focus. There are a few maintainers that do triage in public (see https://meetup.com/OpenSearch to join) and there will be more coming.

Here is an example of the workflow for a new issue in the core repo.

  1. Every issue is created with an untriaged label by default.
  2. No less than once per week, untriaged issues should be reviewed:
    1. Read through the issue to understand the request.
      1. If the issue does not meet the template requirements, ask the requester for the missing details and keep the untriaged label until all the details have been provided.
      2. Move to 3 if you have all the details on the associated issue.
  3. If the issue does not impact the OpenSearch core repository, then transfer the issue to the appropriate team using the Transfer issue button in the lower right-hand corner. If you don’t have the access to transfer, then you can add a comment mentioning @opensearch-project/admin to do the transfer for you. If the issue does impact the core repository, move to step 4.
  4. Assign the labels as per the details of the ticket
    1. Overall request type - bug, enhancement, feature.
    2. Additional context like good first issue, wontfix, question, discuss, rfc ...etc. (General guidelines for github triaging)
    3. Make sure to enrich the issue with more details by adding your thoughts, add more context to the issue, or ask the author for more clarification if needed ...etc.
    4. Remove the untriaged label
    5. Add the component level labels based on TABLE 1
  5. At any point a maintainer or issue owner can add or remove a label which will help clarify the work that needs to be done. This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother.
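As a rough illustration of steps 2 to 4 above, here is a minimal Python sketch of the decision flow. The `Issue` class, its fields, and the returned action strings are all hypothetical stand-ins invented for this example; nothing here calls the GitHub API, and a real triage pass still depends on a maintainer's judgment.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical in-memory stand-in for a GitHub issue."""
    title: str
    # Step 1: every issue starts with the 'untriaged' label.
    labels: set = field(default_factory=lambda: {"untriaged"})
    meets_template: bool = True
    affects_core: bool = True


def triage(issue, request_type, component_labels, extra_labels=()):
    """Apply steps 2-4 to one issue and return the next action (assumed names)."""
    if not issue.meets_template:
        # Step 2.1.1: keep 'untriaged' and ask the requester for details.
        return "ask-for-details"
    if not issue.affects_core:
        # Step 3: hand the issue over to the repository that owns it.
        return "transfer"
    # Step 4: request type, additional context, component labels from TABLE 1.
    issue.labels.add(request_type)
    issue.labels.update(extra_labels)
    issue.labels.update(component_labels)
    issue.labels.discard("untriaged")
    return "triaged"
```

For example, `triage(Issue("Slow terms aggregation"), "bug", ["Search:Aggregations"])` would leave the issue with the `bug` and `Search:Aggregations` labels and without `untriaged`.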

TABLE 1

| Component/Area | Sub-Area | Github Label | Desc |
|---|---|---|---|
| Search | Resiliency\Scale | Search:Resiliency | keep search working in spite of failures that may occur |
| Search | Query Performance | Search:Performance | maintain and improve read query performance |
| Search | Query Capabilities | Search:Query Capabilities | add new capabilities for querying |
| Search | Query Insights | Search:Query Insights | capabilities to understand what's happening under the covers in a query |
| Search | Aggregations | Search:Aggregations | aggregations/facets |
| Search | Remote Search | Search:Remote Search | using remote storage for search applications |
| Search | Search Relevance | Search:Relevance | query tuning to improve results (tools, query language, features) |
| Search | Searchable Snapshots | Search:Searchable Snapshots | |
| Search | Others\Unknowns | Search | catch-all for unclear issues (could break these down, assign multiple labels, add other labels) |
| Indexing | Replication | Indexing:Replication | moving data throughout the cluster at index time |
| Indexing | Performance\Throughput | Indexing:Performance | things that make indexing perform better |
| Indexing | Others\Unknowns | Indexing | catch-all for unclear issues (could break these down, assign multiple labels, add other labels) |
| Storage | Storage:Snapshots | Storage:Snapshots | |
| Storage | Storage:Performance | Storage:Performance | |
| Storage | Storage:Durability | Storage:Durability | |
| Storage | Remote Storage | Storage:Remote | |
| Storage | Others\Unknowns | Storage | catch-all for unclear issues (could break these down, assign multiple labels, add other labels) |
| Cluster Manager | Cluster Manager | Cluster Manager | |
| Extensions | Extensions | Extensions | |
| Release & Build | Build and Libraries | Build Libraries & Interfaces | Make sure the build tasks are useful and own the whole gradle plugin so that packaging and distribution are easy. |
| Release & Build | Upgrades | | For example Lucene Upgrades |
| Release & Build | Core Plugins | Plugins | Plugins within Plugins like: language analyzers or within modules |
| Release & Build | Plugins Framework | Plugins | Solves the plugin infrastructure with core. |
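The labels in TABLE 1 follow a `Component:Sub-Area` naming pattern, with the bare component name acting as the catch-all. A small Python sketch of how tooling might split and check such labels is below. The label values are copied from the table; the dictionary layout and the helper names `split_label` and `is_known_label` are assumptions made for this example, not part of any existing OpenSearch tooling.

```python
# Sub-area suffixes per component, copied from the Github Label column of TABLE 1.
# Components with an empty list only have a single, un-prefixed label.
COMPONENT_LABELS = {
    "Search": [
        "Resiliency", "Performance", "Query Capabilities", "Query Insights",
        "Aggregations", "Remote Search", "Relevance", "Searchable Snapshots",
    ],
    "Indexing": ["Replication", "Performance"],
    "Storage": ["Snapshots", "Performance", "Durability", "Remote"],
    "Cluster Manager": [],
    "Extensions": [],
    "Build Libraries & Interfaces": [],
    "Plugins": [],
}


def split_label(label):
    """Split 'Search:Relevance' into ('Search', 'Relevance').

    A bare component label such as 'Search' yields ('Search', None).
    """
    component, _, sub_area = label.partition(":")
    return component, (sub_area or None)


def is_known_label(label):
    """Return True if the label matches a row of TABLE 1."""
    component, sub_area = split_label(label)
    if component not in COMPONENT_LABELS:
        return False
    return sub_area is None or sub_area in COMPONENT_LABELS[component]
```

A check like this could flag typos such as `Search:Durability`, which pairs a Storage sub-area with the Search component.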

Questions for the Community

  • Do these labels make sense?
  • Is this clarifying or making the process more complicated?
  • Any suggestions for clarification of the process?
  • Does adding a reference to a permanent document of this RFC in issue templates make sense? How else can we make sure this process is clear and understood?

I plan on keeping this RFC open until 2023-11-11.

@macohen added the enhancement, untriaged, feedback needed, and RFC labels and removed the enhancement and untriaged labels on Oct 11, 2023
@CEHENKLE
Member

Thanks for writing this up, @macohen!

A couple of thoughts....

1/ For "No less than once per two weeks, untriaged issues should be reviewed" I think the cadence needs to be once a week, not once every two weeks? That's what we've been doing for the last two years.

The reason we don't want to wait two weeks is that, even though we ask folks to report potential security issues to AWS/Amazon Security via our vulnerability reporting page or directly via email to aws-security@amazon.com, we still have had multiple instances of reports getting posted to GitHub. We really don't want to wait 2 weeks with an issue out there without someone taking action.

2/ Many teams (shout out Security!) have moved to a public triaging mechanism. What's the plan to move Core to a more open triaging model? (@krisfreedain poke)

3/ You say "This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother." Can we also add something to .github that describes all the project labels, as well as something that describes the repo specific labels? That way you can just "include" the project label file for the broader labels.

@macohen
Contributor Author

macohen commented Oct 12, 2023

> Thanks for writing this up, @macohen!
>
> A couple of thoughts....
>
> 1/ For "No less than once per two weeks, untriaged issues should be reviewed" I think the cadence needs to be once a week, not once every two weeks? That's what we've been doing for the last two years.

-- that's what we've been doing since I started on the project; if that's the gold standard, I'm changing it...

> The reason we don't want to wait two weeks is that, even though we ask folks to report potential security issues to AWS/Amazon Security via our vulnerability reporting page or directly via email to aws-security@amazon.com, we still have had multiple instances of reports getting posted to GitHub. We really don't want to wait 2 weeks with an issue out there without someone taking action.

100%

> 2/ Many teams (shout out Security!) have moved to a public triaging mechanism. What's the plan to move Core to a more open triaging model? (@krisfreedain poke)

right now, we have part of core in this model. The Search Relevance Backlog & Triage is public. We review this board which has issues labeled "Search" https://github.com/orgs/opensearch-project/projects/45. Turns out to be anywhere from 0-10 issues per week. Core overall has many more right now. In the last public meeting, we realized that we talk more about search/core/search pipelines issues than actual relevance topics. I was thinking about broadening the scope of that meeting to talk core search overall, but would like feedback. There are many more untriaged issues in core than we can handle in one meeting, which is why splitting these labels was a good idea (this is really an open version of an idea/document @anasalkouz wrote up that we discussed with @yigithub (i get this login now after looking at it a bunch - funny), @kkhatua and others). We first discussed this, then put out the labels, and then want the feedback so we can refine. Chipping away at what's there in a public meeting would be good, but we also want to get the community involved in big picture discussions in those meetings.

Happy to discuss this in public next Wednesday at 9AM PT: https://www.meetup.com/opensearch/events/295393206/.

> 3/ You say "This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother." Can we also add something to .github that describes all the project labels, as well as something that describes the repo specific labels? That way you can just "include" the project label file for the broader labels.

Yes! Sounds great.

@krisfreedain
Member

> 2/ Many teams (shout out Security!) have moved to a public triaging mechanism. What's the plan to move Core to a more open triaging model? (@krisfreedain poke)
>
> right now, we have part of core in this model. The Search Relevance Backlog & Triage is public. We review this board which has issues labeled "Search" https://github.com/orgs/opensearch-project/projects/45. Turns out to be anywhere from 0-10 issues per week. Core overall has many more right now. In the last public meeting, we realized that we talk more about search/core/search pipelines issues than actual relevance topics. I was thinking about broadening the scope of that meeting to talk core search overall, but would like feedback. There are many more untriaged issues in core than we can handle in one meeting, which is why splitting these labels was a good idea (this is really an open version of an idea/document @anasalkouz wrote up that we discussed with @yigithub (i get this login now after looking at it a bunch - funny), @kkhatua and others). We first discussed this, then put out the labels, and then want the feedback so we can refine. Chipping away at what's there in a public meeting would be good, but we also want to get the community involved in big picture discussions in those meetings.
>
> Happy to discuss this in public next Wednesday at 9AM PT: https://www.meetup.com/opensearch/events/295393206/.

Yes! thanks @CEHENKLE & @macohen - so far we have public meetings for 'Security' (shout-out to @scrawfor99, @peternied, & @davidlago for being the pioneers), 'Search Relevance' from @macohen & team, 'Dashboards' from @joshuarrrr & team, as well as 'ml-commons' from @ylwu-amzn & team. We'll continue to chip away at each of the dev teams and make each of them public - sooner rather than later.

@Pallavi-AWS
Member

Thanks @macohen. In order to truly move to a component model and reduce the high load on core maintainers/core oncalls to triage issues, can we get issue originators to attach a Component/Area label when the issue is opened? Someone opening an issue has a decent idea on the area the issue is being opened for. For the issues where ownership is unclear, originators can leave it 'untriaged'. For the issues that start with an area/component, the first component owner can move the issue downstream to another component if the issue is found to belong to another component after initial analysis. This is a better model to scale our core operations.

@macohen
Contributor Author

macohen commented Oct 12, 2023

The main usage of the untriaged label here is to mark everything as "new" so we don't miss potential security issues. The rest of it could be handled other ways. I think we could do two things to simplify the process.
1 - as you suggest @Pallavi-AWS: ask that contributors label the issue to route it to the right component area (contributors still may not do this, or may mis-label), but at least untriaged issues end up getting divided up more cleanly
2 - lighten the initial triage process if needed to validate security questions. the workflow in that case can be more like "does this look like a security issue?" -> no -> remove the untriaged label, optionally add a "triaged" label (stolen from the security team) -> move on; when an issue is "triaged" it means there's more to look into but this is not security related. if it is a security issue, alarms go off, and we follow the process to fix security problems.

@peternied
Member

Great issue @macohen, I'll have to join one of your sessions to see what I can learn from how you get things done.

A couple of thoughts on acknowledgement vs accepting an issue.

From the security triage we've got a multiphase process that works well for us: issues are always labeled untriaged automatically on creation/reopen, and they are marked 'triaged' only during the triage meeting, when the team deems them actionable.

We have a section on labeling that calls out this lifecycle https://github.com/opensearch-project/security/blob/main/TRIAGING.md#what-are-the-issue-labels-associated-with-triaging

@stephen-crawford
Contributor

Hi @macohen, this seems like a great change!

As Peter mentioned, Security has a document we follow each time which can be helpful for keeping a consistent process. It also helps keep a consistent order of the meeting and avoid things running over.

For the labels you proposed, I think the majority of them seem good. I was wondering at what scale these meetings were expected to take place. I am not sure how easy it will be for people who are not subject matter experts to identify the difference between an issue which should be assigned 'Area:Relevance' vs. 'Area:Capabilities' for example.

I know that there are some areas where an expert can clarify this distinction, but I wanted to check whether you felt this would be better served with broader categories, or whether you think that would become too confusing? Either way is fine, I just wanted to raise this point.

Overall, this is a really great idea.

@macohen
Contributor Author

macohen commented Oct 13, 2023

@scrawfor99 that's why we created the RFC! we need to balance the need to create a meeting for every label with focus and concentration. The search relevance meeting today is typically 10-15 minutes of triage and then deep dive into other topics with time for anyone to raise things they want to discuss. Expanding that meeting to all of core is way too much to cover anything but triage, IMO. Thanks for sharing the process doc @peternied!

@reta
Collaborator

reta commented Oct 13, 2023

> Thanks @macohen. In order to truly move to a component model and reduce the high load on core maintainers/core oncalls to triage issues, can we get issue originators to attach a Component/Area label when the issue is opened?

@Pallavi-AWS AFAIK, sadly, adding a label requires commit rights on the repository (at least for now), so external contributors (as originators) won't be able to add those

@stephen-crawford
Contributor

@macohen, thanks for the quick reply. Again, I think this is great.

I guess it depends on who will be assigning the labels as well as how it will be used. I think the broader categories are quite good as-is but I am hesitant to add too many new labels because there are already so many. Without removing some existing ones I would probably vote for using the broader labels.

Will we be able to remove the old labels we no longer use?

I see, for example, we have Search as a label, but maybe we can remove that one when we add the new ones, etc.?

If we can remove a lot of the old ones then I think that the new labels look good.

@macohen
Contributor Author

macohen commented Nov 13, 2023

@scrawfor99 I think maintenance of those labels belongs to the repo maintainers themselves. Assigning labels could be done by anyone. Should making sure the issues/labels are assigned correctly belong to maintainers or others who can update labels (@opensearch-project/triage members)? I think so. What do you think?

BTW, I also think there should be an ability to change labels. We could generalize them or make them more specific as needed. Looking at this as a starting point. Also, I'll include a reference to the security triage process as an example of how to do it. I think repo/label owners should be able to find their own path, but I like using the first triage process as a good example.

@macohen
Contributor Author

macohen commented Nov 13, 2023

Oh. Also, to answer @CEHENKLE's question about moving core to a more public process, we do triage core issues with certain labels in the Search Relevance meeting because they automatically appear on the board we review: https://github.com/orgs/opensearch-project/projects/45

Because Core is so large, I think there's use in having a few of these meetings.

cc: @getsaurabh02 , @anasalkouz , @yigithub , @kkhatua , @Pallavi-AWS

@stephen-crawford
Contributor

Hi @macohen, thanks again for keeping this moving forward.

I don't think GitHub will let people other than maintainers modify labels or assign/unassign them. As long as we have some way of keeping track of the number of labels used and the number of labels that exist in the project as a whole (the maintainers can do this like you said), I think we will be in good shape.

I also would be incredibly in favor of more parts of the project adopting the Security repo's triaging model (or the similar version that you and Search do).

@macohen
Contributor Author

macohen commented Dec 5, 2023

@aalkouz, @krisfreedain, and I had a discussion I'm documenting here

  • One option to try out is for issue authors to assign components at creation time (see "Authors to assign component's label", #10901 in opensearch-project/OpenSearch). We thought this would be best if we ensure the process doesn't dissuade anyone from creating a new issue.
  • If that doesn't work, we agreed that creating an untriaged label for each component (e.g., Search:Relevance and untriaged-Search:Relevance) could work. An issue would come in as "untriaged," and a maintainer would review it to see if it needs to be addressed immediately or can be assigned a component. The maintainer would then remove the untriaged label and assign "untriaged-Search:Relevance." Some of us have been using GitHub Projects to have a label assignment trigger a move to a project board. In this case an issue assigned an "untriaged-Search:Relevance" label would appear on https://github.com/orgs/opensearch-project/projects/45 to be handled by an owner of issues related to that component.
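To make the second option more concrete, here is a small Python sketch of the "untriaged-<component>" label convention. The prefix and the example label come from the bullet above; the function names are hypothetical, and the final swap from `untriaged-Search:Relevance` to `Search:Relevance` is an assumption about how a component owner would finish triage, since the thread does not spell that step out.

```python
UNTRIAGED_PREFIX = "untriaged-"


def pending_component(labels):
    """Return the component an issue is waiting on, or None.

    The plain 'untriaged' label has no component suffix, so it is ignored here.
    """
    for label in labels:
        if label.startswith(UNTRIAGED_PREFIX):
            return label[len(UNTRIAGED_PREFIX):]
    return None


def mark_component_triaged(labels):
    """Swap 'untriaged-<component>' for '<component>' (assumed final step)."""
    component = pending_component(labels)
    if component is None:
        return set(labels)
    return (set(labels) - {UNTRIAGED_PREFIX + component}) | {component}
```

With this convention, a project board automation only needs to watch for labels that start with the prefix, and the component owner's queue is whatever `pending_component` returns their label for.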

Either way, we also will look for a maintainer per component/label. If there are any maintainer volunteers to take care of a specific label, please raise your hand here!

Any thoughts, questions on these ideas? Volunteers?

@Pallavi-AWS
Member

@mch2 will you be able to take care of the changes to our issue template to assign component labels at the time of opening an issue? We want to avoid the pileup of issues on core. Thanks.

@peternied
Member

@macohen I'm a maintainer, but I don't own areas as you've defined them. I don't like the overhead and specifically the diffusion of responsibility between different maintainers and contributors. I prefer the broad project approach. Everyone is responsible.

If we had a simple majority of maintainers vote +1 on this issue that would sway me.

@macohen
Contributor Author

macohen commented Dec 7, 2023

let's vote then, but also clarify what the vote is about. There also may be 1) a different way to slice the components up, 2) a gap in maintainers who do own those areas - I'd ask maintainers to review this based on merit - the usual way and see if any new folks qualify.

My concern is that if everyone owns it then nobody does which seems to be the state we're in now. But, that means we need to adjust something. I'd like to simplify this down to a few options for a vote and get more maintainers to chime in on these. 1) do these component labels make sense? 2) do we have at least one maintainer who can be responsible for triaging and clarifying label assignments right now? 3) do we agree on the goal? - no issue has an untriaged label for more than one week and issues can be assigned a label to further categorize issues so action can be taken.
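The goal in point 3 can be checked mechanically. The Python sketch below lists issues that have carried the untriaged label for more than one week. It assumes issues are supplied as `(number, labels, created_at)` tuples, which is a made-up shape for this example, and it measures age from creation time; an issue that was relabeled untriaged on reopen would need the labeling time instead.

```python
from datetime import datetime, timedelta, timezone

# Goal from the comment above: no issue stays untriaged for more than one week.
MAX_UNTRIAGED_AGE = timedelta(weeks=1)


def overdue_untriaged(issues, now):
    """Return the numbers of issues that are still untriaged after one week.

    `issues` is an iterable of (number, labels, created_at) tuples
    (a hypothetical shape chosen for this sketch).
    """
    return [
        number
        for number, labels, created_at in issues
        if "untriaged" in labels and now - created_at > MAX_UNTRIAGED_AGE
    ]
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the check easy to test and avoids mixing naive and timezone-aware timestamps.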

cc maintainers: @abbashus @adnapibar @anasalkouz @andrross @Bukhtawar @CEHENKLE @dblock @dbwiddis @dreamer-89 @gbbafna @kartg @kotwanikunal @mch2 @msfroh @nknize @owaiskazi19 @peternied @reta @Rishikesh1159 @ryanbogan @sachinpkale @saratvemulapalli @setiah @shwetathareja @sohami @tlfeng @VachaShah

@vikasvb90
Contributor

I am not a maintainer but I have a mixed opinion regarding this issue. @macohen I agree with all of your points which can potentially lead us to the goal of no issue has an untriaged label for more than one week but at the same time I also agree with a variant of @peternied's comment that this shouldn't mean diffusion of responsibilities. Any maintainer or contributor should be able to contest or contribute on any issue irrespective of the ownership. This movement should not draw boundaries and hence we may want to probably rephrase component ownership to something specific about triaging.

@macohen
Contributor Author

macohen commented Dec 7, 2023

@vikasvb90 I agree with you on diffusion of responsibilities. thanks for that. My intent is not to say that one and only one maintainer is responsible for each component, but I definitely see why it comes across that way. I wonder about a process where at least two maintainers are the go-to for each label, while, of course, any maintainer can assign labels as they judge.

@Pallavi-AWS
Member

Having component labels just gives a good starting point for issue triaging as we scale. All maintainers will have access to all issues, it is not exclusive to maintainers from that component only.

macohen added a commit to macohen/OpenSearch that referenced this issue Dec 14, 2023
…labels in core for easier understanding of what's there

Signed-off-by: Mark Cohen <markcoh@amazon.com>
@andrross
Member

Closing as I believe we have created component labels. If there are any outstanding issues here please comment and/or reopen. Thanks!
