
Add Contributor Guidelines, User Guide, and Pull Request Template #425

Merged 8 commits on Oct 3, 2024
21 changes: 21 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
### Description
[Describe what this change achieves]

### Issues Resolved
[List any issues this PR will resolve]

### Testing
- [ ] New functionality includes testing

[Describe how this change was tested]

### Backport to Branches:
- [ ] 6
- [ ] 7
- [ ] 1
- [ ] 2
- [ ] 3

---
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).
105 changes: 105 additions & 0 deletions CONTRIBUTING.md
# Contributor Guidelines

This repository contains the default workload specifications for the OpenSearch benchmarking tool [OpenSearch Benchmark](https://github.com/opensearch-project/OpenSearch-Benchmark). This document is a general guide on best practices for contributing to this repository.

## Contents
- [Contributing a change to existing workload(s)](#contributing-a-change-to-existing-workloads)
- [Test changes](#test-changes)
- [Testing changes locally](#testing-changes-locally)
- [Testing changes with integration tests](#testing-changes-with-integration-tests)
- [Publish changes in a pull-request](#publish-changes-in-a-pull-request)
- [Reviewing pull-requests](#reviewing-pull-requests)
- [Backporting](#backporting)
- [Important note on backporting reverted commits](#important-note-on-backporting-reverted-commits)
- [Contributing a workload](#contributing-a-workload)


## Contributing a change to existing workload(s)

Before making a change, we recommend you fork the official workloads repository and make the change there.

You should also consider whether or not your change should be applied to one or more branches.
- If you know your change is only applicable to a specific branch, create the feature branch based off of that specific branch and make the changes there.
- If the change is applicable to several branches, create the feature branch based off of `main`. Let's call it `test-forked-workloads`.

## Test changes

After making changes in your feature branch of your forked workloads repository, we recommend testing them locally and with integration tests via GitHub Actions in your forked OpenSearch Benchmark repository.

### Testing changes locally
It's recommended to test your change locally by performing the following:
1. Set up or use an existing OpenSearch cluster to test your changes against.
2. Based on the major version of the test cluster, cherry-pick the commit(s) with your change to the corresponding major version branch in your forked workloads repository. For example, if you're testing against an OpenSearch 2.X.X cluster, cherry-pick the changes from the feature branch to the `2` branch.
3. Run the OpenSearch Benchmark command against your cluster with your modified workload and the `--test-mode` flag. Ensure it completes successfully.

Other tips when running the command in test mode against your cluster:
- Ensure you are using the workloads repository that you committed your changes in. To enforce this, provide the path to your repository via the `--workloads-repository` parameter.
- Alternatively, you can force OSB to use a specific branch by specifying the distribution version of your OpenSearch cluster via the `--distribution-version` parameter. To build on the example from the previous step, to ensure you are using branch `2`, set `--distribution-version=2.0.0` in the OpenSearch Benchmark command.
- Note that if changes that rely on newer OSB features are backported to older branches, integration tests in the OSB repository may begin to fail.
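Putting the steps and tips above together, a local test-mode run might look like the following sketch. The workload name, path, and endpoint are placeholders, and flag spellings follow this guide; check `opensearch-benchmark --help` for the exact options in your OSB version.

```
opensearch-benchmark execute-test \
  --workload=nyc_taxis \
  --workloads-repository=/path/to/your/opensearch-benchmark-workloads \
  --target-hosts=localhost:9200 \
  --pipeline=benchmark-only \
  --test-mode
```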

### Testing changes with integration tests

To ensure that there are no other breaking changes, we recommend running integration tests locally. To do this, you will need your own fork of the OpenSearch Benchmark repository (note, this is not the workloads repository). The tests will be run using GitHub Actions as described below.

**Prerequisites:**

To run integration tests, we recommend dedicating a branch in your forked OpenSearch Benchmark repository for this purpose. You only need to perform these steps once. After you have set this branch up, you will use this branch whenever you need to run integration tests against workload changes from your forked workloads repository.

1. In your forked OpenSearch Benchmark repository, create a separate branch that's based off of `main` and call it `test-forked-workloads`.
2. In this branch, update two files -- `benchmark-os-it.ini` and `benchmark-in-memory.ini` in the `/osbenchmark/it/resources` directory -- to point to the forked workloads repository containing your workload, similar to the example below.
```
# Update default.url in each ini file
[workloads]
default.url = https://github.com/<YOUR GITHUB USERNAME>/opensearch-benchmark-workloads
```
3. Push these changes and this branch up to your forked OpenSearch Benchmark repository.

You are now ready to run integration tests against your forked workloads repository.

**Run integration tests against your forked repository:**

1. Cherry-pick the commit(s) with your change to the branches that you expect your changes to be merged into.
2. Push these changes up to the remote branches of your forked workloads repository.
3. In your forked OpenSearch Benchmark repository, visit GitHub Actions, click `Run Integration Tests` in the left panel, select the `test-forked-workloads` branch on the right, and click `Run workflow`. The tests should run for about 20-30 minutes; verify that they complete successfully. See the following reference screenshot for guidance.

![example](https://dbyiw3u3rf9yr.cloudfront.net/assets/test-forked-workloads.png)
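The steps above can also be done from the command line. The sketch below assumes the remote `origin` points at your fork, that you have the GitHub CLI installed, and that the workflow name matches what you see in the Actions tab:

```
# In your forked workloads repository: push the branches with your change
git push origin 2 3
# In your forked OpenSearch Benchmark repository: trigger the tests
# without the web UI
gh workflow run "Run Integration Tests" --ref test-forked-workloads
```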

## Publish changes in a pull-request

When committing changes, please include the `--signoff` flag.
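For example (the commit message shown is illustrative):

```
git commit --signoff -m "Add missing field to http_logs index mapping"
```

The `--signoff` flag (or `-s`) appends a `Signed-off-by:` trailer to the commit message.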

Before publishing the pull-request containing your changes, please ensure you've addressed the following in the PR:

1. **Describe the changes**: In the PR description, indicate what this change does and what it solves. If it fixes a bug, provide a sample of what users experience before the fix and what they can expect after it is applied. If it supports a new feature, provide an example of the output users can expect.
2. **Indicate where to backport**: It is the contributor's responsibility to indicate whether this change should be merged into a single branch or into several branches. The changes should always go into `main` branch but might only apply to specific branches.
   - If your change needs to go into different branches, determine whether it will backport smoothly. To do this, perform a diff between the `main` branch containing your cherry-picked commit and the other branches that need the change. If you would introduce conflicts or changes unrelated to your PR, note that in the PR description. Maintainers will use this information to label the pull-request appropriately.
3. **Provide evidence that your changes were tested**: If you tested locally, paste a short sample output in the description or attach a file displaying the output. If you tested with your forked OSB repository's Github Actions, link that.
4. **Request additional members to review**: If your change adds support for a new feature in OpenSearch, please tag an individual who is a subject-matter expert (SME) and can review the change.
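As an alternative to diffing branches, a dry-run cherry-pick surfaces backport conflicts directly. This sketch uses branch `2` and a placeholder `<sha>` for your commit on `main`:

```
git checkout 2
git cherry-pick --no-commit <sha>   # apply without committing, to surface conflicts
git status                          # note any conflicts in the PR description
git reset --hard                    # discard the dry run
```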

Create a pull request (PR) from your fork to the OpenSearch Benchmark [workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/).

## Reviewing pull-requests

Reviewers and maintainers should review pull-requests and ensure that the changes are well-defined and well-scoped.

Other tips:
1. Review the changes. If the PR adds support for specific features in OpenSearch, ensure a subject-matter expert (SME) reviews the change in addition to your review.
2. Ensure that the change is tested.
3. Label the PR with backporting options based on the PR description before approving. The contributor should have indicated which branches, aside from the `main` branch, the PR should be merged into.

### Backporting
Ensure that there are no backport errors or conflicts. If there are, backport the changes carefully.

Changes should be `git cherry-pick`ed from `main` to the most recent version of OpenSearch and backward from there.
Example:
```
main → OpenSearch 3 → OpenSearch 2 → OpenSearch 1 → Elasticsearch 7 → Elasticsearch 6
```
In the case of a merge conflict for a backported change introduced by the contributor's PR, a separate pull request should be raised which merges the change directly into that target branch. **Ensure the only changes added to the branch are the ones from the contributor's PR.**
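The cherry-pick flow above can be scripted. The helper below is an illustrative sketch, not an official tool; the branch names in the usage line are this repository's version branches, and it assumes a clean working tree:

```shell
# Cherry-pick a commit onto each listed branch in order, newest to oldest.
backport() {
  local sha="$1"; shift
  for branch in "$@"; do
    git checkout "$branch" &&
    git cherry-pick -x "$sha"   # -x records the original commit id in the message
  done
}
# Usage: backport <commit-sha> 3 2 1 7 6
```

If a cherry-pick stops on a conflict, resolve it per the guidance above (or raise a separate PR against that branch) before continuing.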

### Important note on backporting reverted commits
Sometimes we'll need to revert a change. In those cases, we should revert the change across all branches. Revert the precise change that was made to each branch, rather than backporting a new change. If reverting isn't performed properly, it can create other issues, since each branch contains variations of the relevant workloads appropriate for that branch. Note that versions `6` and `7` can be thought of as "older" versions of OpenSearch, owing to a historical version-numbering oddity.
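For example, with hypothetical commit ids, reverting on each branch looks like this (each branch has its own cherry-picked commit id, so look it up per branch with `git log`):

```
git checkout 2
git revert <sha-of-change-on-branch-2>
git checkout 1
git revert <sha-of-change-on-branch-1>
```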

## Contributing a workload

For information on how to contribute a workload to the repository, please see [Sharing Custom Workloads](https://opensearch.org/docs/latest/benchmark/user-guide/contributing-workloads/) in the official documentation.
79 changes: 21 additions & 58 deletions README.md
[![Chat](https://img.shields.io/badge/chat-on%20forums-blue)](https://forum.opensearch.org/categories)
![PRs welcome!](https://img.shields.io/badge/PRs-welcome!-success)

OpenSearch Benchmark Workloads
------------------------------

This repository contains the default workload specifications for the OpenSearch benchmarking tool [OpenSearch Benchmark](https://github.com/opensearch-project/OpenSearch-Benchmark).

You should not need to use this repository directly, except if you want to look under the hood or create your own workloads.

How to contribute a change
--------------------------

See an area to make improvements or add support? Follow these major steps:

1. Fork this repository and make the change on a feature branch that's based off of `main`.
2. After making changes to an existing workload, run a simple test against the change in `--test-mode` to determine whether there are any breaking changes. It's also recommended to [test the changes with OpenSearch Benchmark's integration tests](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/CONTRIBUTING.md#testing-changes-with-integration-tests).
3. Lastly, create a pull-request against the `main` branch of this repository, and make sure to include how you tested the change and which branches it should be backported to.

For more details, see the [contributor guidelines](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/CONTRIBUTING.md).

How to Contribute a Workload
----------------------------

Please see the [sharing custom workloads guide](https://opensearch.org/docs/latest/benchmark/user-guide/contributing-workloads/) in the official documentation for OpenSearch Benchmark.

Getting help
------------

- Want to contribute to OpenSearch Benchmark? See [OpenSearch Benchmark's Developer Guide](https://github.com/opensearch-project/OpenSearch-Benchmark/blob/main/DEVELOPER_GUIDE.md) for more information.
- Want to contribute to OpenSearch Benchmark Workloads? See the workloads repository's [Contribution Guide](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/CONTRIBUTING.md) for more information.
- For any questions or answers, visit [our community forum](https://forum.opensearch.org/).
- File improvements or bug reports in our [GitHub repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/issues).

License
-------

There is no single license for this repository. Licenses are chosen per workload; they are typically the same terms as the source data. See the README files of each workload for more details.
26 changes: 26 additions & 0 deletions USER_GUIDE.md
# OpenSearch Benchmark Workloads User Guide

OpenSearch Benchmark (OSB) comes packaged with several workloads that are included in this repository. This guide provides a brief overview of the organization of this repository as well as how OSB uses these workloads.

### Contents
- [What do branches represent](#what-do-the-numbered-branches-represent)
- [How does OSB select which branch to use](#how-does-osb-select-which-branch-to-use)
- [Force OSB to use a Specific Branch](#force-osb-to-use-a-specific-branch)

### What do the numbered branches represent?

Don't worry, these numbers are not the same as the [numbers](https://lostpedia.fandom.com/wiki/The_Numbers) in the series [Lost](https://en.wikipedia.org/wiki/Lost_(2004_TV_series)). Each branch -- `main`, `6`, `7`, `1`, `2`, `3` -- is associated with a specific major version of OpenSearch or Elasticsearch and contains variations of each workload.

### How does OSB select which branch to use?
OSB has a mechanism to detect the major version of the target cluster.

Based on the major version it detects:
- OSB will select workloads from branches `1`, `2`, or `3` if the target cluster has an OpenSearch major version of 1.X.X, 2.X.X, or 3.X.X, respectively.
- OSB will select workloads from branches `6` or `7` if the target cluster has an Elasticsearch major version of 6.X.X or 7.X.X, respectively.

If OSB cannot determine the major version, or if the major version does not exist as a branch in the repository, OSB will select workloads from the `main` branch as a last resort.
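As an illustration only (this is not OSB's actual implementation), the selection rule above can be sketched as a small shell function:

```shell
# Sketch of the branch-selection rule described above (illustrative, not OSB source).
select_branch() {
  local product="$1" version="$2"   # e.g. "opensearch" "2.11.1"
  local major="${version%%.*}"      # keep only the major version
  case "$product:$major" in
    opensearch:1|opensearch:2|opensearch:3) echo "$major" ;;  # branches 1, 2, 3
    elasticsearch:6|elasticsearch:7)        echo "$major" ;;  # branches 6, 7
    *)                                      echo "main" ;;    # fall back to main
  esac
}
```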

### Force OSB to use a specific branch
Users can force OSB to use a specific branch by specifying `--distribution-version=X.X.X`. For example, if a user is testing a cluster with OpenSearch version 2.0.0 but wants to use the workloads associated with OpenSearch version 1.X.X, they can supply `--distribution-version=1.0.0` when invoking OSB.

However, it's not recommended to force testing workloads from a branch that is greater than the target cluster's major version (e.g. testing workloads from branch `2` against an OpenSearch 1.X.X cluster). This can cause issues, as earlier versions might not support operations that are included in later versions' workloads.
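For example, the invocation below (workload name and endpoint are placeholders; check `opensearch-benchmark --help` for the exact options in your OSB version) forces workloads from branch `1` while targeting a 2.0.0 cluster:

```
opensearch-benchmark execute-test \
  --workload=<workload-name> \
  --distribution-version=1.0.0 \
  --target-hosts=<cluster-endpoint> \
  --test-mode
```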