[Create-Workload Enhancements] Rearchitect Create-Workload Feature #587
Labels: enhancement (New feature or request)
Comments
IanHoang added the enhancement and untriaged labels and removed the untriaged label (Jul 17, 2024)
gkamat changed the title from "[Create-Workload Enhancements] Reorganize Create-Worklload Feature" to "[Create-Workload Enhancements] Reorganize Create-Workload Feature" (Jul 18, 2024)
IanHoang changed the title from "[Create-Workload Enhancements] Reorganize Create-Workload Feature" to "[Create-Workload Enhancements] Rearchitect Create-Workload Feature" (Aug 9, 2024)
Received feedback to add support for pbzip2 compression now that OSB supports it. Will create a separate PR for it.
@IanHoang, it may be helpful to add some child tasks to this issue, since there are multiple items here.
Overview
This issue is based on one of the priorities proposed in this RFC: #395
Background
As of now, OSB's create-workload is a monolith that uses two modules of functions to create a custom workload. It was designed as a quick and easy way to build custom workloads from small corpora. While this approach has worked in the past, demand is growing for custom workloads based on complex workloads, and more users are using this feature for that purpose.
Users of this feature have mentioned that the create-workload code is difficult to extend and maintain and, for newcomers to OSB, difficult to follow and interpret.
We should rearchitect the code to be more organized and scalable, which in turn will make it easier to extend and maintain. This work will also serve as the foundation for future development, such as extracting a random sampling of the documents and repairing incomplete workloads.
Proposed Design
While the existing approach is considered modular, create-workload in its current state is unwieldy. We have gathered feedback from users who have extended the feature and used it to build custom workloads based on complex production workloads of up to 10 TB. Based on that feedback, we should rearchitect create-workload to have the following components:
Proposed priority
The current structure also makes it difficult for newcomers to understand the code. The proposed approach would promote encapsulation and abstraction, making create-workload more organized and scalable, and easier to extend and maintain.