Add worker config options #1051

Closed
wants to merge 5 commits

Conversation

akihikokuroda
Collaborator

Summary

Fix #1031

This PR is a rework of PR #1025.

Expose the number-of-workers options: the number of workers, plus min and max workers when auto-scaling is enabled.

Details and comments

The number of workers can be specified in the Configuration object passed to the QuantumServerless.run function.

provider = ServerlessProvider(
    username=os.environ.get("GATEWAY_USER", "user"),
    password=os.environ.get("GATEWAY_PASSWORD", "password123"),
    # token=os.environ.get("GATEWAY_TOKEN", ""),  # a token can be used instead of the user/password combination
    host=os.environ.get("GATEWAY_HOST", "http://localhost:8000"),
)
serverless = QuantumServerless(provider)
program = Program(
    title="First program", entrypoint="program.py", working_dir="./source_files/"
)

job = serverless.run(program, config=Configuration(workers=2, min_workers=1, max_workers=5, auto_scaling=True))

@akihikokuroda
Collaborator Author

I still need to work on the tests for this, but please take a look. Thanks!

@psschwei (Collaborator) left a comment


General question on workers vs. min/max workers: does one ever override the other (i.e., if we set workers, does that invalidate min/max scaling)?

I didn't check if there's a ray doc that covers this, so if there is feel free to tell me to go read the #$%#$%# docs 😄

Comment on lines 83 to 91
Args:
    title: program name
    entrypoint: is a script that will be executed as a job
        ex: job.py
    env_vars: env vars
    dependencies: list of python dependencies to execute a program
    working_dir: directory where entrypoint file is located (max size 50MB)
    description: description of a program
    version: version of a program
Collaborator

should the args here be workers, cpu, etc.?

Collaborator Author

The configuration is specified in the run method of QuantumServerless:

serverless.run(program, config=Configuration(max_workers=12, min_workers=5))

I'll update the docstring of the function.

@akihikokuroda
Collaborator Author

@psschwei The workers value is used when auto-scaling is not enabled. The min/max workers values are used when auto-scaling is enabled.
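
For illustration, a rough sketch of the two modes, using only the Configuration fields already shown in this PR (whether workers is actually ignored when auto_scaling is on is discussed further down):

# Fixed-size cluster: only `workers` applies when auto-scaling is off.
job = serverless.run(program, config=Configuration(workers=2))

# Auto-scaling cluster: `min_workers`/`max_workers` bound the scaler.
job = serverless.run(
    program,
    config=Configuration(min_workers=1, max_workers=5, auto_scaling=True),
)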

@Tansito
Member

Tansito commented Oct 23, 2023

I was a bit out these last weeks, so maybe this question is a bit dumb, sorry in advance (and maybe it was already discussed in the issue). Does it make sense to let the user modify the min/max number of workers? And if so, I see value in having an option that prevents it, in case one doesn't want to allow that kind of behavior from external users. Opinions: @IceKhan13, @psschwei, @akihikokuroda? (maybe a dumb question, as I said 😅).

@akihikokuroda
Collaborator Author

@Tansito The maximum allowed values for the workers are specified in settings.py (read from env variables), so users are not allowed to set values bigger than those.
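
For context, a minimal sketch of the kind of cap described here; the only setting name confirmed elsewhere in this review is RAY_CLUSTER_WORKER_REPLICAS_MAX, and the default value below is illustrative:

import os

# Illustrative: upper bound on worker replicas, read from an env variable.
RAY_CLUSTER_WORKER_REPLICAS_MAX = int(
    os.environ.get("RAY_CLUSTER_WORKER_REPLICAS_MAX", "5")
)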

@psschwei
Collaborator

let the user modify the min/max number of workers

it came up during the internal tests

@Tansito
Member

Tansito commented Oct 23, 2023

@Tansito The maximum allowed values for the workers are specified in settings.py (read from env variables), so users are not allowed to set values bigger than those.

I see, so we are only going to allow modifications within a threshold. Thanks @akihikokuroda ❤️

Happy to be back at work; I will try to catch up and be back at 100% as soon as I can, haha. Thanks team!

@psschwei
Collaborator

@psschwei The workers value is used when auto-scaling is not enabled. The min/max workers values are used when auto-scaling is enabled.

So, for example, something like this:

serverless.run(program, config=Configuration(workers=2, min_workers=1, max_workers=5, auto_scaling=True))

would ignore the workers=2 since auto_scaling is set to true, right?

We should probably add a doc on program configuration (to go along with the one for client configuration) to spell out all the options.

gateway/api/ray.py (review thread outdated, resolved)
gateway/api/schedule.py (review thread outdated, resolved)
@@ -86,11 +87,10 @@ def run(self, request):
        tracer = trace.get_tracer("gateway.tracer")
        ctx = TraceContextTextMapPropagator().extract(carrier=request.headers)
        with tracer.start_as_current_span("gateway.program.run", context=ctx):
            serializer = self.get_serializer(data=request.data)
Member

The serializer should handle all related entities, so you do not need to create them.

Collaborator Author

I restored the use of the ProgramSerializer. Thanks!

@psschwei (Collaborator) left a comment

Had one question (and one nit), but overall LGTM.
Will leave for @IceKhan13 to approve (as he had requested a few changes)

client/quantum_serverless/core/job.py (review thread outdated, resolved)
Comment on lines 185 to 188
"workers": program_config.workers,
"min_workers": program_config.min_workers,
"max_workers": program_config.max_workers,
"auto_scaling": program_config.auto_scaling,
Collaborator

What happens if bad values are passed? For example, a user sets min=5 and max=2... what kind of error message would they see? Is that caught by Ray or do we need to catch it? (I haven't tried yet, but based on past experience users will pass all kinds of combinations of bad values when configuring...)

Collaborator Author

I tried it with max=1, min=2. Ray didn't care, it just created 1 worker.

@IceKhan13 IceKhan13 modified the milestone: 0.7 Oct 26, 2023
@akihikokuroda akihikokuroda marked this pull request as draft October 26, 2023 15:04
@akihikokuroda
Collaborator Author

I have to add the config to the existing program path.

@akihikokuroda akihikokuroda marked this pull request as ready for review October 26, 2023 18:37
@psschwei
Collaborator

LGTM

@Tansito
Member

Tansito commented Nov 3, 2023

I agree with attaching the configuration to a Job!

@IceKhan13
Member

Woohoo! Config to Job connection flawless victory!

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda
Collaborator Author

ProgramConfig has been changed to JobConfig, and it is now related to Job. Please review. Thanks!

@IceKhan13
Member

I'm on it! Thank you Aki!

        null=True,
        validators=[
            MinValueValidator(0),
            MaxValueValidator(settings.RAY_CLUSTER_WORKER_REPLICAS_MAX),
@IceKhan13 (Member) Nov 6, 2023

Configuration values in models will create conflicts. If I change my settings now, it will generate a new migration diff and Django will complain about migrations. Validation for these fields should live in the serializers.
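
For example, a minimal sketch of serializer-level validation (the field names and settings variable follow this PR, but the import path and exact field options are assumptions; the cross-check also addresses the min=5/max=2 question above):

from django.conf import settings
from rest_framework import serializers

from .models import JobConfig  # assumed import path


class JobConfigSerializer(serializers.ModelSerializer):
    """Sketch: worker counts validated against the limits from settings."""

    workers = serializers.IntegerField(
        required=False, allow_null=True,
        min_value=0, max_value=settings.RAY_CLUSTER_WORKER_REPLICAS_MAX,
    )
    min_workers = serializers.IntegerField(
        required=False, allow_null=True,
        min_value=0, max_value=settings.RAY_CLUSTER_WORKER_REPLICAS_MAX,
    )
    max_workers = serializers.IntegerField(
        required=False, allow_null=True,
        min_value=0, max_value=settings.RAY_CLUSTER_WORKER_REPLICAS_MAX,
    )

    def validate(self, attrs):
        # Reject min > max early instead of letting Ray silently clamp it.
        min_w, max_w = attrs.get("min_workers"), attrs.get("max_workers")
        if min_w is not None and max_w is not None and min_w > max_w:
            raise serializers.ValidationError("min_workers cannot exceed max_workers.")
        return attrs

    class Meta:
        model = JobConfig
        fields = ["workers", "min_workers", "max_workers", "auto_scaling"]

Keeping the bounds here rather than on the model fields means changing the settings does not require a new migration.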

    user: Any,
    cluster_name: Optional[str] = None,
    cluster_data: Optional[str] = None,
    job_config: Optional[JobConfig] = None,
Member

We accept an optional job_config here, but later on we always act as if it is not None.

Here we should check for None and fall back to the defaults from settings.
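
Roughly something like this (JobConfig fields follow this PR; apart from RAY_CLUSTER_WORKER_REPLICAS_MAX, the settings names below are placeholders):

if job_config is None:
    # Fall back to the cluster defaults from settings (names are placeholders).
    job_config = JobConfig(
        workers=settings.RAY_CLUSTER_WORKER_REPLICAS,
        min_workers=settings.RAY_CLUSTER_WORKER_MIN_REPLICAS,
        max_workers=settings.RAY_CLUSTER_WORKER_REPLICAS_MAX,
        auto_scaling=False,
    )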

@@ -150,12 +157,18 @@ def run(self, request):
        if not serializer.is_valid():
            return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

        try:
Member

If we replace it with

jobconfig = None
config_data = request.data.get("config")
if config_data:
    config_serializer = JobConfigSerializer(data=json.loads(config_data))
    if not config_serializer.is_valid():
        return Response(config_serializer.errors, status=status.HTTP_400_BAD_REQUEST)

    jobconfig = config_serializer.save()

then we resolve validation, and errors will be in a nice, understandable format:

QuantumServerlessException: 
| Message: Http bad request.
| Code: 400
| Details: {"workers":["Ensure this value is less than or equal to 5."]}

and we will no longer need the save_jobconfig function.

@IceKhan13 (Member) left a comment

Almost there :)

Remaining things to fix:

  • remove settings variables from models and migrations
  • move validation to serializer ref
  • write tests for serializer and api view with configuration

@IceKhan13 IceKhan13 removed their assignment Nov 6, 2023
@akihikokuroda
Collaborator Author

@IceKhan13 Thanks for the review. I'll make the changes.

@IceKhan13
Member

No, thank you for handling such a big change!

@akihikokuroda
Collaborator Author

The runtime code changes are done. I'm working on the tests.

@akihikokuroda
Collaborator Author

The remaining fixes have been done. Please take a look! Thanks!

@IceKhan13 (Member) left a comment

LGTM!

Thank you @akihikokuroda for working on it!

@IceKhan13 IceKhan13 added the "release candidate" label (PR which is a candidate for next release) Nov 8, 2023
@akihikokuroda
Collaborator Author

Thanks!

@IceKhan13 IceKhan13 changed the base branch from main to development November 10, 2023 19:53
@IceKhan13 IceKhan13 deleted the branch Qiskit:development November 13, 2023 14:51
@IceKhan13 IceKhan13 closed this Nov 13, 2023
@IceKhan13
Member

IceKhan13 commented Nov 13, 2023

Reopened #1085

Sorry, I messed up with the branches.

Labels: release candidate (PR which is a candidate for next release)
May close issue: Expose ability to set min/max number of workers per cluster
4 participants