Edit task environment variables #503

Closed
mjpieters opened this issue Feb 26, 2024 · 15 comments · Fixed by #584
Labels
f: Help wanted · s: Client · s: Daemon · s: Pueue-lib · t: Feature

Comments

@mjpieters
Contributor

A detailed description of the feature you would like to see added.

Currently, you can edit three aspects of a scheduled task: the command to be run, the path where the task is run, and its label.

Tasks have another component: the environment variables under which the command is executed. Can we have an option to edit these?

A stretch goal would be the ability to add, set or remove environment variables across all tasks or a task group. This lets you do things like set bandwidth limits on a whole group of tasks where the commands in those tasks honour a bandwidth limit set in an environment variable. Powerful stuff!

Explain your use case of the requested feature

Loads of UNIX commands change their behaviour based on the environment variables that are present when they start. Sometimes we need to change what environment variables are present for a given task, because circumstances changed or we made a mistake with the env vars when the task was added.

The ability to alter environment variables across all tasks is another powerful option. This may need to be a separate command (or even a separate FR!).

Alternatives

I've actively edited the command for tasks to add extra environment variables. This is a poor workaround, however, as it is easy to accidentally alter the command itself.

Additional context

No response

@Nukesor
Owner

Nukesor commented Feb 27, 2024

It's a bit of an edge-case, but I definitely see how this is a good QoL feature.

I'm thinking about the best way to do this.
My first thought would be to set the key-value pairs directly on the command line, especially if one wants to set multiple values at the same time (e.g. pueue edit 0 -e KEY=value -e OTHER_KEY=othervalue).
The behavior would then be a bit inconsistent, but I feel like editing environment variables in an editor would be pretty cumbersome, as there are usually a lot of variables.

Maybe we should add a new subcommand for this? This would also make it quite easy to edit variables for groups or multiple commands, as I feel like those flags would be very confusing on the current edit command.

@Nukesor added the t: Feature, s: Pueue-lib, s: Daemon, and s: Client labels on Feb 29, 2024
@mjpieters
Contributor Author

Yes, the more I think about this, the more this sounds like a separate subcommand.

@Nukesor
Owner

Nukesor commented Mar 10, 2024

If you like, feel free to start hacking on this :) Your contributions have always been top notch and I believe that you'll make the right design decisions ;D

@activatedgeek

Just wanted to chime in with another use case.

I currently schedule jobs on a machine with GPUs using pueue. Most times, I need multiple GPUs and set the CUDA_VISIBLE_DEVICES environment variable to a list of GPU IDs for a particular run. To get the queueing effect that runs jobs serially on one set of GPUs, I create a group with the default parallelism of 1 and manually attach the variable with a bash function:

function pugrun {
  # Default to GPU 0 if no GPU IDs were given.
  GPUS=${GPUS:-0}

  # One group per GPU set; the default parallelism of 1 serializes its jobs.
  pueue group add gpu-${GPUS}

  # Pin the task to the chosen GPUs via CUDA_VISIBLE_DEVICES at `pueue add` time.
  CUDA_VISIBLE_DEVICES=${GPUS} pueue add -g gpu-${GPUS} -- "${@}"

  unset GPUS
}
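
For illustration, the helper would presumably be invoked along these lines (the GPU IDs and training command below are placeholders, not taken from an actual setup):

# Queue a job pinned to GPUs 2 and 3; every job queued this way for the same
# GPU set lands in the gpu-2,3 group and therefore runs serially.
GPUS=2,3 pugrun python train.py --epochs 10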

Editing/adding environment variables to groups would be a great addition! Thank you for the amazing project!

@Nukesor
Owner

Nukesor commented Apr 4, 2024

@activatedgeek It sounds like you could make good use of the https://github.com/Nukesor/pueue/wiki/Get-started#load-balancing feature. This would do automatic balancing across your GPUs for you :)

@activatedgeek

Thank you.

My understanding was that this behavior would require me to change existing code to parse out the PUEUE_WORKER_ID and transform it to appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs). To avoid going back and changing internal code plumbing, it was easiest to just modify the environment at pueue add time.

But of course, it's nothing too complicated, since it can similarly be plugged in: instead of calling the GPU job directly, I can call another bash script that does the transform and then invokes the GPU job.

Another tricky scenario with the example was this series of steps, say for group cluster1 with parallel 2 (assuming a machine with 2 GPUs, with each job using 1 GPU in a round-robin manner):

1. Queue experiment 1 on cluster 1
2. Queue experiment 2 on cluster 1
3. Queue experiment 3 on cluster 1
4. Scheduled experiment 1 (PUEUE_WORKER_ID = 0 assigned to GPU 0)
5. Scheduled experiment 2 (PUEUE_WORKER_ID = 1 assigned to GPU 1)
6. Completed experiment 2
7. Scheduled experiment 3 (PUEUE_WORKER_ID = 2 assigned to GPU 0)

Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing. This is why I took the easier route of limiting each group to schedule on a fixed set of GPU IDs.

Let me know if the scenario makes sense, and if I am missing something simpler here.

@Nukesor
Owner

Nukesor commented Apr 4, 2024

My understanding was that this behavior would require me to change existing code to parse out the PUEUE_WORKER_ID and transform it to appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs).

Exactly. That's how it was designed to be used if the IDs aren't directly mappable to the external resources (except for the round-robin part).

Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing. This is why I took the easier route of limiting each group to schedule on a fixed set of GPU IDs.

That's actually not how it would work. In your example, the last line would look like this:

Completed experiment 2
Scheduled experiment 3 (PUEUE_WORKER_ID = 1 assigned to GPU 1)

The ID is not the task id, but rather the internal "worker" (or "slot") id. A queue with 5 parallel tasks will get 5 slots with ids 0-4. As soon as a task in a queue finishes, its slot is freed and the next task then gets that slot.

If it doesn't work this way, this is definitely a bug! That system was specifically designed to handle your use case: machines/pools with a set number of limited resources, to which the tasks are then assigned in a greedy manner.
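
To illustrate the slot mechanism, here is a minimal sketch of a wrapper script that maps the slot id to a GPU, assuming one GPU per parallel slot in the group; the script name and GPU list are placeholders, not part of pueue itself:

#!/usr/bin/env bash
# run-on-gpu.sh (hypothetical wrapper): pueue exports PUEUE_WORKER_ID for each
# running task; use it to pick a GPU, then exec the real command.
set -euo pipefail

# GPUs available to this group, indexed by slot id (assumes 2 parallel slots).
GPUS=(0 1)

export CUDA_VISIBLE_DEVICES="${GPUS[$PUEUE_WORKER_ID]}"
exec "$@"

Tasks would then be queued as something like pueue add -g cluster1 -- ./run-on-gpu.sh python train.py, so two concurrent tasks in the group never share a GPU.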

If the current wiki entry doesn't properly convey this behavior, it would be awesome if you could rephrase it! The wiki is free to be edited by anyone :)

@activatedgeek

Ahaa! That makes much more sense. Thank you for clarifying.

Let me run a quick test simulating this scenario, and I'll report back. I'll update the wiki with this example so that the behavior is more explicit.

@activatedgeek

Sorry for the delay. I have verified this behavior, and this actually solves my specific problem.

I also realized I missed this:

There'll never be two concurrent tasks with the same $PUEUE_WORKER_ID in the same group.

In that case, the wiki already explains it precisely, and I will skip the wiki edit.

@Nukesor
Owner

Nukesor commented Jul 4, 2024

@mjpieters I'm thinking about adding a pueue task command to inspect and eventually manipulate tasks.

The idea came up during #540.

What do you think about a pueue task env $TASK_ID command to inspect the environment,
and respective pueue task env unset $TASK_ID $NAME and pueue task env set $TASK_ID $NAME $VALUE commands?

Is the nesting too deep, or is this reasonable?
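
For illustration only, a session with these proposed (not yet implemented) subcommands might look like the following; the task id, variable names, and values are made up:

# Inspect the environment of task 12
pueue task env 12

# Set a single variable on task 12 before restarting it
pueue task env set 12 RUST_LOG debug

# Remove a variable that should no longer be passed to the command
pueue task env unset 12 HTTP_PROXY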

@Nukesor
Owner

Nukesor commented Aug 1, 2024

Ping @mjpieters

@mjpieters
Contributor Author

mjpieters commented Aug 1, 2024

I love the nesting, actually. Grouping like this can really help with command clarity. E.g. the gh command-line tool for GitHub has 3 levels for most commands.

I would almost say grouping is overdue for pueue; you have commands that operate on either a group or a task, which maybe could do with splitting across command groups, with aliases for backwards compatibility / ease of typing:

  • pueue task (aliases: pueue t)
    • add (aliases: pueue add)
    • remove <TASKIDS>... (aliases: pueue task rm, pueue remove)
    • switch <TASKID1> <TASKID2> (aliases: pueue switch)
    • stash <TASKIDS>... (aliases: pueue stash)
    • enqueue [TASKIDS]... (aliases: pueue enqueue)
    • start [TASKIDS]... (aliases: pueue start)
    • restart [TASKIDS]... (aliases: pueue task re, pueue restart)
    • pause [TASKIDS]... (aliases: pueue pause)
    • kill [TASKIDS]... (aliases: pueue kill)
    • send <TASKID> <INPUT> (aliases: pueue send)
    • edit <TASKID> (aliases: pueue edit)
    • log [TASKIDS]... (aliases: pueue log)
    • follow [TASKID] (aliases: pueue task fo, pueue follow)
    • wait [TASKIDS]... (aliases: pueue wait)
    • env <TASKID>
      • set <TASKID> <ENVVAR> <VALUE>
      • unset <TASKID> <ENVVARS>...
  • pueue group (aliases: pueue g)
    • add <GROUP>
    • remove <GROUPS>... (aliases: pueue group rm)
    • start [GROUPS]... (aliases: pueue start -g/--group, pueue start -a/--all)
    • restart [GROUPS]... (aliases: pueue group re, pueue restart -g/--failed-in-group, pueue restart -a/--all)
    • pause [GROUPS]... (aliases: pueue pause -g/--group, pueue pause -a/--all)
    • kill [GROUPS]... (aliases: pueue kill -g/--group, pueue kill -a/--all)
    • status <GROUPS>... (aliases: pueue status -g/--group)
    • wait [GROUPS]... (aliases: pueue wait -g/--group, pueue wait -a/--all)
    • clean <GROUPS>... (aliases: pueue group cleanup, pueue group clear, pueue clean -g/--group)
    • reset [GROUPS]... (aliases: pueue reset -g/--group)
    • parallel [GROUP] [PARALLEL_TASKS] (aliases: pueue parallel -g/--group)

and the following first-tier top-level commands:

  • pueue status (aliases: pueue)
  • pueue clean (aliases: pueue cleanup, pueue clear)
  • pueue shutdown
  • pueue completions
  • pueue help

Note that for many commands that accepted a single -g/--group name, the above structure allows multiple groups to be named. Ditto for pueue task env unset <ENVVARS>... to unset multiple environment variables (though I'd rather have them named explicitly than default to 'all' when no envvars are given).

@Nukesor
Owner

Nukesor commented Aug 2, 2024

I really like the group proposal, especially since I intuitively tried to use Pueue like that a few times already :D

I'm not sure about the task subcommand though. I feel conflicted about having the task subcommand space mirror a big part of the top-level subcommand space. It just feels a bit off to have 2 or more different ways of achieving the same thing.

I guess this would also be confusing, as I would expect different commands to perform different tasks?
I know that there are some tools out there that already work this way, but I personally always found those a bit unintuitive/confusing to use.

But permanently moving all those commands (add, remove, etc.) to the task subcommand space would make the CLI less convenient to use. I still assume that most users won't work with groups on a regular basis, so the main interface should stay as convenient as possible.

In general, I really like the idea to restructure the CLI, especially since the next version will be a breaking major version anyway.

I guess I need to think a bit about your proposal and make my mind up about it.

@mjpieters
Contributor Author

Fair enough; that's why I included aliases for the original top-level commands in there (as well as thinking about backwards compatibility). It could perhaps depend on how the aliases are represented when you run pueue --help.

@Nukesor
Owner

Nukesor commented Dec 3, 2024

I thought a lot about the CLI redesign and still really wasn't happy with any solution.
In my mind, some solutions made the usage unintuitive/confusing for newcomers, since there were multiple ways to do one thing, while other solutions made the tool harder to use for the 90% use case.

This issue was kind of blocking me and preventing the 4.0 release, which included a lot of other stuff that needed to be released, so I decided to stick to the old pattern for now and tackle this in a later release :)
