Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering pueue status #213

Closed
codekoala opened this issue Jun 23, 2021 · 20 comments · Fixed by #367
Closed

Filtering pueue status #213

codekoala opened this issue Jun 23, 2021 · 20 comments · Fixed by #367
Labels
f: Help wanted t: Feature A new feature that needs implementation

Comments

@codekoala
Copy link

Describe the solution you'd like
It would be great to be able to filter tasks with pueue status (or some other command?), particularly when multiple groups are involved. For example, I have cronjobs that queue up multiple types of tasks. It can be rather chaotic to examine the status for all tasks across all groups.

Some ideas:

  • An option to show the most recent n tasks (with or without --group filtering): pueue status --last 10
  • An option to filter by date/time, with behavior similar to the --delay option for pueue add:
    • pueue status --since 1d
    • pueue status --filter 2021-06-23
    • pueue status --filter 2021-06-01..2021-06-05

Describe alternatives you've considered
I could use pueue status --json and jq or similar, but...

@codekoala codekoala added the t: Feature A new feature that needs implementation label Jun 23, 2021
@Nukesor
Copy link
Owner

Nukesor commented Jun 23, 2021

This might be useful. It's also an client-only thing, so we don't have to touch any daemon/shared library logic in here as well.

Anyhow, I'm a little hesitant to introduce filter logic..
This opens up a huge amount of possible filter clauses, which then needs its own kind of pseudo "query language".

Just in your few examples we had:

  • limit $1 descending, without even specifying the actual filter (running/done/queued included?) (--last)
  • date > now - $1 (--after)
  • date == $1 (--at)
  • date > $1 && date < $2 (--between)

The next step would then probably be:

  • Filter by status
  • Filter by result
  • Filter by command

This stuff usually is a pain in the ass to maintain and really hard to get right.

I really don't want to bloat the commandline parameters with 10 (exaggerated, but not unrealistic) different filter options, where many of those options might not be compatible with other ones. This leads to a lot of complexity and bug potential.

On the other hand, I also don't want to add some kind of "query/filter language", as this is similarily hard to maintain/get right.

To be honest, I just don't feel comfortable to implement and maintain a feature with this level of potential complexity right now. One thing we can be sure about is that more filter options will be added over time, once we started doing this.

Anyhow, maybe I'm just overlooking some super elegant solution, so feel free to share your thoughts about this :) It would be very interesting to see other projects, that have done this in the past.

@Nukesor
Copy link
Owner

Nukesor commented Jun 23, 2021

For your information:

If i think about filter logic, the first thing that pops into my mind is the taskwarrior filter language.
It's horrible and way too complex, but I don't know how to make it better.

@codekoala
Copy link
Author

Thank you for your response! Introducing flexible filtering logic would definitely be a lot more complicated than I had initially considered when I created the ticket. Doing it "right" would be hard.

One thought to handle this is to offload the filtering capabilities to whatever the user chooses then display results using pueue status. For example:

$ pueue status --json | jq 'some filtering logic of my choosing' | pueue status -

Piping externally-filtered json back into pueue status doesn't seem to be the most elegant of options, but the idea behind that would be to get pretty tables with the subset of tasks. It seems like it could be trivial to reuse the display logic from pueue status as another subcommand, like pueue format-status or something more creative. This sort of solution would make it easy for those who are interested to create shell aliases for whatever they want, using tools they're comfortable with, without the burden of complex filtering logic being built into pueue.

Another (perhaps long-term) option might be to embed some scripting library like hlua or deno or whatever, and pass each task object through a user-defined filtering expression for that library. This would be a heavier option, but it seems like it would be easy enough to let users get as crafty as they want to without much more than if evaluate_filter_expression(expr, task).

@Nukesor
Copy link
Owner

Nukesor commented Sep 12, 2021

Hm. I would definitely go for the long-term option.
One of the main guidelines of Pueue is, that it should act as a stand-alone binary. Hence, requiring other cli tools such as jq is not an option.

After letting it sink for a few weeks, I would be fine with a well-tested custom query language, as long as it's not too extensive or dynamic. I'm thinking of something like SQL, but mostly the aspect that there is some kind of structure. I.e. What you want to select and a block of conditions with pre-defined operators.
Adding a full-fledged scripting language seems like a bit of overkill, as this introduces many capabilities and a lot of overhead that is simply not needed.

It would be cool to have somebody join the discussion, who has some experience on this topic.

Maybe we could build something fancy with Pest. I was looking for an excuse to play around with this for quite some time.

@codekoala
Copy link
Author

Using something like Pest to handle a SQL-esque filter language sounds really interesting to me! I would totally be on board with that! I'm not sure when I'll have time to help with that sort of implementation, but I will try :)

@d33tah
Copy link

d33tah commented Nov 17, 2021

Supporting --last! Also --running-only would be a nice thing to have.

@stranger-danger-zamu
Copy link

stranger-danger-zamu commented Jan 13, 2022

One of the main guidelines of Pueue is, that it should act as a stand-alone binary. Hence, requiring other cli tools such as jq is not an option.

This is fine, but even without boutique tools like jq, having a structured output format (like JSON) allows for a lot more flexibility to the user (scripting being a big one for me). Additionally, you can easily do most filtering (not sorting!) with just grep or sed. Sometimes it's nice to not have to look at documentation to figure out how to filter but instead just use grep for a quick and dirty one-liner.

As for the long term solution, make it an actual filtering language (ie. only inclusion/exclusion) and have sorting be separate (ie.
--sort-by <attrib> [asc|desc]).

@Nukesor
Copy link
Owner

Nukesor commented Jan 14, 2022

To be honest, I think I misunderstood the original proposal of @codekoala.
From the looks of my response, it seems like I thought that he proposed to include calls of jq inside of pueue?
I don't know how I came up with that idea, since his example and explanation are pretty much obvious and on point.

I guess a format-status command could be somewhat useful until there's a proper filtering language implemented.
All the subcommand had to do would be to read the state, deserialize it and display it as usual.

Though, I would like to have some comment on that subcommand which notifies the user that this command might be removed in the future.

@codekoala
Copy link
Author

I guess a format-status command could be somewhat useful until there's a proper filtering language implemented.
All it the subcommand had to do would be to read the state, deserialize it and display it as usual.

This sounds like an excellent compromise to me! It seems like all of the interesting functionality is already built, and it would just need to be exposed as a new subcommand that reads from a file/stdin.

@Nukesor Nukesor mentioned this issue Jan 14, 2022
3 tasks
@Nukesor
Copy link
Owner

Nukesor commented Jan 14, 2022

#282 implements the temporary format-status feature as discussed.

I tested it with a simple pueue status -j | pueue format-status and it works as expected :)

Could you verify whether this suits your needs?

@codekoala
Copy link
Author

codekoala commented Jan 14, 2022

I can verify that the pueue status -j | pueue format-status command does work for me as well. It appears that this will work quite nicely, thank you!

That said, I seem to have underestimated the complexity of a jq expression that would handle filtering and return data suitable for format-status, haha. Admittedly, my jq-fu is relatively rudimentary. Normally I am able to extract the information I want form a JSON document. However, in this scenario, I need to figure out how to modify the tasks object inside the document, returning everything else (such as settings.client, settings.daemon, etc) in order for the JSON being consumed by pueue format-status to be well-formed. This isn't necessarily pueue's concern, as I could handle the filtering using any number of tools. It just happens to be more complicated than I had originally anticipated when I proposed this approach.

Would you consider having pueue format-status expect the input file/stdin to contain individual task objects, such as:

{"id": 0, "original_command": "cowsay", "command": "...." .....}
{"id": 1, "original_command": "cowsay", "command": "...." .....}

and use that to reconstruct the form expected by format-status? I can't think of a time when this functionality would be used to do any manipulation on the settings (and expect to then format the output).

@codekoala
Copy link
Author

For context, this is what I've come up with so far for basic task filtering using jq:

$ pueue status --json | jq -c '.tasks | to_entries | map(select(.value.command == "cowsay")) | from_entries'                                                
{"441":{"id":441,"original_command":"cowsay","command":"cowsay","path":"/home/josh/dev/tinker/pueue","envs":{},"group":"default","dependencies":[],"label":null,"status":{"Done":{"Failed":127}},"prev_status":"Queued","start":"2022-01-14T13:22:08.300538697-05:00","end":"2022-01-14T13:22:08.511706488-05:00"}}

This seems like it could just be parsed directly into the tasks part of the JSON dumped by pueue status --json (as opposed to the format I suggested a few minutes ago, which would require additional magic inside pueue). I just learned about the to_entries and from_entries features of jq a few minutes ago. I would love to learn of better options for this if anyone has suggestions!

@Nukesor
Copy link
Owner

Nukesor commented Jan 14, 2022

Working with tasks only would work, but we would lose (some) group headline info. Specifically the amount of parallel tasks per group and the current group status.

Furthermore, if we want to display tasks sorted by groups, we need a little bit of additional logic, as the current display logic heavily relies on the state.groups map.

@Nukesor
Copy link
Owner

Nukesor commented Jan 15, 2022

The newest commit on the branch now accepts a map of tasks.

pueue status --json | jq -c '.tasks' | pueue format-status works, your more complex example @codekoala also works :)

I also found a simple solution to bypass the problem of missing group info.
The new code simply pulls the state a second time and overwrites the full task list with the filtered task list of the user :D
Kinda hacky, but it works well enough.

@Nukesor
Copy link
Owner

Nukesor commented Jan 16, 2022

@stranger-danger-zamu @codekoala
Just let me know if this feature is now to your liking :)
I'll then go ahead and merge that PR.

@codekoala
Copy link
Author

@Nukesor this is awesome! Thank you! I can confirm that this is behaving more like what I had in mind when I initially proposed the feature. The only thing that isn't quite as I had expected is the fact that tasks is a map rather than a list, but I certainly understand why this is the case.

I, personally, consider this feature to be "implemented" with these changes. I don't know whether you would want to keep this ticket open to track eventual work on built-in filtering we discussed earlier in the thread. Perhaps that would merit its own ticket?

Thank you again for building this out!

@Nukesor
Copy link
Owner

Nukesor commented Jan 16, 2022

Perfect, I'm going to merge this then!
I also patched the command so that it can accept both, a map and a list of tasks. This should improve the ergonomics of this command a little.

Anyhow, let's keep this issue open for a possible built-in filtering logic :)

@kjhoerr
Copy link

kjhoerr commented Feb 1, 2022

Don't know if anybody else would be interested in this. I use nushell frequently, and converting pueue to work with nushell isn't too much work. I have this alias for pueue status:

alias spu = (
  pueue status --json
    | from json
    | get tasks
    | pivot
    | get Column1
    | each {
      # Map proper dates for 'start' and 'end'
      let start_col = (if ($it.start | empty?) != $true {
        ($it.start | str to-datetime)
      } { $nothing });
      let end_col = (if ($it.end | empty?) != $true {
        ($it.end | str to-datetime)
      } { $nothing });

      $it | insert start $start_col | insert end $end_col
    } | select id status command path group label start end
    ### To make status accessible, as it can be either a string or row...
    ### Flatten creates two tables with differing status headers:
    | flatten status
    ### Rename merges them with the same column headers again:
    | rename id status command path group label start end
    ### Now status will always be string
)

Then I can do:

$ spu | where status == "Success"
$ spu | compact start | where start < 6hr | sort-by command
$ spu | sort-by -r end | first 5 | select id status path command

Only issue is nu's type constraints are a bit stiff so any nothing values will panic on comparison. Nushell has a ton of commands for managing tables though. From what I understand, it's main data stream is JSON-RPC, so hypothetically it wouldn't take a ton of work to create a plugin or something to integrate with it and fully utilize streams. Would be a lot for a specific toolset though, and I'm fine with using the alias.

@Nukesor
Copy link
Owner

Nukesor commented Sep 1, 2022

Hey everyone :)

I started working on this feature and already implemented the first steps.
I would really like to get some feedback on my current syntax design from you, as I currently don't have a lot of use-cases for this feature.

@codekoala @d33tah @kjhoerr @stranger-danger-zamu

I'll close this issue and continue the discussion on a new ticket for better observability:
#350

@Nukesor Nukesor closed this as completed Sep 1, 2022
@Nukesor Nukesor mentioned this issue Sep 4, 2022
@Nukesor Nukesor mentioned this issue Sep 30, 2022
@Nukesor
Copy link
Owner

Nukesor commented Nov 21, 2022

The v3.0.0 release candidate is now published

It would be awesome if you could give it a spin and check if there're any bugs or issues :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
f: Help wanted t: Feature A new feature that needs implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants