Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actions with large outputs fail with OSError: [Errno 7] Argument list too long #1043

Closed
kot0dama opened this issue Oct 12, 2023 · 7 comments
Closed
Assignees
Labels
docs Improvements or additions to documentation small item

Comments

@kot0dama
Copy link

kot0dama commented Oct 12, 2023

We use an internal ops charm allowing the operator to run an action that would display the contents of a file.
If the output of the file exceeds a certain limit the action fails with:

Traceback (most recent call last):                                                                                                                                                                                                                           
  File "/var/lib/juju/agents/unit-test-1/charm/./src/charm.py", line 545, in <module>                                                                                                                                                            
    main(TestCharm)                                                                                                                                                                                                                                
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/main.py", line 441, in main                                                                                                                                                              
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-test-1/charm/./src/charm.py", line 250, in _on_show_application_logs
    event.set_results({"result": logs})
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/charm.py", line 217, in set_results
    self.framework.model._backend.action_set(results)   # pyright: reportPrivateUsage=false
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/model.py", line 2859, in action_set
    self._run('action-set', *[f"{k}={v}" for k, v in flat_results.items()])
  File "/var/lib/juju/agents/unit-test-1/charm/venv/ops/model.py", line 2616, in _run
    result = run(args, **kwargs)
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/var/lib/juju/tools/unit-test-1/action-set'

The limit seems to be per-character around 131060 chars, but this would need confirmation.

As I understand it might be better to limit the output to a reasonable length, this should be handled and not result in an error for the operator. Also, if there is such a limitation, it should probably be documented.

Thank you,
Loïc

@tonyandrewmeyer
Copy link
Contributor

tonyandrewmeyer commented Oct 12, 2023

The limit seems to be per-character around 131060 chars, but this would need confirmation.

It's the OS's limit for command line arguments. For example:

$ grep ARG_MAX /usr/include/linux/limits.h
#define ARG_MAX       131072	/* # bytes of args + environ for exec() */

(This is different than getconf ARG_MAX sometimes for reasons I don't fully understand). The result= takes up a few more of those bytes.

As I understand it might be better to limit the output to a reasonable length, this should be handled and not result in an error for the operator.

I don't think we would want to arbitrarily truncate results. We could raise something more specific, but I believe it's better to have the operation fail if it can't be executed as requested.

Also, if there is such a limitation, it should probably be documented.

+1

For your use case, can you perhaps have the action provide a juju scp command that the user could run that would allow retrieving the file?

@benhoyt benhoyt added docs Improvements or additions to documentation small item labels Oct 12, 2023
@benhoyt
Copy link
Collaborator

benhoyt commented Oct 12, 2023

Yeah, I agree with @tonyandrewmeyer. I think the exception message is relatively clear, but we could document this. I think it'd make sense to document it in the set_results docstring, and perhaps here or here?

@kot0dama
Copy link
Author

kot0dama commented Oct 13, 2023

The limit seems to be per-character around 131060 chars, but this would need confirmation.

It's the OS's limit for command line arguments. For example:

$ grep ARG_MAX /usr/include/linux/limits.h
#define ARG_MAX       131072	/* # bytes of args + environ for exec() */

(This is different than getconf ARG_MAX sometimes for reasons I don't fully understand). The result= takes up a few more of those bytes.

Ah I see! Is there a reason for the result of the juju action to be passed as an argument of an exec() call? Can't we pass on the result of an action through a named pipe or something else that would allow working around this limitation? This might also increase security as we won't be running exec() with random texts.

As I understand it might be better to limit the output to a reasonable length, this should be handled and not result in an error for the operator.

I don't think we would want to arbitrarily truncate results. We could raise something more specific, but I believe it's better to have the operation fail if it can't be executed as requested.

Oh I didn't mean truncate the results arbitrarily, but present to the user running the juju action a clear error.

At the moment, running an action that results in this stacktrace yields some unfriendly error:

$ juju run-action test/leader my-action --wait
test-1:
  UnitId: test/1
  id: "90"
  message: exit status 1
  results:
    ReturnCode: 1
  status: failed
  timing:
    completed: 2023-10-13 08:03:55 +0000 UTC
    enqueued: 2023-10-13 08:03:53 +0000 UTC
    started: 2023-10-13 08:03:53 +0000 UTC

But then maybe this needs filing another bug against juju itself? I think we might need the return code to be specific to a size limit error though for juju command line tool to be able to present a user-friendly message to the operator.

Also, if there is such a limitation, it should probably be documented.

+1

For your use case, can you perhaps have the action provide a juju scp command that the user could run that would allow retrieving the file?

I actually updated the action to tail the file instead of returning it altogether, but that's also an idea, thank you!
Although, the juju action itself wouldn't know that exception happened as it occurs after it has returned the output, so it cannot catch it and suggest an scp command instead.

@jameinel
Copy link
Member

jameinel commented Oct 14, 2023 via email

@jameinel
Copy link
Member

jameinel commented Oct 14, 2023 via email

@benhoyt
Copy link
Collaborator

benhoyt commented Oct 15, 2023

I've opened #1047 to tweak the set_results docstring, and Tony and I have added notes about max size to those linked docs. Per above discussion, I don't think there's anything further to do here, so I'm going to close this. If file transfer is needed, it's best to use another technique like juju scp.

@benhoyt benhoyt closed this as not planned Won't fix, can't repro, duplicate, stale Oct 15, 2023
@kot0dama
Copy link
Author

Thank you for your replies, this is much clearer now.

It would be interesting to add details in the SDK documentation maybe, or somewhere you find it useful, that actions output is stored in the database. This can be important for the charm writers comprehension of the system, and also to avoid storing potentially sensitive data.

Thanks again,
Loïc

benhoyt pushed a commit that referenced this issue Oct 19, 2023
### More extensively document the exceptions that Charmers should be aware of and potentially catching

If we raise an exception when checking an argument's type, and the type hinting covers this, I have not included it in the `Raises` list. I feel that the docs (and a type checker) cover this already, and the `Raises` section should be more focused on exceptions that the Charmer should consider handling.

#908 goes back-and-forth on documenting very common exceptions (like `APIError` and `CommunicationError` in nearly all pebble `Client` methods), and I have as well. ~I'm currently +0 on documenting them, and it seems like it's simpler to add them all in the PR and them remove them than decide to add them after the initial PR, so they are included at the moment. I could easily be convinced that they should be removed and a central doc note added (but I'm not sure where that would be best).~ These have been removed and are centralised in the `Container` and `pebble.Client` docstrings.

Exceptions that are possible but should be rare and indicate a broken system (such as Juju or Pebble returning invalid data or being unresponsive) are not included. It feels like this tips *too* far in the "document everything" direction, and I don't feel like it would be worth Charmers trying to handle these exceptional (no pun intended) circumstances.

Where it was unclear whether it was worth documenting a potential exception - such as not providing any commands in `exec`, which seems more likely to be a coding error than something the Charm should handle, but I could see happening with quite dynamic code - I have generally taken the more defensive approach of including the potential raise.

I wrote a terrible, horrible, no good, very bad charm that causes a large number of these errors, and examined the pebble & Juju code to find information about what might cause issues. However, since I'm still only lightly familiar with Juju/ops, I could have missed some cases.

### Minor tweaks to exception chaining

* If we are raising an exception in an exception handler but the original exception is just an implementation detail, `raise x from None`
* If we are converting an exception to a more specific/localised one and the original exception is relevant, `raise x from e`
* If we are filtering an exception and then re-raising, `raise` or `raise e` (preferably the former, but leaving the latter if already in place)

### Miscellanous

One minor type hinting improvement: if an exception is always raised, use `typing.NoReturn` as the return type.

Drive-bys:
 * Remove one `type: ignore` that is no longer required.
 * Add `ops.` to a couple of examples in the docstrings that were missed in #921

Fixes #908, #1043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Improvements or additions to documentation small item
Projects
None yet
Development

No branches or pull requests

4 participants