Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Salt exits with status 0 when it fails to render a state #7993

Closed
DemiMarie opened this issue Jan 18, 2023 · 12 comments · Fixed by QubesOS/qubes-salt#2
Closed

Salt exits with status 0 when it fails to render a state #7993

DemiMarie opened this issue Jan 18, 2023 · 12 comments · Fixed by QubesOS/qubes-salt#2
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: mgmt diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.1-fc37-stable r4.2-host-stable r4.2-vm-fc37-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@DemiMarie
Copy link

How to file a helpful issue

Qubes OS release

R4.1

Brief summary

If Salt fails to render a state (because of duplicate keys for example), the exit code of qubesctl is 0.

Steps to reproduce

Create a state that has duplicate keys in a YAML map.

Expected behavior

Non-zero exit code

Actual behavior

Zero exit code

@DemiMarie DemiMarie added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: mgmt P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Jan 18, 2023
@andrewdavidwong andrewdavidwong added the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Jan 18, 2023
@andrewdavidwong andrewdavidwong added this to the Release 4.1 updates milestone Jan 18, 2023
@lkubb
Copy link

lkubb commented Jun 21, 2023

FYI: I just submitted a fix upstream for this issue, it was merged and will be in 3006.2: saltstack/salt#64515

@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed. and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Jun 21, 2023
@DemiMarie
Copy link
Author

Thanks @lkubb! Qubes should probably backport this patch.

@andrewdavidwong andrewdavidwong added pr submitted A pull request has been submitted for this issue. and removed waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed. labels Jun 23, 2023
@qubesos-bot
Copy link

Automated announcement from builder-github

The component salt (including package salt) has been pushed to the r4.2 testing repository for the Fedora template.
To test this update, please install it with the following command:

sudo dnf update --enablerepo=qubes-vm-r4.2-current-testing

Changes included in this update

@DemiMarie DemiMarie added the backport pending On closed issues, indicates fix released for newer version will be backported to older version. label Jun 25, 2023
@DemiMarie
Copy link
Author

Adding “backport pending” because R4.1 is still affected.

@marmarek
Copy link
Member

The patch causes regression: saltstack/salt#64575. So, no backporting until it's fixed (and if not fixed soon, I'll need to revert the patch itself).

@lkubb
Copy link

lkubb commented Jun 29, 2023

@marmarek Without having much to go on, I have to assume it's caused by https://github.com/saltstack/salt/blob/c463c94b8d29ba4ebfeb45c36b7d7b6653e9711f/salt/client/ssh/__init__.py#L555-L559 and the remote returning something like {"local": {"retcode": null}} for whatever reason. Otherwise, the retcode that's passed through when calling a wfunc should always be an int.

Would you be able to patch the line and see if that fixes the stacktrace?

diff --git a/salt/client/ssh/__init__.py b/salt/client/ssh/__init__.py
index 0d9ac9509b..88449b4c6e 100644
--- a/salt/client/ssh/__init__.py
+++ b/salt/client/ssh/__init__.py
@@ -554,7 +554,7 @@ class SSH(MultiprocessingStateMixin):
                 ret["ret"] = data["local"]
                 try:
                     # Ensure a reported local retcode is kept
-                    retcode = data["local"]["retcode"]
+                    retcode = int(data["local"]["retcode"])
                 except (KeyError, TypeError):
                     pass
             else:

Either way it would be very helpful to see the final output, so if the above does not work, you could just force the retcode to be an int where the stacktrace happens.

diff --git a/salt/client/ssh/__init__.py b/salt/client/ssh/__init__.py
index 0d9ac9509b..e04d9c5234 100644
--- a/salt/client/ssh/__init__.py
+++ b/salt/client/ssh/__init__.py
@@ -807,7 +807,7 @@ class SSH(MultiprocessingStateMixin):
         final_exit = 0
         for ret, retcode in self.handle_ssh():
             host = next(iter(ret))
-            final_exit = max(final_exit, retcode)
+            final_exit = max(final_exit, int(retcode))

             self.cache_job(jid, host, ret[host], fun)
             ret = self.key_deploy(host, ret)

I have a Qubes dev setup as well, so if it's something I can reproduce there without much hassle, I'm happy to do the above myself.

@marmarek
Copy link
Member

Unfortunately, I do not have reliable reproducer. I noticed it on one test run (given number of test runs since applying the patch, the reproduction rate is about 10%, could be worse).
The int() wont work, since int(None) is TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'...

I have to assume it's caused by https://github.com/saltstack/salt/blob/c463c94b8d29ba4ebfeb45c36b7d7b6653e9711f/salt/client/ssh/__init__.py#L555-L559

Yes, I think so.

and the remote returning something like {"local": {"retcode": null}} for whatever reason. Otherwise, the retcode that's passed through when calling a wfunc should always be an int.

Unfortunately, I don't have returned data in any log, so cannot say for sure. But likely something like this.

@marmarek marmarek reopened this Jun 29, 2023
@lkubb
Copy link

lkubb commented Jun 29, 2023

You're right, it should be or 0 of course. I will submit a PR upstream that should fix the stacktrace in those cases, but the exact cause is a mystery to me currently.

@marmarek
Copy link
Member

I'm not sure if or 0 is the best approach - it will pretend everything is okay when you don't get exit code. Maybe better treat it as a failure instead? (until real reason for this failure is found)

@lkubb
Copy link

lkubb commented Jun 29, 2023

I agree, 0 would have been a quick fix for testing only. That's very impractical in CI ofc.

I actually reworked the salt-ssh error handling a lot since submitting this PR to fix other issues, so my picture of the current code is a bit hazy, but forcing it to 1 in those cases should be fine.

Edit: In the meantime, I'm quite certain I found the source of the None retcodes. The corresponding fix is included in the regression fix PR as well.

@andrewdavidwong andrewdavidwong added needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed diagnosed Technical diagnosis has been performed (see issue comments). pr submitted A pull request has been submitted for this issue. backport pending On closed issues, indicates fix released for newer version will be backported to older version. labels Jun 29, 2023
@qubesos-bot
Copy link

Automated announcement from builder-github

The component salt (including package salt) has been pushed to the r4.2 testing repository for the Fedora template.
To test this update, please install it with the following command:

sudo dnf update --enablerepo=qubes-vm-r4.2-current-testing

Changes included in this update

@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Jul 4, 2023
@qubesos-bot
Copy link

Automated announcement from builder-github

The component salt (including package salt) has been pushed to the r4.2 stable repository for the Fedora template.
To install this update, please use the standard update command:

sudo dnf update

Changes included in this update

@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023
marmarek added a commit to marmarek/qubes-salt that referenced this issue Nov 29, 2023
Drop patches included upstream already.
Fix for saltstack/salt#64575 was just merged upstream, but isn't
included in 3006.4 yet, so keep it here.

QubesOS/qubes-issues#7993
marmarek added a commit to marmarek/qubes-salt that referenced this issue Nov 29, 2023
Drop patches included upstream already.
Fix for saltstack/salt#64575 was just merged upstream, but isn't
included in 3006.4 yet, so keep it here.

QubesOS/qubes-issues#7993
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: mgmt diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.1-fc37-stable r4.2-host-stable r4.2-vm-fc37-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants