[BUG] salt runner doesn't store results in job cache #61729

Open
aaronknister opened this issue Feb 26, 2022 · 4 comments
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE


aaronknister commented Feb 26, 2022

Description
Salt runners don't appear to store results in the job cache. This worked on older versions of salt (~2015 time frame at least).

Querying the result of a Salt runner job returns an Error field, and expected attributes such as Function are missing:

20220226004548285855:
    ----------
    Error:
        Cannot contact returner or no job with this jid
    Result:
        ----------
        02f3db77ee3b_master:
            ----------
            return:
                ----------
                _stamp:
                    2022-02-26T00:45:48.861710
                fun:
                    runner.jobs.list_jobs
                fun_args:
                jid:
                    20220226004548285855
                return:
                    ----------
                success:
                    True
                user:
                    root
    StartTime:
        2022, Feb 26 00:45:48.285855

Setup

Observed on the salt 3004rc1 container and the tip of master (run inside a container).

Steps to Reproduce the behavior

salt-run -l info jobs.list_jobs 2>&1 | grep 'Runner completed' | awk '{ print $5 }' | xargs --no-run-if-empty -n1 salt-run jobs.print_job
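
The JID extraction step of the pipeline above (the `grep` and `awk` part) can also be sketched in Python. This is a minimal illustration, not part of Salt: `extract_runner_jids` is a hypothetical helper, and it assumes the `Runner completed: <jid>` log line format that `salt-run -l info` prints.

```python
import re

# Matches the INFO line that salt-run emits when a runner finishes, e.g.
# "[INFO    ] Runner completed: 20220301002739385748".
# Assumption: the JID is a run of digits following "Runner completed:".
RUNNER_COMPLETED = re.compile(r"Runner completed:\s*(\d+)")


def extract_runner_jids(log_text):
    """Return every runner JID found in captured salt-run log output."""
    return RUNNER_COMPLETED.findall(log_text)


if __name__ == "__main__":
    sample = "[INFO    ] Runner completed: 20220301002739385748\n"
    for jid in extract_runner_jids(sample):
        # Each JID could then be passed to: salt-run jobs.print_job <jid>
        print(jid)
```

Each extracted JID can then be handed to `salt-run jobs.print_job`, which is what the `xargs` stage does.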

Expected behavior
Output should not contain an Error field and should contain Function attribute (and others) as shown below:

20220225164405632844:
    ----------
    Arguments:
    Function:
        runner.jobs.print_job
    Minions:
    Result:
        ----------
        node231-node1_master:
            ----------
            return:
                ----------
                _stamp:
                    2022-02-26T00:44:05.987420
                fun:
                    runner.jobs.print_job
                jid:
                    20220225164405632844
                return:
                    ----------
                    1:
                        ----------
                        Arguments:
                        Function:
                            unknown-function
                        Result:
                            ----------
                        StartTime:
                        Target:
                            unknown-target
                        Target-type:
                        User:
                            root
                success:
                    True
                user:
                    root
    StartTime:
        2022, Feb 25 16:44:05.632844
    Target:
        node-231-node1_master
    Target-type:
    User:
        root

Screenshots
Console output included above

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3003rc1+1813.gf65f953

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.3
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.14.1
        pygit2: Not Installed
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 6.0
         PyZMQ: 22.3.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 7 Core
        locale: ANSI_X3.4-1968
       machine: x86_64
       release: 3.10.0-514.21.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

Additional context
None at this time

@aaronknister aaronknister added Bug broken, incorrect, or confusing behavior needs-triage labels Feb 26, 2022
waynew (Contributor) commented Mar 1, 2022

FWIW, this definitely shows up in the jobs cache:

❯ salt-run jobs.list_jobs -linfo
[INFO    ] Loading Saltfile from '/home/wayne/salt/Saltfile'
[INFO    ] Runner completed: 20220301002739385748

~/salt via 🐍 v3.10.2 (salt) on ☁️  (us-west-2)
❯ ag 20220301002739385748
var/cache/salt/master/jobs/24/9565d30ed9e273e7b2b5073a84ba050543d1f94df52f7fb4ef9be3ea1db183/jid

But it looks like it's missing any kind of other file with that name or contents:

❯ find var -name \*20220301002739385748\*

That produces no output.
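
One possible reason the `find` comes up empty is that the default local job cache may not use the JID as a file or directory name at all. The sketch below shows how the directory in the `ag` output above could be derived, assuming the cache hashes the JID with the configured `hash_type` (taken here to be `sha256`, which matches the 2 + 62 character path above) and that the job load would live in a file named `.load.p` next to `jid`. The helper names and the `.load.p` file name are assumptions for illustration, not something confirmed in this thread.

```python
import hashlib
import os


def job_cache_dir(cache_root, jid, hash_type="sha256"):
    """Hypothetical helper: directory a JID would map to in the local job cache.

    Assumption: the directory is jobs/<first 2 hex chars>/<remaining hex chars>
    of the hash of the JID string.
    """
    digest = hashlib.new(hash_type, str(jid).encode("utf-8")).hexdigest()
    return os.path.join(cache_root, "jobs", digest[:2], digest[2:])


def has_job_load(job_dir, load_name=".load.p"):
    """Return True if the (assumed) load file exists beside the jid file."""
    return os.path.isfile(os.path.join(job_dir, load_name))


if __name__ == "__main__":
    path = job_cache_dir("var/cache/salt/master", "20220301002739385748")
    print(path)
    print("load present:", has_job_load(path))
```

If the assumptions hold, a directory that contains `jid` but no load file would be consistent with the "Cannot contact returner or no job with this jid" error reported above.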

@waynew waynew added Confirmed Salt engineer has confirmed bug/feature - often including a MCVE and removed needs-triage labels Mar 1, 2022
@waynew waynew added this to the Approved milestone Mar 1, 2022
aaronknister added a commit to aaronknister/salt that referenced this issue Mar 18, 2022
this should fix saltstack#61729

for salt commands run via remote minions, the returner save_load
function is effectively called twice-- once via salt.master.ClearFuncs.publish
when the job starts and then again via store_job called from
salt.master.AESFuncs.return when each minion returns.

store_job when called via the minion return path ("_return") will
call save_load in addition to the returner function when it seems
it should only call the returner.

there's an additional problem which is that for salt runners the
job data is saved via a call to store_job in which case both
the returner *and* save_load should be called at once

this change attempts to give store_job the ability to distinguish
between situations in which it should call save_load or just the
returner by checking for the presence of the "tgt" attribute
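
The decision described in the commit message can be sketched as follows. This is a simplified stand-in, not the actual patch: `store_job_sketch`, `save_load`, and `returner` are hypothetical callables, and the direction of the check (a load that carries `tgt` is treated as a full job load that also needs `save_load`) is an assumption, since the commit message only says that the presence of `tgt` is inspected.

```python
def store_job_sketch(load, save_load, returner):
    """Store a job return, saving the job load only when it looks complete.

    Assumption: a load that includes "tgt" describes the whole job (the
    runner case), so save_load and the returner are both called. A load
    without "tgt" is treated as a plain minion return whose job load was
    already saved at publish time, so only the returner is called.
    """
    if "tgt" in load:
        save_load(load["jid"], load)
    returner(load)


if __name__ == "__main__":
    calls = []
    # Illustrative loads; the minion JID and minion id are made up.
    runner_load = {"jid": "20220226004548285855", "fun": "runner.jobs.list_jobs",
                   "tgt": "02f3db77ee3b_master", "return": {}}
    minion_load = {"jid": "20220226004548285856", "fun": "test.ping",
                   "id": "minion1", "return": True}
    for load in (runner_load, minion_load):
        store_job_sketch(
            load,
            save_load=lambda jid, data: calls.append(("save_load", jid)),
            returner=lambda data: calls.append(("returner", data["jid"])),
        )
    print(calls)
```

In this sketch the runner load triggers both calls, while the minion return triggers only the returner, which avoids the second `save_load` the commit message describes.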
@amalaguti

Seems to be fixed in 3006.1

$ salt-run jobs.list_jobs search_function='runner.state.*'
20230601095010546742:
    ----------
    Arguments:
    Function:
        runner.state.orch
    StartTime:
        2023, Jun 01 09:50:10.546742
    Target:
        vesselsim-master
    Target-type:
        list
    User:
        master_vesselsim-master
[INFO    ] Runner completed: 20230601145914155122

$ salt-run jobs.list_jobs search_function='runner.state.orch'
20230601095010546742:
    ----------
    Arguments:
    Function:
        runner.state.orch
    StartTime:
        2023, Jun 01 09:50:10.546742
...
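
The two `search_function` values above return the same job, which is what glob-style matching would produce. A minimal sketch of that filtering, assuming shell-style wildcards via Python's `fnmatch` (the helper `filter_jobs_by_function` and the sample dictionary are illustrative, not Salt's implementation):

```python
import fnmatch


def filter_jobs_by_function(jobs, pattern):
    """Keep jobs whose "Function" field matches a shell-style pattern."""
    return {
        jid: info
        for jid, info in jobs.items()
        if fnmatch.fnmatchcase(info.get("Function", ""), pattern)
    }


if __name__ == "__main__":
    # Sample data shaped like the jobs.list_jobs output quoted above;
    # the second entry is made up for contrast.
    jobs = {
        "20230601095010546742": {"Function": "runner.state.orch"},
        "20230601095011000000": {"Function": "test.ping"},
    }
    print(sorted(filter_jobs_by_function(jobs, "runner.state.*")))
    print(sorted(filter_jobs_by_function(jobs, "runner.state.orch")))
```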

@amalaguti

Hey, some updates.
I tried the commands again after re-enabling the raas plugin, and jobs.list_jobs still doesn't work: it does not seem to see the JID of the orchestration that was recently executed. The same command with the search_function argument seems to work, but it looks like it only retrieves JIDs up to Aug 07. So at some point it stopped working (I updated raas to 8.13 some time ago, I guess around that time).

salt-run jobs.list_jobs search_function='runner.state.orch'

...
20230807123208967742:
    ----------
    Arguments:
    Function:
        runner.state.orch
    StartTime:
        2023, Aug 07 12:32:08.967742
    Target:
        vesselsim-master
    Target-type:
        list
    User:
        master_vesselsim-master

One more thing is different now that the raas plugin is enabled. Without the plugin, the following command did not retrieve the orchestration JID, but it did not throw an error either. Now it throws this error:


# salt-run jobs.last_run -l quiet
20230807142608827745:
    ----------
    Error:
        Cannot contact returner or no job with this jid
    Result:
        ----------
    StartTime:
        2023, Aug 07 14:26:08.827745

So yes, the functions are definitely broken, and yes, the raas plugin makes a difference.

I can see the JIDs in the SSE Activity page. I also tried running the orchestration as a job from SSE, with the same results.
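
Since every JID in this thread lines up with its StartTime (for example 20230807123208967742 and 2023, Aug 07 12:32:08.967742), the date at which jobs stop appearing can be read from the JIDs themselves. A small sketch, assuming the 20-digit `YYYYMMDDhhmmssffffff` layout seen above; `jid_to_datetime` and `latest_jid` are hypothetical helper names:

```python
from datetime import datetime


def jid_to_datetime(jid):
    """Decode a 20-digit JID (YYYYMMDDhhmmssffffff) into a naive datetime."""
    jid = str(jid)
    if len(jid) != 20 or not jid.isdigit():
        raise ValueError("unexpected JID format: %r" % jid)
    return datetime(
        int(jid[0:4]), int(jid[4:6]), int(jid[6:8]),
        int(jid[8:10]), int(jid[10:12]), int(jid[12:14]),
        int(jid[14:20]),
    )


def latest_jid(jids):
    """Return the most recent JID, i.e. where the listing stops."""
    return max(jids, key=jid_to_datetime)


if __name__ == "__main__":
    jids = ["20230601095010546742", "20230807123208967742"]
    newest = latest_jid(jids)
    print(newest, jid_to_datetime(newest).isoformat())
```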

@max-arnold
Contributor

I believe this particular issue was fixed in #65023 and is included in Salt 3006.3 and later.

@aaronknister Could you please test it again using the latest Salt version?

@dwoz dwoz unassigned waynew Jul 29, 2024