Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a remotecat command for printing remote file of running calcjobs #4861

Merged
merged 3 commits into from
Dec 14, 2022

Conversation

zhubonan
Copy link
Contributor

This is a command that I found useful - perhaps it is good to commit it to the core.
It can be useful for monitoring the calcjob while they are running.

I have not yet able to add tests for it - the existing test framework for calcjob commands does not include remote contents (the nodes are extracted from an archive). Not sure how to make the test work, perhaps the only way is to mock the getfile method of the RemoteData node?

@zhubonan zhubonan requested a review from ltalirz April 17, 2021 15:04
@ramirezfranciscof
Copy link
Member

ramirezfranciscof commented Apr 19, 2021

Hey @zhubonan , thanks for the contribution! We are currently doing/discussing some changes in how we are handling the connections for external resources (in particular, ssh connections), so we need to consider how this feature fits into all that. Given that it is a command line feature, it might still be useful to have something like it but perhaps the underlying workings might need to be different.

I'm pinging @sphuber and @giovannipizzi if they can take a look. Anyways, I'll try to bring the subject up on the next AiiDA meeting (it is this wednesday at 15:00 CEST, these are open for participation so let me know if you want to be present).

@zhubonan
Copy link
Contributor Author

I see. This is just a small feature that I think might be useful. It does not touch the ssh interface though (only RemoteData.getfile is used).

Feel free to come back to this later.

@mbercx
Copy link
Member

mbercx commented Apr 19, 2021

+1 I'd definitely use this feature. I'm wondering if it can also be extended to monitor the calculation. I typically do this with the tail command, after doing verdi calcjob gotocomputer:

$ tail -f aiida.out 

It would be nice to just be able to do e.g.:

verdi calcjob remotecat --monitor <PK>

@ltalirz
Copy link
Member

ltalirz commented Apr 19, 2021

Hi @zhubonan , thanks for this contribution!

have not yet able to add tests for it - the existing test framework for calcjob commands does not include remote contents (the nodes are extracted from an archive).

You could add a test to the "system tests", some of which run calculations on a slurm docker container on github actions.
See e.g.

def test_leak_ssh_calcjob():
"""Test whether running a CalcJob over SSH leaks memory.
Note: This relies on the 'slurm-ssh' computer being set up.
"""
code = orm.Code(
input_plugin_name='arithmetic.add', remote_computer_exec=[orm.load_computer('slurm-ssh'), '/bin/bash']
)
inputs = {'x': orm.Int(1), 'y': orm.Int(2), 'code': code}
run_finished_ok(ArithmeticAddCalculation, **inputs)
# check that no reference to the process is left in memory
# some delay is necessary in order to allow for all callbacks to finish
process_instances = get_instances(processes.Process, delay=0.2)
assert not process_instances, f'Memory leak: process instances remain in memory: {process_instances}'

I might not be the right person to review this PR but we'll discuss it at the meeting

@ramirezfranciscof
Copy link
Member

It does not touch the ssh interface though (only RemoteData.getfile is used).

Right, so this is one of the things. The RemoteData methods open and close the ssh connection with the external resource every time you call them. We are considering changing this so that you need to first open the transport yourself and then pass the open connection to the RemoteData method you want to use (this is slightly more complicated but would allow users to systematically copy files in a more efficient way that may currently be prohibiting for some use cases). This might require some adaptation of this feature; I just want to make sure we are considering all angles before going through since this is related to a part of the code that is being re-designed.

@sphuber
Copy link
Contributor

sphuber commented Apr 19, 2021

I think that in principle it would be ok to add this command even though we are not yet sure of how the transports may change, since if and when we change it, it should not affect the interface and so the user shouldn't be affected.

That being said: in a sense this command already exists in verdi data remote cat. Now I realize that this command provides some shortcuts as in it automatically gets the PK of the remote_folder and it can get the default output name if the plugin specifies it. So there might be an argument to add this anyway, but then i would consider having the implementation be as close to the other as possible, or at least use the implementation under the hood.

@zhubonan
Copy link
Contributor Author

That being said: in a sense this command already exists in verdi data remote cat.
@sphuber I see. I will reflector the code to use the implement that verdi data remote cat uses.

+1 I'd definitely use this feature. I'm wondering if it can also be extended to monitor the calculation. I typically do this with the tail command, after doing verdi calcjob gotocomputer:

@mbercx I have coded this up as well actually based on gotocomputer. The RemoteData interface does not allow tracking a file easily. I will see if I can add it to the PR as well.

@codecov
Copy link

codecov bot commented Apr 19, 2021

Codecov Report

Base: 79.69% // Head: 80.10% // Increases project coverage by +0.42% 🎉

Coverage data is based on head (9082559) compared to base (e5cfd84).
Patch coverage: 64.29% of modified lines in pull request are covered.

❗ Current head 9082559 differs from pull request most recent head 412964f. Consider uploading reports for the commit 412964f to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4861      +/-   ##
==========================================
+ Coverage   79.69%   80.10%   +0.42%     
==========================================
  Files         545      515      -30     
  Lines       39150    36742    -2408     
==========================================
- Hits        31195    29427    -1768     
+ Misses       7955     7315     -640     
Flag Coverage Δ
django 74.58% <64.29%> (?)
sqlalchemy 73.50% <64.29%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/cmdline/commands/cmd_calcjob.py 66.50% <60.00%> (-13.37%) ⬇️
aiida/transports/plugins/local.py 81.41% <100.00%> (-0.93%) ⬇️
aiida/transports/plugins/ssh.py 80.14% <100.00%> (+0.51%) ⬆️
aiida/transports/transport.py 66.22% <100.00%> (-17.19%) ⬇️
aiida/engine/exceptions.py 0.00% <0.00%> (-100.00%) ⬇️
aiida/cmdline/utils/ascii_vis.py 0.00% <0.00%> (-59.67%) ⬇️
aiida/cmdline/commands/cmd_setup.py 50.00% <0.00%> (-45.19%) ⬇️
aiida/orm/nodes/data/array/xy.py 16.18% <0.00%> (-43.82%) ⬇️
aiida/cmdline/params/options/commands/setup.py 56.85% <0.00%> (-36.22%) ⬇️
aiida/cmdline/commands/cmd_devel.py 31.58% <0.00%> (-35.74%) ⬇️
... and 713 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ramirezfranciscof ramirezfranciscof self-assigned this Apr 28, 2021
@zhubonan zhubonan force-pushed the pr-add-remotecat-command branch from 559c147 to 9770b11 Compare June 14, 2021 19:52
@zhubonan
Copy link
Contributor Author

@ltalirz @sphuber I have updated the code and added a test. This PR is ready to be merged now.

@chrisjsewell
Copy link
Member

chrisjsewell commented Jun 16, 2021

Hey thanks @zhubonan, I'm seeing a lot of lines not covered by tests 😬 would it be possible to look into increasing the test coverage for these scenarios?

image

Copy link
Member

@chrisjsewell chrisjsewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

increase test coverage?

@zhubonan
Copy link
Contributor Author

zhubonan commented Jun 22, 2021

@chrisjsewell I tried to improve the coverages but it turns out additional changes are needed to support the --monitor option for multiple transport plugins.
I have tweaked the gotocomputer_command method to allow additional arguments to be passed to facilitate running tail -f <filename>. However, this command cannot be tested directly as the os.system will never return I think, just like that for the standard gotocomputercommand.

@sphuber sphuber force-pushed the pr-add-remotecat-command branch 2 times, most recently from 412964f to 0132a8b Compare December 13, 2022 16:38
@sphuber
Copy link
Contributor

sphuber commented Dec 13, 2022

@zhubonan I have rebased the PR to the latest main branch and addressed the merge conflicts. I have also added a commit with some minor improvements. It includes a test for the --monitor option so this should increase the coverage (although I am not sure this is even running on PRs anymore 🤔 )

In looking into the code more closely, I have just a question of (ab)using the Transport.gotocomputer_command. Executing arbitrary commands clearly wasn't the intention. It seems we should rather maybe use something like Transport.exec_command_wait except that won't work in the case of tail -f since it is blocking and we don't want to wait for it to finish. Still I am wondering if we should add a Transport.exec_command instead that just works blockingly and pipes output through. The reason I am a bit hesitant to just use your approach is that we are considering to refactor the Transport interface, since this might become necessary for 2FA that is starting to become common-place and the interface mixes file transport and remote command execution. Not sure if this change would make the redesign more complicated.

@zhubonan
Copy link
Contributor Author

@sphuber Thanks for looking into it. If there is a Transport refecrtoring going to happen probably it does not make sense to modify the interface now to add the --monitor functionality.

Perhaps we can just remove the --monior option for now? It is only a nice to have addition I think.

@sphuber
Copy link
Contributor

sphuber commented Dec 14, 2022

Perhaps we can just remove the --monior option for now? It is only a nice to have addition I think.

Works for me. Let me create a backup branch for you on your fork and then I will remove the option and we can merge this.

@sphuber sphuber force-pushed the pr-add-remotecat-command branch 2 times, most recently from e6e3f76 to 294ed36 Compare December 14, 2022 13:57
zhubonan and others added 2 commits December 14, 2022 16:02
This can be usedful for monitoring the calcjobs while they are
running.
Also adds a test for the `--monitor` option.
@sphuber sphuber force-pushed the pr-add-remotecat-command branch from 294ed36 to 6bcc260 Compare December 14, 2022 15:06
@sphuber sphuber force-pushed the pr-add-remotecat-command branch from 6bcc260 to 46bb011 Compare December 14, 2022 15:07
@sphuber sphuber self-requested a review December 14, 2022 15:27
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zhubonan

@sphuber sphuber merged commit 7df4f60 into aiidateam:main Dec 14, 2022
@sphuber
Copy link
Contributor

sphuber commented Dec 14, 2022

@zhubonan I cannot seem to push the branch with the monitor option code to your remote. But I will add the patch here for posterity:

From 02d9fa15a508cc15513f64387adc7171e16d77cb Mon Sep 17 00:00:00 2001
From: Bonan Zhu <zhubonan@outlook.com>
Date: Sat, 17 Apr 2021 15:57:27 +0100
Subject: [PATCH] Added a remotecat command for printing remote file

This can be usedful for monitoring the calcjobs while they are
running.

Various minor improvements

Also adds a test for the `--monitor` option.
---
 aiida/cmdline/commands/cmd_calcjob.py  | 20 +++++++++++++++++++-
 aiida/transports/plugins/local.py      |  4 ++--
 aiida/transports/plugins/ssh.py        |  4 ++--
 aiida/transports/transport.py          | 12 ++++++++----
 tests/cmdline/commands/test_calcjob.py | 11 ++++++++++-
 tests/transports/test_local.py         |  9 +++++++++
 tests/transports/test_ssh.py           |  9 +++++++++
 7 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/aiida/cmdline/commands/cmd_calcjob.py b/aiida/cmdline/commands/cmd_calcjob.py
index 72cec957f..da542942f 100644
--- a/aiida/cmdline/commands/cmd_calcjob.py
+++ b/aiida/cmdline/commands/cmd_calcjob.py
@@ -124,8 +124,9 @@ def calcjob_inputcat(calcjob, path):
 @verdi_calcjob.command('remotecat')
 @arguments.CALCULATION('calcjob', type=CalculationParamType(sub_classes=('aiida.node:process.calculation.calcjob',)))
 @click.argument('path', type=str, required=False)
+@click.option('--monitor', is_flag=True, default=False, help='Monitor the file using `tail -f` instead.')
 @decorators.with_dbenv()
-def calcjob_remotecat(calcjob, path):
+def calcjob_remotecat(calcjob, path, monitor):
     """Show the contents of a file in the remote working directory.
 
     The file to show can be specified using the PATH argument. If PATH is not specified, the default output file path
@@ -135,8 +136,25 @@ def calcjob_remotecat(calcjob, path):
     import sys
     import tempfile
 
+    from aiida.common.exceptions import NotExistent
+
     remote_folder, path = get_remote_and_path(calcjob, path)
 
+    if monitor:
+        try:
+            transport = calcjob.get_transport()
+        except NotExistent as exception:
+            echo.echo_critical(str(exception))
+
+        remote_workdir = calcjob.get_remote_workdir()
+
+        if not remote_workdir:
+            echo.echo_critical('no remote work directory for this calcjob, maybe it has not yet started running?')
+        cmds = f"-c 'tail -f {path}'"
+        command = transport.gotocomputer_command(remote_workdir, cmds)
+        os.system(command)
+        return
+
     with tempfile.NamedTemporaryFile() as tmp_path:
         try:
             remote_folder.getfile(path, tmp_path.name)
diff --git a/aiida/transports/plugins/local.py b/aiida/transports/plugins/local.py
index c41f4f276..cc56a76a2 100644
--- a/aiida/transports/plugins/local.py
+++ b/aiida/transports/plugins/local.py
@@ -817,7 +817,7 @@ class LocalTransport(Transport):
 
         return retval, output_text, stderr_text
 
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Return a string to be run using os.system in order to connect
         via the transport to the remote directory.
@@ -829,7 +829,7 @@ class LocalTransport(Transport):
 
         :param str remotedir: the full path of the remote directory
         """
-        connect_string = self._gotocomputer_string(remotedir)
+        connect_string = self._gotocomputer_string(remotedir, extra_args)
         cmd = f'bash -c {connect_string}'
         return cmd
 
diff --git a/aiida/transports/plugins/ssh.py b/aiida/transports/plugins/ssh.py
index f47c9ef19..4849ce766 100644
--- a/aiida/transports/plugins/ssh.py
+++ b/aiida/transports/plugins/ssh.py
@@ -1482,7 +1482,7 @@ class SshTransport(Transport):  # pylint: disable=too-many-public-methods
 
         return (retval, b''.join(stdout_bytes), b''.join(stderr_bytes))
 
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Specific gotocomputer string to connect to a given remote computer via
         ssh and directly go to the calculation folder.
@@ -1505,7 +1505,7 @@ class SshTransport(Transport):  # pylint: disable=too-many-public-methods
 
         further_params_str = ' '.join(further_params)
 
-        connect_string = self._gotocomputer_string(remotedir)
+        connect_string = self._gotocomputer_string(remotedir, extra_args)
         cmd = f'ssh -t {self._machine} {further_params_str} {connect_string}'
         return cmd
 
diff --git a/aiida/transports/transport.py b/aiida/transports/transport.py
index 1f3a01c75..0031e5770 100644
--- a/aiida/transports/transport.py
+++ b/aiida/transports/transport.py
@@ -688,7 +688,7 @@ class Transport(abc.ABC):
         """
 
     @abc.abstractmethod
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Return a string to be run using os.system in order to connect
         via the transport to the remote directory.
@@ -700,6 +700,7 @@ class Transport(abc.ABC):
         * A reasonable error message is produced if the folder does not exist
 
         :param str remotedir: the full path of the remote directory
+        :param str extra_args: optional extra arguments to be passed to the shell
         """
 
     @abc.abstractmethod
@@ -815,13 +816,16 @@ class Transport(abc.ABC):
     def has_magic(self, string):
         return self._MAGIC_CHECK.search(string) is not None
 
-    def _gotocomputer_string(self, remotedir):
+    def _gotocomputer_string(self, remotedir, extra_args=None):
         """command executed when goto computer."""
         connect_string = (
             """ "if [ -d {escaped_remotedir} ] ;"""
-            """ then cd {escaped_remotedir} ; {bash_command} ; else echo '  ** The directory' ; """
+            """ then cd {escaped_remotedir} ; {bash_command} {extra_args}; else echo '  ** The directory' ; """
             """echo '  ** {remotedir}' ; echo '  ** seems to have been deleted, I logout...' ; fi" """.format(
-                bash_command=self._bash_command_str, escaped_remotedir="'{}'".format(remotedir), remotedir=remotedir
+                bash_command=self._bash_command_str,
+                extra_args=extra_args or '',
+                escaped_remotedir="'{}'".format(remotedir),
+                remotedir=remotedir
             )
         )
 
diff --git a/tests/cmdline/commands/test_calcjob.py b/tests/cmdline/commands/test_calcjob.py
index a763b1275..801761bf6 100644
--- a/tests/cmdline/commands/test_calcjob.py
+++ b/tests/cmdline/commands/test_calcjob.py
@@ -334,9 +334,11 @@ class TestVerdiCalculation:
         assert len(get_result_lines(result)) == 1
         assert get_result_lines(result)[0] == '5'
 
-    def test_calcjob_remotecat(self):
+    def test_calcjob_remotecat(self, monkeypatch):
         """Test the remotecat command that prints the remote file for a given calcjob"""
         # Specifying no filtering options and no explicit calcjobs should exit with non-zero status
+        import os
+
         options = []
         result = self.cli_runner.invoke(command.calcjob_remotecat, options)
         assert result.exception is not None, result.output
@@ -357,3 +359,10 @@ class TestVerdiCalculation:
         options = [str(self.result_job.uuid), 'fileA.txt']
         result = self.cli_runner.invoke(command.calcjob_remotecat, options)
         assert result.stdout == 'test stringA'
+
+        # To test the ``--monitor`` option, mock the ``os.system`` command and simply print the command it receives.
+        with monkeypatch.context() as ctx:
+            ctx.setattr(os, 'system', lambda x: print(x))  # pylint: disable=unnecessary-lambda
+            options = ['--monitor', str(self.result_job.uuid), 'fileA.txt']
+            result = self.cli_runner.invoke(command.calcjob_remotecat, options)
+            assert "bash -l  -c 'tail -f fileA.txt'" in result.stdout
diff --git a/tests/transports/test_local.py b/tests/transports/test_local.py
index fa88e0046..102f0323d 100644
--- a/tests/transports/test_local.py
+++ b/tests/transports/test_local.py
@@ -61,3 +61,12 @@ def test_gotocomputer():
             """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
         )
         assert cmd_str == expected_str
+
+        cmd_str = transport.gotocomputer_command('/remote_dir/', "-c 'echo Hello World'")
+
+        expected_str = (
+            """bash -c  "if [ -d '/remote_dir/' ] ;"""
+            """ then cd '/remote_dir/' ; bash -l  -c 'echo Hello World'; else echo '  ** The directory' ; """
+            """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
+        )
+        assert cmd_str == expected_str
diff --git a/tests/transports/test_ssh.py b/tests/transports/test_ssh.py
index 8b2043f87..958748723 100644
--- a/tests/transports/test_ssh.py
+++ b/tests/transports/test_ssh.py
@@ -124,6 +124,15 @@ def test_gotocomputer():
         )
         assert cmd_str == expected_str
 
+        cmd_str = transport.gotocomputer_command('/remote_dir/', '-c "echo Hello World"')
+
+        expected_str = (
+            """ssh -t localhost -o ProxyCommand='ssh -W localhost:22 localhost'  "if [ -d '/remote_dir/' ] ;"""
+            """ then cd '/remote_dir/' ; bash  -c "echo Hello World"; else echo '  ** The directory' ; """
+            """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
+        )
+        assert cmd_str == expected_str
+
 
 def test_gotocomputer_proxyjump():
     """Test gotocomputer"""
-- 
2.25.1

@zhubonan
Copy link
Contributor Author

Thanks for the patch 😄 . I think I will just delete my branch since the PR is now merged.

@zhubonan zhubonan deleted the pr-add-remotecat-command branch December 27, 2022 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants