Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSH tunnel support for S3Store #882

Merged
merged 19 commits into from
Nov 21, 2023

Conversation

mjwen
Copy link
Member

@mjwen mjwen commented Nov 7, 2023

Summary

This PR adds the support to use an ssh tunnel for S3Store, which can be useful for self-hosted MinIO. This follows the same pattern as the ssh tunnel support for MongoStore.

  • Created a new file ssh_tunnel.py and moved the SSHTunnel class and related stuff there. It's a bit weird to put it in the monogostore file.
  • Added a new optional parameter local_port for the SSHTunnel class. This is necessary for MinIO, by default which uses the 9000 port. The original implementation that uses an automatically chosen free port will not work in general.
  • Added ssh_tunnel support for the S3Store.
  • Added a tutorial doc to use SSHTunnel.

Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (e31a8ce) 87.76% compared to head (6cae273) 88.16%.
Report is 10 commits behind head on main.

Files Patch % Lines
src/maggma/stores/ssh_tunnel.py 71.42% 12 Missing ⚠️
src/maggma/stores/aws.py 85.18% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #882      +/-   ##
==========================================
+ Coverage   87.76%   88.16%   +0.39%     
==========================================
  Files          44       45       +1     
  Lines        3614     3633      +19     
==========================================
+ Hits         3172     3203      +31     
+ Misses        442      430      -12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


def _find_free_port(address="0.0.0.0"):
s = socket()
s.bind((address, 0)) # Bind to a free port provided by the host.

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium

'0.0.0.0' binds a socket to all interfaces.
@rkingsbury
Copy link
Collaborator

Thanks @mjwen , this looks like a nice enhancement. After a quick review, can I ask for two changes?

  1. Can you please expand the test coverage of SSHTunnel? E.g. add tests that attributes get set correctly, etc.? I'd really like to see test coverage for the patch close to 100%; and in any case I don't want the project test coverage to decrease.
  2. In S3Store.connect, please make sure to implement force_reset in a way that is consistent with other stores (see Store.connect: fix force_reset kwarg implementations #879 ; comments welcome on how that works!)

Finally (and this is a bigger ask), if you'd be able to add a page to the docs describing how to connect via SSH, I'm sure many users would appreciate it. But I understand if that's not possible at this time.

@mjwen
Copy link
Member Author

mjwen commented Nov 7, 2023

@rkingsbury thanks for the quick feedback!

Great that the functionality looks alright to you. I will add the enhancements (tests and docs) you mentioned above and ping you once ready.

tests/stores/test_aws.py Fixed Show fixed Hide fixed
tests/stores/test_aws.py Fixed Show fixed Hide fixed
def local_port_available(local_port): # noqa: PT004
"""Fixture to determine if a local port is available to test the SSH tunnel."""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

Check failure

Code scanning / CodeQL

Accepting unknown SSH host keys when using Paramiko High test

Setting missing host key policy to AutoAddPolicy may be unsafe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the tests work if you were to set client.set_missing_host_key_policy(RejectPolicy) ? If so, please do that. If not, it's OK b/c this is just a test file so I'm not so concerned about security issues like this.

See here for more details: https://github.com/materialsproject/maggma/security/code-scanning/138

@mjwen
Copy link
Member Author

mjwen commented Nov 19, 2023

@rkingsbury I've addressed your comments. Sorry for the delay.

A couple of stuff to consider before we think about merging it in?

  1. Seems the test GH action is not set up to enable database server, e.g. localhost:22 cannot be access. As a results, many of the tests (old and the new ones I've added) involving database are skipped. Then the coverage tests cannot cover them as well. We should somehow enable this? I've no idea how to do it in GH action. @jmmshn might have some experience?

  2. I did not find anything on automatic docs build and publish in the docs directory. Is it located somewhere else? The current docs seem old -- some of the md pages are now included, e.g. the docs on FileStore you've added.

@jmmshn
Copy link
Contributor

jmmshn commented Nov 21, 2023

Is there some way to spoof the ssh_tunnel connection?
I think paramiko should not actually be called during any of the CI just like how we are using moto to mock all the bots connections.

So I think if the basic tests are designed only in a way to check the arguments to paramiko then we are OK.

For more complex tests, you can consider mocking the entry functions to the ssh_tunnel and have that turn some kind of a flag, then you just have to check that connecting to a DB properly hits that flag?
Basically, I think our CI should only check that settings are properly passed to SSH tunnel API but we don't need to actually check the SSH tunnels.

@rkingsbury
Copy link
Collaborator

Thanks @mjwen !

1. Seems the test GH action is not set up to enable database server, e.g. localhost:22 cannot be access. As a results, many of the tests (old and the new ones I've added) involving database are skipped. Then the coverage tests cannot cover them as well. We should somehow enable this? I've no idea how to do it in GH action. @jmmshn might have some experience?

I'll defer to Jimmy on this one, and I think the approach he suggested sounds ok (just make sure we are passing the right arguments, not testing the tunnel itself)

2. I did not find anything on automatic docs build and publish in the [docs](https://github.com/materialsproject/maggma/tree/main/docs) directory. Is it located somewhere else? The [current docs](https://materialsproject.github.io/maggma/) seem old -- some of the md pages are now included, e.g. the docs on [FileStore](https://github.com/materialsproject/maggma/blob/main/docs/getting_started/using_file_store.md) you've added.

I'm not sure I follow - as far as I can tell, the current docs you linked are up to date. I think you just need to add modify a .md file in docs/reference (or wherever you think).

I guess the docs build process isn't document, but we use mkdocs. So if you want to preview your changes, you can run mkdocs serve and then it should serve your documentation in the browser, and refresh in real time as you edit. Does that help?

@mjwen
Copy link
Member Author

mjwen commented Nov 21, 2023

@rkingsbury My bad regarding the docs! I was looking for some configure file in the docs directory, but now I see mkdocs.yml is in the project root directory. Updated.

@mjwen
Copy link
Member Author

mjwen commented Nov 21, 2023

@jmmshn Thanks for the suggestions! Sounds like a good compromise for the CI tests. I've added some fake tunnel to test the functionality s3store.

@mjwen
Copy link
Member Author

mjwen commented Nov 21, 2023

@rkingsbury Ready for final review.

Note, the CI won't test some of the attributes assignments of the SSHTunnel because of the way we are setting it up.

Also, the CI tests seem fail. I've taken a quick look at it -- seems related to pyproject.toml and setup.py. A rerun might work?

Copy link
Collaborator

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjwen , looks good. I just have a few minor typo edits to suggest.

Apologies for the CI issues; apparently there was a setuptools update or something that broke the tests. They should be working now if you pull the latest.

tests/stores/test_aws.py Outdated Show resolved Hide resolved
docs/getting_started/using_ssh_tunnel.md Outdated Show resolved Hide resolved
docs/getting_started/using_ssh_tunnel.md Outdated Show resolved Hide resolved
def local_port_available(local_port): # noqa: PT004
"""Fixture to determine if a local port is available to test the SSH tunnel."""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the tests work if you were to set client.set_missing_host_key_policy(RejectPolicy) ? If so, please do that. If not, it's OK b/c this is just a test file so I'm not so concerned about security issues like this.

See here for more details: https://github.com/materialsproject/maggma/security/code-scanning/138

@rkingsbury rkingsbury added enhancement docs documentation labels Nov 21, 2023
@rkingsbury rkingsbury self-assigned this Nov 21, 2023
@mjwen
Copy link
Member Author

mjwen commented Nov 21, 2023

I've checked client.set_missing_host_key_policy a bit, and it seems client.set_missing_host_key_policy(RejectPolicy()) is the default and it won't work with the tests, because we need to load the host key info for the tests.

I believe we need to leave it like that.

@rkingsbury rkingsbury merged commit f792dd8 into materialsproject:main Nov 21, 2023
8 of 10 checks passed
@rkingsbury
Copy link
Collaborator

Thanks @mjwen !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants