Integration with {targets} #64
Comments
Proposal
On a second look at the
Questions
P.S.: I am just starting to learn these packages and am currently stuck at richfitz/redux#54.
I suspect that the details of cloud orchestration will vary by user and their needs: are the workers in the cloud while the orchestrating process is on a desktop? Is that also in the cloud? Is the cloud resource some generally-available single large instance or are you using more transient compute like ECS? The second and third questions are easiest so I'll answer them first:
We don't use the cloud much here, but one of our patterns is probably of interest:
I imagine this is not dissimilar to how someone might want to use these things when doing work in the cloud: stand up your compute, then work with it. Importantly, from rrq's point of view the worker pool can change (shrink, grow, be replaced) at any point, and workers can be told to turn off once they're idle, etc. However, at the point that you submit any work to the queue, you must have a working connection to Redis, even if there are no workers.

If you're doing this in the cloud with a controlling process that is local to you, then the other factor that you need to consider is how to secure the communication channel. I've done this in the distant past (2014-2015) with an SSH tunnel and that worked well. This was where many of the ideas came from in the first place, but that project is well in the past now and I don't remember the details.

The other way we use this is with a group of Docker containers that form an application: we bring up the workers and the Redis server at the start of the application life cycle and they live until the entire system is torn down.
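The submit-first, attach-workers-later pattern described above can be sketched in a few lines. This is only an illustration, not rrq's API: an in-process `queue.Queue` stands in for the Redis server, threads stand in for workers, and the names `submit` and `worker` are made up for the example. The point it shows is that submitting work needs only the queue to be reachable, and that workers can appear later and shut themselves down once idle.

```python
import queue
import threading

# Stand-in for the Redis-backed queue (assumption: a real setup would use Redis).
tasks = queue.Queue()
results = {}
results_lock = threading.Lock()


def submit(task_id, x):
    """Controller side: enqueue work. Only the queue has to be reachable."""
    tasks.put((task_id, x))


def worker():
    """Worker side: pull tasks until none arrive for a while, then turn off."""
    while True:
        try:
            task_id, x = tasks.get(timeout=0.2)
        except queue.Empty:
            return  # idle, so shut down
        with results_lock:
            results[task_id] = x * x


# Submit work before any worker exists.
for i in range(5):
    submit(i, i)

# Stand up a pool afterwards; its size is independent of what was submitted.
pool = [threading.Thread(target=worker) for _ in range(2)]
for t in pool:
    t.start()
for t in pool:
    t.join()

print(sorted(results.items()))
```

Changing the number of threads in `pool` does not affect the submitted tasks, which mirrors the idea that the worker pool can grow, shrink, or be replaced independently of the controller.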
I would ideally like to allow for both, but it may be the case that only the latter is possible. I am not sure yet.
I'm thinking more transient compute, mainly Batch to start, which sits on top of ECS.
I see. That sounds very much like
Exactly.
That's where I am stumped. When you worked with an ssh tunnel, do you remember if you used a cluster or something on the cloud? Between the encapsulation of Batch and my company's infosec wall, I haven't been able to
I haven't had much of a chance to use Docker, but that sounds promising. How easy is it in practice to make Docker instances talk to each other? I am picturing one with the Redis server and the other with the workers.
For Docker, the usual model is one process for one container. It's extremely straightforward to have different Docker containers talk to each other (this is really the point of Docker, in fact!) or to expose things to the host in a single-node setting. I imagine that a sensible AWS setup would look like
Importantly, you're not connecting to the running jobs; the running jobs are connecting to the Redis server.
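The direction of the connections is the key point here, and a small sketch may make it concrete. It uses only Python's standard `socket` module on localhost; the tiny line-based protocol and the names `serve` and `worker` are assumptions for illustration and have nothing to do with the Redis wire protocol or rrq. The server stands in for Redis and is the only thing that listens; each job dials out to it.

```python
import socket
import threading


def serve(listener, tasks, results):
    """Stand-in for the Redis server: hand one task to each job that connects."""
    for task in tasks:
        conn, _ = listener.accept()
        with conn, conn.makefile("r") as reader:
            conn.sendall(f"{task}\n".encode())
            results.append(reader.readline().strip())


def worker(host, port):
    """A job dials out to the server. Nothing ever connects to the job."""
    with socket.create_connection((host, port)) as conn, conn.makefile("r") as reader:
        task = int(reader.readline())
        conn.sendall(f"{task * 2}\n".encode())


# Port 0 asks the operating system for any free port.
listener = socket.create_server(("127.0.0.1", 0))
host, port = listener.getsockname()

results = []
server = threading.Thread(target=serve, args=(listener, [1, 2, 3], results))
server.start()

jobs = [threading.Thread(target=worker, args=(host, port)) for _ in range(3)]
for j in jobs:
    j.start()
for j in jobs:
    j.join()
server.join()
listener.close()

print(results)
```

Because the jobs only make outbound connections, they need no open inbound ports; only the server's address has to be reachable from wherever the jobs run.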
That setup makes sense. I have read more about AWS and Docker since we last spoke ("AWS in Action" and "Docker Deep Dive"), and it seems like a custom security group could allow the required traffic from cloud instance to cloud instance. But for the SSH tunnel from the local machine to the Redis instance, it seems like I would need to expose the user's public IP address in an AWS security group. Would that hard-coded public IP create vulnerabilities if AWS stores it? Would it be more secure to run that traffic through a local container with its own temporary public IP address? If so, would it make sense to run the Redis server in that local container instead of a persistent and potentially expensive cloud instance? Or would that not be worth it because of the increased attack surface (one tunnel per worker over the public internet)?

Do you have any recommendations for resources that would help me learn more about the relevant networking and security fundamentals? I feel like it would help to improve my understanding of TCP/IP, the OSI model, overlays, firewalls, public key cryptography, TLS encryption, and other infrastructure that would help the local machine securely tunnel into the cloud.

By the way, what are your plans for CRAN?
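On the security-group question above, it may help to see what an IP-based ingress rule amounts to. The sketch below models a rule as a CIDR block and checks source addresses against it with Python's standard `ipaddress` module. It does not call any AWS API; the function name `allowed` and the addresses (taken from the 203.0.113.0/24 documentation range) are assumptions for illustration.

```python
import ipaddress


def allowed(source_ip, ingress_cidrs):
    """Return True if source_ip falls inside any CIDR block in the allowlist."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ingress_cidrs)


# A /32 block names exactly one address: the user's current public IP.
rules = ["203.0.113.7/32"]

print(allowed("203.0.113.7", rules))          # the listed address
print(allowed("203.0.113.8", rules))          # a neighbouring address
print(allowed("203.0.113.8", ["0.0.0.0/0"]))  # a rule open to every address
```

A rule of this kind only restricts which source addresses may reach the port. It does not authenticate or encrypt anything, so the SSH tunnel is still what protects the traffic itself.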
From @mschubert at mschubert/clustermq#208 (comment)
In terms of running tunnels over the public internet, I would feel better if this public IP were a temporary one-time address and the AWS API requests that delivered it were encrypted. But it's entirely possible that I just need to understand more about internet security.
@wlandau In
Interesting. Would be great to follow up at mschubert/clustermq#290.
Hi @richfitz, A lot has changed on my end since we last spoke on this thread. The I know my comments have spurred development in
Cool, that looks like a really nice package -- a very different model to rrq and I hope that's easier to work around with targets/crew than rrq would have been. In particular, relaxing the requirement for there to be a persistent Redis server will probably make things much easier than it would have been with rrq.
Thanks Rich! Yes, I had actually planned to start short-lived Redis server instances for Also, I am super happy with the blazing fast alternative to heartbeating that @shikokuchuo suggested at wlandau/crew#31 (comment). It's such slick magic that I was considering recommending it for
The targets package currently uses clustermq and future to send tasks to workers running on traditional clusters. As a next step for targets, I aim to support workers running on the cloud (AWS Batch, Fargate, Google Cloud Run, Kubernetes, etc.) just like Airflow, Metaflow, Nextflow, and Prefect. A task queue would be an excellent layer between targets and cloud platforms.

Before I learned about rrq, I started crew to extend https://www.tidyverse.org/blog/2019/09/callr-task-q/ to other types of workers. There are a couple callr-based queues, and the future-based queue seems to make future.batchtools workloads a bit more efficient. That's about as far as I have pursued crew up to this point. Interprocess communication and heartbeating seem like huge challenges given how much more isolated AWS Batch jobs are than jobs on a traditional cluster.

So I am wondering if I can use rrq for targets. Can it support workers on the cloud? There are mentions of AWS in the docs, particularly about the Redis server, and I would like to learn more about how the pieces fit together for a use case like mine.
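For readers unfamiliar with the heartbeating mentioned above, here is a minimal sketch of the general idea: each worker periodically records a timestamp, and the controller treats any worker whose last timestamp is older than a time-to-live as dead. This is a generic illustration and not how rrq, crew, or clustermq implement it; the class `HeartbeatRegistry`, its methods, and the injected clock are all hypothetical. In a real deployment the timestamps would live in shared storage (for example, expiring keys on a Redis server) rather than in a local dictionary.

```python
import time


class HeartbeatRegistry:
    """Track the last time each worker checked in (hypothetical helper)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl        # seconds a heartbeat stays valid
        self.clock = clock    # injectable so the example is deterministic
        self.last_seen = {}

    def beat(self, worker_id):
        """Worker side: record that this worker is still running."""
        self.last_seen[worker_id] = self.clock()

    def alive(self):
        """Controller side: workers whose last heartbeat is within the TTL."""
        now = self.clock()
        return sorted(w for w, t in self.last_seen.items() if now - t <= self.ttl)


# Drive the example with a fake clock instead of sleeping.
now = [0.0]
registry = HeartbeatRegistry(ttl=10.0, clock=lambda: now[0])

registry.beat("worker-a")
registry.beat("worker-b")

now[0] = 8.0
registry.beat("worker-a")   # worker-a checks in again; worker-b has gone silent

now[0] = 15.0
print(registry.alive())     # worker-b's last heartbeat is 15 s old, past the TTL
```

The difficulty raised in the post is visible here: the scheme only works if every worker can reach the place where heartbeats are stored, which is harder to arrange for isolated cloud jobs than for jobs on a shared cluster.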