-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AWS S3 support for rds storr #72
base: master
Are you sure you want to change the base?
Conversation
Hi Ben - thanks so much for this! There are people who have asked for this specifically (for example: #11 (comment)). In general I am totally up for PRs, but I use I set up storr to be somewhat extensible - see here for a Redis storr driver. So the options as I see it are:
|
I see you already made aws.s3 a Suggests - thanks. A fourth option is to make a package |
Agree. aws.s3 is meant to be low-level. Glad to see it being used, though! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments through - most of which are style at the moment. Testing is the big thing that needs doing. Do do this we need:
- a function in a
helper-xxx
file likeskip_if_no_aws_credentials
- full spec test of the new driver - see https://github.com/richfitz/redux/blob/542f10d56fcf87505ea640f420542e297de81391/tests/testthat/test-storr.R#L3-L8 and https://github.com/richfitz/storr/blob/master/tests/testthat/test-auto.R for examples
- Tests of any s3 internals
- Ideally, set up aws credentials (a free tier account ideally) for use on travis
These will be needed/desirable regardless of where this ends up (comments in main discussion)
@@ -22,6 +22,7 @@ Suggests: | |||
knitr, | |||
mockr, | |||
rbenchmark, | |||
testthat (>= 1.0.0) | |||
testthat (>= 1.0.0), | |||
aws.s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put dependencies alphabetically?
} | ||
|
||
##' @export | ||
##' @rdname storr_rds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be storr_rds_s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is why the travis build is failing)
##' @param default_namespace Default namespace (see | ||
##' \code{\link{storr}}). | ||
##' @export | ||
##' @examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section needs redoing
##' exists (or no mangledness if creating a new storr). | ||
##' | ||
##' @title rds object cache driver | ||
##' @param path Path for the store. \code{tempdir()} is a good choice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider @inheritParams storr_rds
}, | ||
|
||
type = function() { | ||
"rds_s3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the type should simply be s3?
## However it will throw an error if the file we are trying to delete looks like a directory. | ||
## S3 has no actual notion of directory, we just fake it using "/". As a result, it's possible to get into a muddle. | ||
## to play it save, line below can be uncommented to force it to delete just the path given, but throw a warning, if it does look like a directory | ||
## can also change to if_dir = "del_recursive" to delete the whole directory with a warning. May never actually show up as an issue, this is just a note. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're the only modifying the bucket is this a real concern? I don't know enough about S3! Can you reflow to 80 characters please?
invisible(exists) | ||
} | ||
|
||
s3_writeLines <- function(text, path, bucket){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between (
and {
|
||
s3_delete_file <- function(bucket, path, if_dir = c("stop", "del_only_key", "del_recursive")){ | ||
files <- aws.s3::get_bucket_df(bucket = bucket, prefix = path, max = Inf)[["Key"]] | ||
if(length(files) > 1){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General style cleanup through here please
@@ -12,7 +12,7 @@ Simple object cacher for R. `storr` acts as a very simple key-value store (supp | |||
* Fetch from an external source (e.g. website) if a key is not found locally | |||
* Pluggable storage backends - currently | |||
- environment (memory) | |||
- rds (disk) | |||
- rds (disk, AWS S3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it makes sense to think of this as an S3 driver rather than an rds driver
@@ -43,3 +43,15 @@ remotes::install_github("richfitz/storr@develop", upgrade = FALSE) | |||
|
|||
* [storr](https://richfitz.github.io/storr/articles/storr.html) `vignette("storr")` outlines basic use and core implementation details. | |||
* [external](https://richfitz.github.io/storr/articles/external.html) `vignette("external")` shows how to use storr to cache external resources such as files, web resources, etc, using the `storr_external` object. | |||
|
|||
## RDS on AWS S3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an entirely new vignette would be the best solution for documenting this
Thanks so much for the replies and feedback. I'd be happy going down any of the routes that you mentioned - I particularly like your suggestion of storr.remote. While I won't have time to add other remote drivers in the near future, I think I could abstract out all of the aws-specific read/write/list functions, so that they can be easily reproduced for other back ends if others wish to contribute. Assuming that you think a separate package is the way forward, I will setup a new repo and copy this stuff over, taking into account your above recommendations. This is mostly an "evenings" project right now, so I will link you to the new repo once I have created it. I would gratefully take you up on the offer for help in getting this onto CRAN since this would be my first submission :) Thanks again for the great feedback! Ben |
Hello again, Just to let you know, I have created the storr.remote repo here. On the master branch, I have included the stuff in this PR & addressed most of your above comments, although not all yet. I am also working on a branch where I have tried to abstract all s3 file operations into their own R6 class that can be fed into the R6_driver_rds class (I have actually created a version of that called R6_driver_rds_remote). It's not quite working yet, but my hope is that it will be possible to just replicate the s3 file operations class for another backend, and everything else will just plug in. Still no testing or real documentation (I will get there, I promise), and I'm sure lots of other things. Hopefully I can find time to continue in the next few days. Thanks, |
Awesome work! I will look at this early next week 🎉 |
see also: #61 |
Next place to look is: #74 |
OK, I got a bit more dragged into this today than I was expecting to so most of the pieces are in place. This I think is the process from here:
|
Hi @ben-gready - have you had a chance to look at any of this yet? |
Hi there,
I'm not sure whether you're accepting pull requests, and what the proper etiquette is for asking, but I recently forked your (great) repo, and I think I have a useful contribution. I needed a server-agnostic back end that I could access from anywhere, and since I'm using AWS S3 for other stuff, I decided to implement a driver using that. I completely ripped off your rds driver file & added S3 equivalents for reads, writes listing files etc. It seems to work well, although I have not written tests at this point. I've added a little documentation (which I know is currently lacking, and in the wrong place).
Before I spend any more time on cleaning things up, I was wondering whether you're interested in accepting such a pull request in the first place, and if so, whether you'd mind taking a look at what i have done and letting me know where you'd like to see changes, improvements, etc. I have tried to code everything using base functions where possible, so as to not increase package dependencies.
Thanks for your time,
Ben