-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cde package submission #284
Comments
An overnight change to the CDE data has meant that two of the tests fail now - I have just committed a fixed version to the master branch; taken out the fragile tests and also have taken the opportunity to fix some issues highlighted by gp. |
Thanks for your submission @robbriers - Im your handling editor. Thanks for the update on tests and such |
Editor checks:
Editor commentsThanks for your submission @robbriers ! Here's the output from goodpractice. You've already run good practice, and the below only has a note about test coverage. ── GP cde ───────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 89% of code lines are covered by test
cases.
R/get_objectives.r:112:NA
R/get_objectives.r:113:NA
R/get_objectives.r:128:NA
R/get_pa.r:56:NA
R/get_rnag.r:106:NA
... and 28 more lines Seeking reviewers now 🕐 Could you please add the rOpenSci under review badge to your README?
Reviewers:
|
Still looking for the 2nd reviewer ... |
both reviewers assigned |
Great, thanks for the update |
@sckott I'm anticipating some difficulty meeting the April 1st due date and would request moving this ahead one week to April 08. Possible? |
@boshek Okay, date changed |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing:
|
Thanks for the detailed comments - I'll review and start working through. |
@robbriers Please do reach out if anything is unclear. Happy to back and forth on anything as needed. |
@boshek Just a query re license and data. The code is licensed as GPL 3 in the DESCRIPTION file, so I was under the impression that this did not need an explicit LICENSE file to be included? |
I don't think it is absolutely required but I it is a) a generally accepted practice to include a LICENSE file in the repo and b) provides a path for a non R user (not to mention GitHub) to look at the license. That is, someone wanting to find out how this software is licensed only needs to come to repo and doesn't need to know that one needs to look into the DESCRIPTION file.
It definitely does. My understanding is that regardless of how the data in packaged if you include it you need to identify how it is licensed. I would also encourage you to expose this data. As a prospective user I found it confusing that I couldn't find out more on this data. That's why I suggested not only exposing the data but also providing a data-raw folder. An example I've done is as follows. Here is how I create and provide licensing info for an internal data file https://github.com/ropensci/tidyhydat/tree/master/data-raw/HYDAT_internal_data Then here I document the data for context for folks: Does that help? |
Thanks, I have added an explicit licence file. With the data I would prefer to keep the internal df in the sysdata.rda file, but have added a copy of the data with appropriate licence details to a data-raw folder. |
👋 @limnoliver reminder that your review is due in a few days |
@robbriers sounds good. However if that dataframe isn't exposed I think it is worth outlining the workflow on selecting sites. Searching is one option, which you've outlined, but I do see utility in having the site list available for users. |
@boshek if I document and expose the data in /data I seem to run into an issue with |
What if instead of |
no luck. will keep working on it but progressing with data saved internally for now. |
@sckott apologies for the delay on my review, it will be done today! |
@robbriers This is a workaround to
Just add that bit of code to your |
@boshek I just found this as well! Feels a bit hacky but if that is what it takes. Thanks! |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 6 (though this was my first time reviewing for ropensci, so a lot of this was getting up to speed on the review process. Thanks for all the great documentation!) Review CommentsCongrats on providing a useful package that allows for easier and reproducible ways to access and use important water data. Most of my comments are for making it easier for someone who is new to the data get up-to-speed more quickly. I don't think this will take too much effort, and could be as simple as linking to some documentation that likely exists on the WFD website. A few comments: README
I think the documentation could be improved in regards to describing the data source the package is accessing. In the README, it would be useful to have an extra sentence or two that describes the scope of the data - e.g., data collected by X for the purposes of Y within the U.K. etc. I had never heard of the WFD or the Environment Agency, so I had to take one more step to understand where the data are coming from. FunctionalityTo test the functions, I followed the vignettes and created my own examples, and I show those below with comments on individual functions.
|
@limnoliver thanks for the review - I will go through the comments and address! |
thanks for your review @limnoliver ! |
Ok, took a bit longer than planned, but here we go... Documentation This has been updated on the pkgdown pages to point to "Get Started". Vignette and use of this package implies acceptance of these licence conditions to: and use of the data accessed by this package implies acceptance of these licence conditions Revised and expanded to cover data contained within package as well. Consider using remotes for install_github() Revised to use remotes Consider using full argument names in the getting started materials. All examples now use full argument names. Code Kept as they are as I think there is limited scope for confusion with other packages. Consider returning columns as all lower case with underscores instead of decimals All column names now lower case with underscores. This file: https://github.com/robbriers/cde/blob/master/R/sysdata.rda. Internal data in the package should be included in the data folder. Thanks for your help with this: internal data now included in the data folder and documented in terms of structure and content. Raw csv also available on Github repo in data-raw folder. No LICENSE file. The open data license would license the data used but not the code in the package LICENSE file added with appropriate details; documentation also updated to highlight licencing of data in several places. This chunk of code: https://github.com/robbriers/cde/blob/d0e109a2794a2653b15410926a52d2a1f15469bf/R/utils.r#L37-L51
This worries me: https://github.com/robbriers/cde/blob/d0e109a2794a2653b15410926a52d2a1f15469bf/R/utils.r#L94. Could you instead catch the specific error using tryCatch()? tryCatch() now implemented to deal with the warning generated, see https://github.com/robbriers/cde/blob/9fb0be3ac62ce7aa5f9836c7c7c797632bd1e2eb/R/utils.r#L47-L50. A helper function called something like cde_data.table or something like that would be good as you are replicating lots of code here. (https://github.com/robbriers/cde/blob/d0e109a2794a2653b15410926a52d2a1f15469bf/R/utils.r#L94-L114) Code has been broken up into a number of smaller helper functions to reduce replication, see https://github.com/robbriers/cde/blob/9fb0be3ac62ce7aa5f9836c7c7c797632bd1e2eb/R/utils.r#L59-L139 I think you should consider using data.table's progress bar. When downloads are happening, it is ambiguous to the user if the process is working. Profiling the code has highlighted that it is the download and unzip parts of the process where most of the latency is. Consider developing print methods (and a class) for cde objects that only print out 10 rows (or just import tibble and use those print methods) Custom Consider letting your examples run so that they are tested with CI All examples now run. |
and now @limnoliver ... README I think the documentation could be improved in regards to describing the data source the package is accessing. In the README, it would be useful to have an extra sentence or two that describes the scope of the data - e.g., data collected by X for the purposes of Y within the U.K. etc. I had never heard of the WFD or the Environment Agency, so I had to take one more step to understand where the data are coming from. A couple of initial paragraphs have been added to the vignette (and README) giving more detail on the context of the WFD and the Environment Agency, plus links to relevant parts of the EA website for more detail. Functionality lakes <- cde::search_names('Lake', 'name') It might be useful to note in the search_names documentation that the search is case-sensitive. Also, in the Value section of the help file, it would be helpful to the user to see definitions of the columns that are returned.
I agree with the other reviewer that it would be nice to have column names standardized and cleaned up to exclude periods. All column names now standardised as lower case and with underscores. Here as well, it would be useful to have more definitions in the Values section of the help file for get_status. If you don't feel like it's within the scope of the package to define the columns that are returned, a link to metadata would be useful. bolam_oc <- get_status(lakes$OC[lakes$name %in% 'Bolam Lake'], 'OC', level = 'Chemical') See response above - links to full details now provided. I had a hard time understanding "level", and in the get_status help file it says to go to the vignette for more info. I would include a direct link to the vignette section of relevance, or spell out the options in the help file. I would really prefer to use the term 'element' but the EA column is named classification_level, so kept this to reduce potential confusion. There is now a list of the different values of 'level' in the help file, plus a link to the page on the CDE website that explains the different levels of elements used within the classification. plot_status(lakes$OC[lakes$name %in% 'Bolam Lake'], 'OC') It seems like a common workflow would be to get the status data, and then to plot. To do that easily using the package, the user would use get_status and plot_status. I was expecting plot_status to use the output of get_status, but it instead makes a call to get_status within the function and and downloads the data again, so if you want to both see the data in a data frame and plot it, you have to download it twice.
Again, it would be helpful to have more in-depth description of the returned table, or at least a link to a description for how these classifications are made or descriptions of the columns. (same can be said for get_objectives, get_measures, and get_pa) See response above - a link to the full reference table is given in each help file. bolam_oc_meas <- get_measures(lakes$OC[lakes$name %in% 'Bolam Lake'], 'OC') No measures data specified - empty dataframe returnedGood warning in the vignette that this often returns no results! There is only limited measures data on the website, however, there is an issue with the measures data on the CDE website since the last quarterly update - this is being actively worked on by the EA and should be fixed soon. This has been documented in the vignette. |
thanks for your response @robbriers to the reviewers. @boshek and @limnoliver - are you happy with the maintainer's responses? any further comments? |
@sckott I say ship it! @robbriers really nice work on the package and the revisions. Really great effort. |
@limnoliver looks great! |
thanks reviewers! Having a final look ... |
Approved! Thanks again for your submission @robbriers ! And thanks for your reviews @boshek and @limnoliver 👌 To-dos:
We've started putting together a bookdown with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved. The repo is at https://github.com/ropensci/dev_guide Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that |
Great thanks! I have transferred it over and updated CI etc. Thanks also to @boshek and @limnoliver for such helpful and constructive reviews - it has been a great process and I have learned a lot en route! |
great! glad it's been useful. you have admin access now |
and let Stef now if you want to do a blog post or not |
@robbriers Here are the blog post guidelines https://github.com/ropensci/roweb2#contributing-a-blog-post Happy to answer any questions. |
Submitting Author: Name (@robbriers)
Repository: https://github.com/robbriers/cde
Version submitted: 0.3.0
Editor: @sckott
Reviewer 1: @boshek
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package facilitates the download, collation and analysis of Water Framework Directive reporting data from the Environment Agency (England) Catchment Data Explorer website.
Who is the target audience and what are scientific applications of this package?
Any user who wishes to query, download and analyse water quality-related data available from the Catchment Data Explorer website.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
No
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.MEE Options
Code of conduct
The text was updated successfully, but these errors were encountered: