Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain online lookup tables for backwards compatibility? #134

Closed
joeroe opened this issue Apr 5, 2021 · 4 comments
Closed

Retain online lookup tables for backwards compatibility? #134

joeroe opened this issue Apr 5, 2021 · 4 comments

Comments

@joeroe
Copy link
Contributor

joeroe commented Apr 5, 2021

#128 removed the lookup tables (e.g. url_reference.csv) from this repository, which means previous versions of the package are no longer functional:

remotes::install_github("ropensci/c14bazAAR@1.3.0", force = TRUE, quiet = TRUE)
c14bazAAR::get_c14data("14sea")
#> Trying to download all dates from the requested databases...
#>   |                                                          |                                                  |   0%  |                                                          |++++++++++++++++++++++++++++++++++++++++++++++++++|  99%
#> Warning in c14bazAAR::get_c14data("14sea"): There were errors:
#> 
#> 14sea --> https://raw.githubusercontent.com/ropensci/c14bazAAR/master/data-raw/url_reference.csv is not available. No internet connection?
#> 
#> Not all data might have been downloaded accurately!
#> Error in c14bazAAR::get_c14data("14sea"): 
#> 
#> Download failed for all databases.

Created on 2021-04-05 by the reprex package (v1.0.0)

While hopefully most users will upgrade to 2.0.0, this is might be a problem for the reproducibility of analyses using previous versions? It's also not necessarily clear to users who encounter this error that they need to upgrade the package to fix it.

@nevrome
Copy link
Member

nevrome commented Apr 6, 2021

Thanks for bringing this up explicitly 👍 - I take this very seriously.

#128 introduced a breaking change (v.2.0.0) and I think it's the superior solution. The old implementation only existed because of the long maintenance cycles for CRAN packages, which should not be updated too often. Now that we abandoned the CRAN version we enjoy more freedom.

So what about the broken pre-2.0.0 versions? Of course we may compromise the reproducibility of some scripts that rely on a specific c14bazAAR version, but c14bazAAR is already from a conceptional point of view not perfectly suitable for long-term reproducibility. We download the data from the raw source databases which might change or go offline any time. We have no control over that. For radon, for example, we even download a daily dump of the database! No reproducibility here. You should always make your code reproducible in a way that does not rely on any foreign archive or data source, except this data source is explicitly committed to long-term storage.

So for c14bazAAR I suggest to use it to download the data, but then only work with a freeze. Just like I did here:

https://github.com/nevrome/cultrans.bronzeageburials.article2019/blob/e4ff688ffd0bdbc9a2be022260e0eaff61a7e0e3/analysis/code/01_data_preparation_scripts/02_prepare_c14_data.R#L14-L16

An easy fix for c14bazAAR now would be to add the old files again to keep the old versions functional. But at some point we have to deprecate them. How long should this cycle be? I don't want to keep things around without a clear deadline. Unfortunately there is no way to warn the users of old versions: We can not monkey patch code on people's devices.

I suggest to add a well visible warning in the README, but still stick with the hard stop for everything pre-2.0.0. I expect the number of users for c14bazAAR to be fairly small anyway and as the rest of the download API did not change, the handful of users who experience this should be able to update very easily.

@nevrome
Copy link
Member

nevrome commented Apr 6, 2021

In #120 you write that you can not update to the latest c14bazAAR version now. Why that, @joeroe?

@joeroe
Copy link
Contributor Author

joeroe commented Apr 6, 2021

That was just me being in a rush and switching between versions in the same session, sorry. It works fine now!

This is a strange situation because it's more than an ordinary breaking change and more like a "retrospective breaking change" (I don't know if there's a term for that). Of course you're right that relying on a live was always a bad idea from a reproducibility point of view, given the volatility of the underlying databases, and hopefully not many people do that, but this is still a rather radical change even for users using the package to download and cache data.

So would it be worth at least retaining the old versions of url_reference.csv etc. for a short period of time? Perhaps a few months? Just to avoid a sudden shock for people who only updated their installed packages occasionally.

@nevrome
Copy link
Member

nevrome commented Apr 6, 2021

Alright - let's do this then. Doesn't cost much and maybe it helps somebody.

@nevrome nevrome closed this as completed in f43be25 Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants