
bda - getter function #127

Closed
wants to merge 11 commits into from

Conversation

zoometh
Contributor

zoometh commented Mar 23, 2021

First step to recover the 'bda' C14 database.

Added values to / completed the files:

  • url_references.csv
  • variable_reference.csv
  • material_thesaurus.csv
  • country_thesaurus.csv

Added 'bda' to get_c14data()

@dirkseidensticker
Contributor

@zoometh what a great starting point. I do not know how far along you are with writing the actual parser, but maybe you want to consider having a look at the parser for the euroevol database. That database is also based on multiple individual files that need to be joined.

If you need any assistance, I am here as well. Many thanks for your contribution already 👍

@zoometh
Contributor Author

zoometh commented Mar 23, 2021

@dirkseidensticker Yes, I've started writing the get_bda() function, inspired by the get_euroevol() function. I guess the get_bda() function will be ready within a week. If I run into issues, I'll let you know.

@zoometh
Contributor Author

zoometh commented Mar 28, 2021

@dirkseidensticker I've finally created the get_bda() function. The joins between the 3 unzipped .xlsx files (Dates, Sites, Occupations) work, but I was unable to check it with the c14bazAAR package since the url_references.csv file appears to have been removed (temporarily) from the package, maybe because of my previous commit. This is the first time I've used branches, and my GitHub skills are still basic. Anyway, I can still help out.
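The flow described here can be sketched roughly as follows. This is a simplified sketch assembled from the snippets reviewed later in this thread, not the exact PR code; it assumes get_db_url("bda") resolves the download URL and that the accent-free column names suggested in the review are in place:

```r
# Sketch of the get_bda() download/join logic (assumption: simplified from the PR).
td <- tempdir()
temp <- tempfile(fileext = ".zip")
utils::download.file(get_db_url("bda"), temp, mode = "wb")

# unzip and read the three tables
utils::unzip(temp, files = "BDA-Table_Dates.xlsx", exdir = td, overwrite = TRUE)
c14dates <- readxl::read_excel(paste0(td, "/BDA-Table_Dates.xlsx"))
utils::unzip(temp, files = "BDA-Table_Sites.xlsx", exdir = td, overwrite = TRUE)
c14sites <- readxl::read_excel(paste0(td, "/BDA-Table_Sites.xlsx"))
utils::unzip(temp, files = "BDA-Table_Occupations.xlsx", exdir = td, overwrite = TRUE)
c14occupa <- readxl::read_excel(paste0(td, "/BDA-Table_Occupations.xlsx"))

# join dates -> sites -> occupations (key columns as in the review comments;
# native |> pipe used here instead of magrittr's %>% to keep the sketch self-contained)
bda <- c14dates |>
  dplyr::left_join(c14sites, by = c("id_site_associe" = "id_sites")) |>
  dplyr::left_join(c14occupa, by = c("id_occupation_liee" = "id_occupations_2019"))
```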

@dirkseidensticker
Contributor

@zoometh we made some changes within #128 and the latest v2 release. We removed the url table, which one had to update on the remote. Could you pull the changes and add your URLs to the new db_info_table.csv table? get_db_url("[the database name]") now refers to this table.

Removing the odd url_references.csv reference was something we had wanted to do for some time; sorry that it interfered with your PR.
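For reference, registering a database in db_info_table.csv amounts to a single CSV line; the line below is taken from the diff further down in this thread (the column meanings — database name, date, version, URL — are an assumption):

```
bda,2021-03-23,1,https://api.nakala.fr/data/10.34847/nkl.dde9fnm8/189e04d917ffd68352a389006f357b58efda855e
```

get_db_url("bda") then looks up this row and returns the URL.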

@zoometh
Contributor Author

zoometh commented Mar 29, 2021

I've just added a line in db_info_table.csv to register the bda database, here:


unzip(temp, files="BDA-Table_Dates.xlsx", exdir=td, overwrite=TRUE)
c14dates <- readxl::read_excel(paste0(td,"/BDA-Table_Dates.xlsx"))
unzip(temp, files="BDA-Table_Occupations.xlsx", exdir=td, overwrite=TRUE)
Contributor:

Add the utils:: namespace here, i.e. utils::unzip(...)

c14dates <- readxl::read_excel(paste0(td,"/BDA-Table_Dates.xlsx"))
unzip(temp, files="BDA-Table_Occupations.xlsx", exdir=td, overwrite=TRUE)
c14occupa <- readxl::read_excel(paste0(td,"/BDA-Table_Occupations.xlsx"))
unzip(temp, files="BDA-Table_Sites.xlsx", exdir=td, overwrite=TRUE)
Contributor:

Add the utils:: namespace here, i.e. utils::unzip(...)

c14occupa <- readxl::read_excel(paste0(td,"/BDA-Table_Occupations.xlsx"))
unzip(temp, files="BDA-Table_Sites.xlsx", exdir=td, overwrite=TRUE)
c14sites <- readxl::read_excel(paste0(td,"/BDA-Table_Sites.xlsx"))

Contributor:

Add

names(c14dates) <- gsub("\u00E9", "e", names(c14dates))
names(c14occupa) <- gsub("\u00E9", "e", names(c14occupa))
names(c14sites) <- gsub("\u00E9", "e", names(c14sites))

in order to replace the non-ASCII characters in the column names
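As a quick check that this substitution does what is intended (pure base R; iconv() with "ASCII//TRANSLIT" would be a more general alternative for other accented characters, though that is not what the PR uses):

```r
# "\u00E9" is "é"; gsub() replaces every occurrence in each name.
nms <- c("M\u00E9thode", "Mat\u00E9riel_dat\u00E9", "R\u00E9gion")
gsub("\u00E9", "e", nms)
# "Methode" "Materiel_date" "Region"

# A broader (platform-dependent) option for arbitrary accents:
# iconv(nms, from = "UTF-8", to = "ASCII//TRANSLIT")
```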

dplyr::left_join(c14sites, by = c("id_site_associé" = "id_sites")) %>%
dplyr::left_join(c14occupa, by = c("id_occupation_liée" = "id_occupations_2019")) %>%
dplyr::transmute(
method = .data[["Méthode"]],
Contributor:

Change "Méthode" into "Methode"

site = .data[["Nom_site"]],
sitetype = .data[["Nature_site"]],
feature = .data[["num_couche"]],
period = .data[["période"]],
Contributor:

Change "période" into "periode"

feature = .data[["num_couche"]],
period = .data[["période"]],
culture = .data[["culture"]],
material = .data[["Matériel_daté"]],
Contributor:

Change "Matériel_daté" into "Materiel_date"

period = .data[["période"]],
culture = .data[["culture"]],
material = .data[["Matériel_daté"]],
species = .data[["Matériel_daté_précision"]],
Contributor:

Change "Matériel_daté_précision" into "Materiel_date_precision"

culture = .data[["culture"]],
material = .data[["Matériel_daté"]],
species = .data[["Matériel_daté_précision"]],
region = .data[["Région"]],
Contributor:

Change "Région" into "Region"

lat = .data[["Latitude"]],
lon = .data[["Longitude"]],
shortref = .data[["Ref_biblio"]],
comment = .data[["Fiabilité.x"]],
Contributor:

Change "Fiabilité.x" into "Fiabilite.x"

c14sites <- readxl::read_excel(paste0(td,"/BDA-Table_Sites.xlsx"))

bda <- c14dates %>%
dplyr::left_join(c14sites, by = c("id_site_associé" = "id_sites")) %>%
Contributor:

Change "id_site_associé" into "id_site_associe"


bda <- c14dates %>%
dplyr::left_join(c14sites, by = c("id_site_associé" = "id_sites")) %>%
dplyr::left_join(c14occupa, by = c("id_occupation_liée" = "id_occupations_2019")) %>%
Contributor:

Change "id_occupation_liée" into "id_occupation_liee"

@@ -21,3 +21,4 @@ emedyd,2017,1,https://discovery.ucl.ac.uk/id/eprint/1570274/1/robertsetal17.zip
katsianis,2020-08-20,1,https://rdr.ucl.ac.uk/ndownloader/files/23166314
rapanui,2020-08-21,1,https://github.com/clipo/rapanui-radiocarbon/archive/master.zip
mesorad,2020-09-01,1,https://github.com/eehh-stanford/price2020/raw/master/MesoRAD-v.1.1_FINAL_no_locations.xlsx
bda,2021-03-23,1,https://api.nakala.fr/data/10.34847/nkl.dde9fnm8/189e04d917ffd68352a389006f357b58efda855e
Contributor:

This line should be removed, as it is solved with #130

@dirkseidensticker (Contributor) left a comment

Dear @zoometh, please excuse the long wait. I checked your parser, which looks (nearly) perfect! Congratulations and many thanks for your efforts from @nevrome and myself.

While running the checks, I noticed that you kept the French "é" in the column headers. While this does not trigger an error, it raises a warning. Could you replace those characters?

Also, I noticed that your PR #130 was a bit redundant. Just keep in mind that when you add the url to the reference table, you need to perform a "Clean and Rebuild" (in RStudio, within the "More" menu of the "Build" tab). That triggers rebuilding of the helper indices, like db_info_table. Maybe we should add this to the README? @nevrome what do you think?

@zoometh
Contributor Author

zoometh commented Apr 1, 2021 via email

@dirkseidensticker
Contributor

... Just keep in mind that when you add the url to the reference table, you need to perform a "Clean and Rebuild" (in RStudio, within the "More" menu of the "Build" tab). That triggers rebuilding of the helper indices, like db_info_table. Maybe we should add this to the README? @nevrome what do you think?

I missed this one yesterday: you need to run data-raw/data_prep.R to integrate the URL added to the reference table (see no. 8 in "Adding database getter functions").
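Concretely, that step is just sourcing the script from the package root and then rebuilding. This is a sketch: the script path follows the comment above, and devtools is only one of several ways to rebuild the package:

```r
# Regenerate the package-internal helper tables (e.g. db_info_table)
# after editing data-raw/db_info_table.csv:
source("data-raw/data_prep.R")

# then "Clean and Rebuild" in RStudio, or equivalently:
# devtools::install()
```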

@nevrome
Member

nevrome commented Apr 3, 2021

Yes, it's maybe a bit tricky that nos. 5, 6 and 7 don't do anything without 8, but I didn't want to be redundant.

@zoometh closed this Apr 4, 2021
@zoometh reopened this Apr 4, 2021
@zoometh (Contributor Author) left a comment

I guess that's all fine. What will be the next step @dirkseidensticker ?

@nevrome mentioned this pull request Apr 4, 2021
Merged
@nevrome closed this Apr 4, 2021