Parsers for Palmisano's datasets #120

Closed
joeroe opened this issue Jan 20, 2021 · 9 comments

Comments

@joeroe
Contributor

joeroe commented Jan 20, 2021

This paper includes 920 dates from Northern Mesopotamia and the Levant, 6,000–3,000 BP. At a rough estimate just over half would be new additions to c14bazAAR:

# https://doi.org/10.1371/journal.pone.0244871.s001
palmisano <- readxl::read_xlsx("journal.pone.0244871.s001.xlsx", sheet = "C14 Dataset")
c14baz <- c14bazAAR::get_c14data("all")

sum(!palmisano$LabID %in% c14baz$labnr)
#> [1] 556

The data is available as a supplementary .xlsx file or a CSV in the Zenodo archive. Worth including? #2

@nevrome
Member

nevrome commented Jan 20, 2021

I think so, yes. Good spotting! @apalmisano82 is one of our most reliable data contributors, actually. He recently made us aware of yet another valuable paper with data here: https://zenodo.org/record/4322979

We already have one of his datasets in c14bazAAR and I wonder what's the best way to structure these datasets in the future. We would probably have to add a parser function for each paper? What should we call them? We'll probably accumulate even more duplication like this.

Or have you maybe considered collecting all your data across different papers in an open repository, @apalmisano82? Maybe something like what @dirkseidensticker maintains for his Africa projects. It could simplify your data management and would be perfect for us 👍

@joeroe
Contributor Author

joeroe commented Jan 21, 2021

Two, I think: there's also emedyd, which, come to think of it, is almost fully superseded by the QSR paper:

# https://zenodo.org/record/4322979
qsr <- readr::read_csv("Palmisano_etal_data_and_code/csv/dates.csv")
emedyd <- c14bazAAR::get_c14data("emedyd")
c14baz <- dplyr::filter(c14bazAAR::get_c14data("all"), sourcedb != "emedyd")

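# c14bazAAR returns lab numbers in the standardised `labnr` column
sum(!emedyd$labnr %in% qsr$LabID)
#> [1] 62

not_in_qsr <- emedyd$labnr[!emedyd$labnr %in% qsr$LabID]
# Match against the lab-number column; matching against the whole data frame
# would flag every ID as missing
sum(!not_in_qsr %in% c14baz$labnr)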

@joeroe
Contributor Author

joeroe commented Apr 5, 2021

#131 adds @apalmisano82's NERD (https://github.com/apalmisano82/NERD), which in the current version is identical to the dataset from this QSR paper.

# QSR: https://zenodo.org/record/4322979
qsr <- readr::read_csv("~/downloads/Palmisano_etal_data_and_code/csv/dates.csv")
data(emedyd, package = "rcarbon")
nerd <- readr::read_csv("https://raw.githubusercontent.com/apalmisano82/NERD/main/nerd.csv")

all(qsr$LabID == nerd$LabID, na.rm = TRUE)
#> [1] TRUE
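
The element-wise `==` check above relies on both files having the same number of rows in the same order. A minimal sketch of an order-independent comparison follows; the lab IDs are made up for illustration and are not taken from the QSR or NERD files:

```r
# Toy vectors standing in for qsr$LabID and nerd$LabID (hypothetical values)
qsr_ids  <- c("OxA-1001", "Ly-2002", "Beta-3003")
nerd_ids <- c("Beta-3003", "OxA-1001", "Ly-2002")

# TRUE if both vectors contain the same set of IDs, regardless of order
same_ids <- setequal(qsr_ids, nerd_ids)

# IDs present in one source but not the other (both empty when the sets match)
only_in_qsr  <- setdiff(qsr_ids, nerd_ids)
only_in_nerd <- setdiff(nerd_ids, qsr_ids)

# setequal() ignores duplicated IDs, so check for them separately
any_duplicates <- anyDuplicated(qsr_ids) > 0 || anyDuplicated(nerd_ids) > 0
```

If `same_ids` is `TRUE` and `any_duplicates` is `FALSE`, the two sources list exactly the same lab IDs; otherwise the two `setdiff()` results show where they diverge.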

There is still the small number of dates (62) in emedyd that aren't in NERD, but I suspect most of these will be covered by other databases (unfortunately I can't verify this right now because of #134), so maybe it's time to deprecate get_emedyd()?

emedyd[!emedyd$LabID %in% nerd$LabID,]
#>             LabID   CRA Error       Material          Species
#> 87953    SMU-2373 14500   190       charcoal             <NA>
#> 74426    OxA-2142 15160   190       charcoal             <NA>
#> 80387     OxA-869 13260   200       charcoal             <NA>
#> 68840      ODTU-2  9510   100       charcoal             <NA>
#> 18547    DRI-3255  8755   111           <NA>             <NA>
#> 49015   KIA-38007  9065    35           bone             <NA>
#> 74429    OxA-2143 16230   200       charcoal             <NA>
#> 60338     Ly-2809  9835    55          grain         Cerealia
#> 80158    OxA-8407 15860   100       charcoal             <NA>
#> 80772    OxA-9264 15920   100       charcoal             <NA>
#> 80773    OxA-9265 16740   100       charcoal             <NA>
#> 80774    OxA-9266 16750    90       charcoal             <NA>
#> 74687   OxA-22273 15890    90       charcoal   Chenopodiaceae
#> 74688   OxA-22274 15770    80       charcoal            dicot
#> 74689   OxA-22275 16145    75       charcoal            dicot
#> 74693   OxA-22287 15980    60       charcoal   Chenopodiaceae
#> 74694   OxA-22288 16275    60       charcoal   Chenopodiaceae
#> 74695   OxA-22289 16300    65       charcoal            dicot
#> 74696   OxA-22290 16200    65       charcoal   Chenopodiaceae
#> 85140      Q-3072  9840   120           bone             <NA>
#> 85141      Q-3073 10620   125           bone             <NA>
#> 85142      Q-3074 12200   140           bone             <NA>
#> 74061   OxA-20552 15750    75       charcoal             <NA>
#> 57072    Ly-11622 16560    70       charcoal             <NA>
#> 86741    RT-15076  8080    90           <NA>             <NA>
#> 86688     RT-1246 15550   130       charcoal             <NA>
#> 78009    OxA-5177 15460   160       charcoal             <NA>
#> 78010    OxA-5178 16420   180       charcoal             <NA>
#> 78011    OxA-5179 16440   160       charcoal             <NA>
#> 84441    Pta-2158 14130   160       charcoal             <NA>
#> 84442    Pta-2159 13390   120       charcoal             <NA>
#> 43517      I-7031 15460   200           <NA>             <NA>
#> 84489    Pta-3403 16100   150       eggshell Struthio camelus
#> 84507    Pta-3702 15800   160       eggshell Struthio camelus
#> 86673    RT-1072N 16200   170           <NA>             <NA>
#> 97082      TO-987 11170   100           bone          Gazella
#> 97083      TO-989 13110   130           bone             <NA>
#> 97084      TO-991 14850   160           bone             <NA>
#> 60333     Ly-2805  9705    60          seeds             <NA>
#> 60334     Ly-2806  9690    60          seeds             <NA>
#> 60335     Ly-2807  9705    55          seeds             <NA>
#> 60337     Ly-2808  9685    55          seeds             <NA>
#> 60411     Ly-2860  9185    55 organic matter             <NA>
#> 67764   NUT-22023  7670    45       charcoal             <NA>
#> 67765   NUT-22024  7730    80       charcoal             <NA>
#> 67766   NUT-22106  8660   100       charcoal             <NA>
#> 67767   NUT-22109  8390    50       charcoal             <NA>
#> 60273     Ly-2756  9235    45       charcoal             <NA>
#> 61182     Ly-3465  9220    45          seeds             <NA>
#> 61183     Ly-3466  9020    45       charcoal             <NA>
#> 61184     Ly-3467  9170    40       charcoal             <NA>
#> 61181     Ly-3464  9445    45           seed             <NA>
#> 10190  Beta-57898  9010   100       sediment             <NA>
#> 76323    OxA-2835 15190   130       charcoal             <NA>
#> 76326    OxA-2838 15050   160       charcoal             <NA>
#> 76329    OxA-2841 15730   130       charcoal             <NA>
#> 76353    OxA-2870 15450   130       charcoal             <NA>
#> 108821    Wk-7005 14052    94       charcoal             <NA>
#> 78063     OxA-524 15520   200       charcoal             <NA>
#> 78073     OxA-525 16010   200       charcoal             <NA>
#> 61644     Ly-3911 11970    60       charcoal             <NA>
#> 61645     Ly-3912 11860    60       charcoal             <NA>
#>                        SiteName Country Longitude Latitude Region
#> 87953      Arabi I, Wadi Feiran      EG   33.4990  28.7800      1
#> 74426        Azariq 13, W Negev      IL   34.4167  30.9500      1
#> 80387                  Azraq 17      JO   35.0105  29.5269      1
#> 68840                    Cayonu      TR   39.7264  38.2164      2
#> 18547                 Ghuwayr 1      JO   35.5061  30.6231      1
#> 49015              Gobekli Tepe      TR   38.9225  37.2231      2
#> 74429              Hamifgash IV      IL   34.5833  31.1833      1
#> 60338             Jerf el Ahmar      SY   38.2083  36.3917      2
#> 80158           Karain Magarasi      TR   30.5708  37.0776      3
#> 80772           Karain Magarasi      TR   30.5708  37.0778      3
#> 80773           Karain Magarasi      TR   30.5708  37.0778      3
#> 80774           Karain Magarasi      TR   30.5708  37.0778      3
#> 74687               Kharaneh IV      JO   36.4542  31.7237      1
#> 74688               Kharaneh IV      JO   36.4542  31.7237      1
#> 74689               Kharaneh IV      JO   36.4542  31.7237      1
#> 74693               Kharaneh IV      JO   36.4542  31.7237      1
#> 74694               Kharaneh IV      JO   36.4542  31.7237      1
#> 74695               Kharaneh IV      JO   36.4542  31.7237      1
#> 74696               Kharaneh IV      JO   36.4542  31.7237      1
#> 85140               Kharaneh IV      JO   36.4500  31.7300      1
#> 85141               Kharaneh IV      JO   36.4500  31.7300      1
#> 85142               Kharaneh IV      JO   36.4500  31.7300      1
#> 74061     Moghr El Ahwal Cave 3      LB   35.8824  34.2846      1
#> 57072                  Mureybet      SY   38.0906  36.0683      2
#> 86741             Nahal Issaron      IL   35.0300  29.9000      1
#> 86688                  Ohalo II      IL   35.5700  32.7138      1
#> 78009          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 78010          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 78011          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 84441           Qadesh Barnea 8      EG   34.4220  30.6480      1
#> 84442           Qadesh Barnea 8      EG   34.4220  30.6480      1
#> 43517              Rakefet Cave      IL   35.0725  32.6547      1
#> 84489                Shunera 16      IL   34.6000  30.9500      1
#> 84507                Shunera 16      IL   34.6000  30.9500      1
#> 86673                Shunera 16      IL   34.6000  30.9500      1
#> 97082           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 97083           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 97084           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 60333               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60334               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60335               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60337               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60411         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67764         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67765         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67766         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67767         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 60273                Tell Aswad      SY   36.5500  33.4042      1
#> 61182                Tell Aswad      SY   36.5500  33.4042      1
#> 61183                Tell Aswad      SY   36.5500  33.4042      1
#> 61184                Tell Aswad      SY   36.5500  33.4042      1
#> 61181    Tell Dja'de el-Mughara      SY   38.1833  36.3833      2
#> 10190  Tor al-Tareeq (WHS 1065)      JO   35.9200  30.8700      1
#> 76323           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76326           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76329           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76353         Wadi Fazael 10/11      PS   35.4330  32.0330      1
#> 108821            Wadi Hisban 2      JO   35.7000  31.8200      1
#> 78063              Wadi Jilat 6      JO   36.4640  31.5220      1
#> 78073              Wadi Jilat 6      JO   36.4640  31.5220      1
#> 61644                    Zaquma      JO   35.6816  32.1867      1
#> 61645                    Zaquma      JO   35.6816  32.1867      1

Created on 2021-04-05 by the reprex package (v1.0.0)

joeroe changed the title from "Parser for Palmisano et al. 2021" to "Parsers for Palmisano's datasets" on Apr 5, 2021
@joeroe
Contributor Author

joeroe commented Apr 6, 2021

Following on from the above, all but 13 of the 62 lab IDs from emedyd that are not in NERD are already covered by other databases:

diff <- emedyd[!emedyd$LabID %in% nerd$LabID,]
everything <- c14bazAAR::get_all_dates()
everything <- everything[everything$sourcedb != "emedyd",]

# Lab IDs from emedyd that aren't in NERD or any other database
diff[!diff$LabID %in% everything$labnr,]
#>           LabID   CRA Error Material        Species               SiteName
#> 68840    ODTU-2  9510   100 charcoal           <NA>                 Cayonu
#> 74687 OxA-22273 15890    90 charcoal Chenopodiaceae            Kharaneh IV
#> 74688 OxA-22274 15770    80 charcoal          dicot            Kharaneh IV
#> 74689 OxA-22275 16145    75 charcoal          dicot            Kharaneh IV
#> 74693 OxA-22287 15980    60 charcoal Chenopodiaceae            Kharaneh IV
#> 74694 OxA-22288 16275    60 charcoal Chenopodiaceae            Kharaneh IV
#> 74695 OxA-22289 16300    65 charcoal          dicot            Kharaneh IV
#> 74696 OxA-22290 16200    65 charcoal Chenopodiaceae            Kharaneh IV
#> 74061 OxA-20552 15750    75 charcoal           <NA>  Moghr El Ahwal Cave 3
#> 57072  Ly-11622 16560    70 charcoal           <NA>               Mureybet
#> 61181   Ly-3464  9445    45     seed           <NA> Tell Dja'de el-Mughara
#> 61644   Ly-3911 11970    60 charcoal           <NA>                 Zaquma
#> 61645   Ly-3912 11860    60 charcoal           <NA>                 Zaquma

And of these:

  • The OxA-* dates from KHIV are in IntChron, so should be retrieved when I finally get around to doing a PR for IntChron parser #115
  • The Ly-3* dates are actually in NERD, just recoded as Lyon-3*
  • ODTU-2 is in NERD as ODTÜ-2

That just leaves Ly-11622 as the only truly missing one. Presumably it was omitted from NERD because it is outside their date range, and indeed it is an obvious outlier for PPNA Mureybet.
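
The two recodings noted above (Ly-* vs. Lyon-*, ODTU vs. ODTÜ) suggest normalising lab IDs before matching. A rough sketch, assuming only those two spelling differences matter; `normalise_labid()` is a hypothetical helper, not part of c14bazAAR, and the vectors are toy subsets rather than the real data:

```r
# Hypothetical helper: harmonise lab-code spellings before matching.
# Only the two recodings mentioned in this thread are handled.
normalise_labid <- function(x) {
  x <- trimws(x)
  x <- sub("^Lyon-", "Ly-", x)               # "Lyon-3911" -> "Ly-3911"
  x <- gsub("\u00dc", "U", x, fixed = TRUE)  # U with umlaut -> plain "U"
  x
}

emedyd_ids <- c("Ly-3911", "ODTU-2", "Ly-11622")  # toy subset
nerd_ids   <- c("Lyon-3911", "ODT\u00dc-2")       # toy subset

# Lab IDs still unmatched after normalisation
still_missing <- emedyd_ids[
  !normalise_labid(emedyd_ids) %in% normalise_labid(nerd_ids)
]
```

With these toy inputs only `"Ly-11622"` remains in `still_missing`. Other lab-code variants would need their own rules.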

@nevrome
Member

nevrome commented Apr 25, 2021

Thank you very much for doing the research, @joeroe!!

I generally think removing get_emedyd is a good idea - less to maintain. What do you think, @dirkseidensticker? We may have to consider that, with our decentralized approach, we have so far focused on the dataset as the unit and not so much on individual dates. But I think in this case NERD is designed specifically as a superset of previous datasets. So it might be fair to deprecate the old parser.

I would replace get_emedyd with a message to switch to get_nerd.
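
One way such a replacement could look, as a sketch only: this is not the change made in the package, and both function bodies here are placeholders. It uses base R's `.Deprecated()` to warn and then forwards the call to the newer parser.

```r
# Placeholder for the real NERD parser; returns a toy data frame here
get_nerd <- function(...) {
  data.frame(labnr = c("OxA-1001", "Ly-2002"), c14age = c(9000, 9500))
}

# Deprecated stub: emit a deprecation warning, then forward to the replacement
get_emedyd <- function(...) {
  .Deprecated(
    new = "get_nerd",
    msg = "get_emedyd() is deprecated because emedyd is superseded by NERD. Use get_nerd() instead."
  )
  get_nerd(...)
}
```

Calling `get_emedyd()` then still returns data but raises a warning pointing users to `get_nerd()`. A stricter variant could call `stop()` instead of forwarding.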

@apalmisano82

apalmisano82 commented Apr 25, 2021 via email

@nevrome
Member

nevrome commented Apr 25, 2021

Alright! #136 implements the change.

@nevrome
Member

nevrome commented May 8, 2021

Ok - I consider this sufficiently solved now. Thanks to all of you!

nevrome closed this as completed on May 8, 2021
@apalmisano82

apalmisano82 commented May 8, 2021 via email
