Unknown profiler got from ArgoIndex(index_file='bgc-s').load() #420

Closed
gmaze opened this issue Dec 16, 2024 Discussed in #419 · 5 comments · Fixed by #421

@gmaze
Member

gmaze commented Dec 16, 2024

Discussed in #419

Originally posted by cywhale December 14, 2024

from argopy import DataFetcher, ArgoNVSReferenceTables, ArgoIndex
import argopy
sidx = ArgoIndex(index_file='bgc-s').load() 
dfs = sidx.to_dataframe()
dfs[["profiler_code", "profiler"]]

Every value in the profiler column is Unknown:

profiler_code | profiler
-- | --
846 | Unknown
846 | Unknown
846 | Unknown
846 | Unknown
846 | Unknown
... | ...
834 | Unknown

I'm not sure whether profiler_code and profiler come from the R08 table:

test_profiler = ArgoNVSReferenceTables().tbl('R08')
print(test_profiler)

Got:
altLabel prefLabel
0 890 PROVOR_III - Jumbo, SBE
1 891 PROVOR_III - Jumbo, RBR
2 841 PROVOR float with SBE conductivity sensor
3 838 ARVOR-D deep float with SBE conductivity sensor
4 831 PALACE float
5 889 PROVOR_V - Jumbo, RBR

If profiler comes from this table, then it should have real values rather than Unknown.
I tried argopy==0.1.17 and 1.0.0, and both gave the same result. But I'm sure this dataframe had non-Unknown values in the profiler column when I last checked it, maybe several months ago.
Is there something wrong with how I'm using argopy? Thanks.

@gmaze gmaze added bug Something isn't working argo-BGC About biogeochemical variables labels Dec 16, 2024
@gmaze gmaze self-assigned this Dec 16, 2024
@gmaze
Member Author

gmaze commented Dec 16, 2024

Bug origin

The root cause of this bug is not in argopy:

When loading reference table 8 (R08) from the NVS server, casting the profiler codes to integers fails because one float type (the "COPEX" profiler type) does not return an altLabel.

This can be seen here: https://vocab.nerc.ac.uk/collection/R08/current/?_profile=nvs&_mediatype=application/ld+json
In the "COPEX" profiler entry, there is an unexpected "skos:altLabel": "",

Then, when argopy creates a pandas dataframe from this JSON, the altLabel column is no longer cast to integers but is left as object.

As a result, ArgoIndex.to_dataframe fails to properly map each profile's profiler_code to a profiler.
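The failure chain described above can be reproduced without argopy or the NVS server. The following is a minimal plain-Python sketch, assuming a hypothetical loader (`parse_codes`) that keeps all codes as strings as soon as one integer cast fails, and assuming the index stores profiler codes as integers. The records, the "COPEX" label and the function name are illustrative, not argopy's actual code.

```python
# Illustrative records only: a tiny stand-in for the R08 table.
# The empty altLabel mimics the COPEX entry described above.
records = [
    {"altLabel": "841", "prefLabel": "PROVOR float with SBE conductivity sensor"},
    {"altLabel": "831", "prefLabel": "PALACE float"},
    {"altLabel": "", "prefLabel": "COPEX"},  # placeholder label
]


def parse_codes(rows):
    """Hypothetical loader: cast every altLabel to int, or keep them all as str."""
    labels = [row["altLabel"] for row in rows]
    try:
        return [int(label) for label in labels]
    except ValueError:
        # A single blank value is enough to leave the whole column as strings.
        return labels


codes = parse_codes(records)
mapping = dict(zip(codes, (row["prefLabel"] for row in records)))

# Assumption: profiler codes from the index are integers, so every lookup
# misses the string keys and falls back to "Unknown".
profiler_codes = [841, 831, 846]
profilers = [mapping.get(code, "Unknown") for code in profiler_codes]
print(profilers)
```

Even codes that do exist in the table (841, 831) come back as Unknown here, which matches the symptom of a profiler column filled entirely with Unknown.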

Bug fix

From argopy, this can be fixed with more tests on the reference table loading, or on the mapping function.

From the Argo Vocabulary server side, there is an issue: although the COPEX profiler has no altLabel in the JSON file, an ID (871) is reported for it at https://vocab.nerc.ac.uk/collection/R08/current/

@cywhale

cywhale commented Dec 16, 2024

> then when argopy creates a pandas dataframe from this json, the altLabel column is no longer cast as integers, but rather as object.

Actually, I found that its dtype is object, not simply int or str, which really confused me when I tried to work around it in my app:

    profilerRef = ArgoNVSReferenceTables().tbl('R08')
    profiler_mapping = profilerRef.set_index('altLabel')['prefLabel'].to_dict()

    #...skip...
    dfs = sidx.to_dataframe()

    # replace Unknown profiler with R08 reference table
    dfs['profiler'] = dfs['profiler_code'].astype(str).map(profiler_mapping).fillna('Unknown')
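An alternative to casting profiler_code to str is to normalize the table keys to integers and skip rows that carry no numeric code. The following is a minimal plain-Python sketch of that idea; `build_profiler_mapping` and the sample rows are hypothetical and stand in for the rows of the R08 dataframe.

```python
def build_profiler_mapping(rows):
    """Map integer profiler codes to labels, skipping rows without a numeric code."""
    mapping = {}
    for row in rows:
        code = str(row.get("altLabel", "")).strip()
        if code.isdigit():
            mapping[int(code)] = row["prefLabel"]
    return mapping


# Illustrative rows; the blank altLabel mimics the COPEX entry.
rows = [
    {"altLabel": "841", "prefLabel": "PROVOR float with SBE conductivity sensor"},
    {"altLabel": "831", "prefLabel": "PALACE float"},
    {"altLabel": "", "prefLabel": "COPEX"},  # placeholder label
]

mapping = build_profiler_mapping(rows)
print(mapping.get(841, "Unknown"))
print(mapping.get(846, "Unknown"))
```

With integer keys, the lookup no longer depends on how the altLabel column happened to be typed when the table was loaded. The row with the blank code is simply left out, so its profiler would still show as Unknown.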

@gmaze
Member Author

gmaze commented Dec 16, 2024

It's an object because of the missing field sent by the NVS server.

I just raised this on the appropriate repo: OneArgo/ArgoVocabs#144 (comment)

It comes from a float type that was recently added (COPEX, added on Nov. 11).

In the meantime, the design in argopy could be made more robust to this kind of error.

@gmaze
Member Author

gmaze commented Dec 16, 2024

Currently, argopy reads the profiler ID from the skos:altLabel property of the NVS JSON output.

This fails here because the COPEX entry has an empty skos:altLabel, even though the COPEX float does have an ID reported.

A more robust approach for argopy would be to extract the ID value from the @id field, which looks like this:
http://vocab.nerc.ac.uk/collection/R08/current/871/
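Extracting the ID from the @id field could look like the sketch below. It assumes the code is always the last path segment of the concept URL, as in the example above; `code_from_id` and the simplified `concept` record are hypothetical and are not taken from the actual fix in argopy.

```python
def code_from_id(concept_id):
    """Return the trailing path segment of an NVS concept URL as an int."""
    return int(concept_id.rstrip("/").rsplit("/", 1)[-1])


# Simplified stand-in for one entry of the NVS JSON output.
concept = {
    "@id": "http://vocab.nerc.ac.uk/collection/R08/current/871/",
    "skos:altLabel": "",
}

# Prefer altLabel when it is numeric, otherwise fall back to the URL.
alt_label = concept["skos:altLabel"].strip()
code = int(alt_label) if alt_label.isdigit() else code_from_id(concept["@id"])
print(code)  # 871
```

Because the URL is the identifier of the concept itself, this fallback still yields a code when skos:altLabel is left blank.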

@gmaze
Member Author

gmaze commented Dec 16, 2024

@cywhale a fix is available on the master branch until the next release
