add bids metadata extraction #432

Open
satra opened this issue Mar 1, 2021 · 5 comments


@satra
Member

satra commented Mar 1, 2021

as we deal with non-nwb dandisets, it would be good to add bids metadata extraction, which may require parsing the tree (e.g., to get age and sex from participants.tsv).
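To make the participants.tsv part concrete, here is a minimal sketch, assuming the standard BIDS layout (a tab-separated file with a `participant_id` column and `n/a` for missing values). The function name `read_participants` and the example rows are hypothetical and not part of any existing dandi code.

```python
import csv
import io


def read_participants(tsv_text):
    """Map participant_id -> remaining columns, treating BIDS 'n/a' as missing."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    participants = {}
    for row in reader:
        pid = row.pop("participant_id")
        # Values are kept as strings; no harmonization happens at this step.
        participants[pid] = {
            key: (None if value == "n/a" else value) for key, value in row.items()
        }
    return participants


# Hypothetical file content, for illustration only.
EXAMPLE_TSV = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\tn/a\tM\n"

participants = read_participants(EXAMPLE_TSV)
print(participants["sub-01"])
```

Per-file assets could then look up their subject by the `sub-<label>` entity in the file path.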

@yarikoptic
Member

I think this functionality should, as much as possible, align with or reuse https://github.com/datalad/datalad-neuroimaging/blob/master/datalad_neuroimaging/extractors/bids.py which ATM is just a dump of metadata as provided by pybids.
But IIRC @mih mentioned that in the scope of ebrains openminds he is considering (or just advising on?) providing a "tighter" harmonization. @mih could you briefly chime in here on the plans on that end? (or just add references)

@satra
Member Author

satra commented Mar 1, 2021

alignment is good, but we will also want to fill in the participant and biosample fields of our asset metadata structure.

@mih

mih commented Mar 1, 2021

What I was talking about in that meeting was that a bids2openminds conversion is taking place outside the scope of a metadata extractor. An extractor should report "as-is". If the metadata source (like BIDS) is not "semantically clean", a subsequent (and updatable) transformation can be used to yield a "better" (or just different) record.

I realized at some point that doing the standardization at the level of an extractor implies that any update to that standardization requires actual data access, and it also makes metadata extraction an inherently open-ended process. Adding the possibility of customizable transformations of metadata seems much more practical when data access is complicated (which it seems to be for most datasets).
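A rough sketch of that split, assuming a participant row has already been read from participants.tsv. All names here (`extract_participant`, `transform_participant`, `SEX_LABELS`) and the target field names are hypothetical; this is not the datalad or openMINDS API, only an illustration of keeping the raw record separate from an updatable transformation.

```python
def extract_participant(row):
    """Extractor step: report the source values unchanged ("as-is")."""
    return {"source": "participants.tsv", "raw": dict(row)}


# Hypothetical harmonization table. It can be revised and re-applied to
# stored raw records without touching the data again.
SEX_LABELS = {"f": "female", "female": "female", "m": "male", "male": "male"}


def _to_float(value):
    """Return a float, or None when the value is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def transform_participant(record):
    """Transformation step: derive a harmonized record from the raw one."""
    raw = record["raw"]
    return {
        "identifier": raw.get("participant_id"),
        "age_years": _to_float(raw.get("age")),
        "sex": SEX_LABELS.get((raw.get("sex") or "").lower()),
    }


record = extract_participant({"participant_id": "sub-01", "age": "34", "sex": "F"})
print(transform_participant(record))
```

Only `transform_participant` needs to change when the target schema evolves; the stored output of `extract_participant` stays valid.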

@satra
Member Author

satra commented Jun 9, 2021

@yarikoptic - perhaps we can add some bids support in the short term with respect to participant id and a few other things.

@mih - in our case metadata extraction is performed at the point of validation/upload so access is there. in the future we may want to extend the schema, for which we would indeed need to pull in the directory structure (especially for bids, where the inheritance principle does apply for some metadata).
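For reference, a simplified sketch of why the inheritance principle needs the directory structure: a sidecar JSON higher in the tree applies to data files below it, and deeper or more specific sidecars override it. This assumes sidecars are already parsed into a mapping of dataset-relative path to dict; `split_name` and `resolve_sidecar_metadata` are hypothetical helpers, and the matching rules are reduced to "same suffix, entities are a subset" (pybids handles the full rules).

```python
from pathlib import PurePosixPath


def split_name(name):
    """'sub-01_task-rest_bold.json' -> ({'sub': '01', 'task': 'rest'}, 'bold')."""
    stem = name.split(".", 1)[0]
    *entity_parts, suffix = stem.split("_")
    return dict(part.split("-", 1) for part in entity_parts), suffix


def resolve_sidecar_metadata(data_path, sidecars):
    """Merge applicable sidecars from dataset root down to the file's directory."""
    data_path = PurePosixPath(data_path)
    entities, suffix = split_name(data_path.name)
    ancestors = set(data_path.parents)  # includes the dataset root "."

    applicable = []
    for path_str, metadata in sidecars.items():
        path = PurePosixPath(path_str)
        if path.suffix != ".json" or path.parent not in ancestors:
            continue
        sc_entities, sc_suffix = split_name(path.name)
        same_suffix = sc_suffix == suffix
        subset = all(entities.get(k) == v for k, v in sc_entities.items())
        if same_suffix and subset:
            applicable.append((len(path.parts), len(sc_entities), metadata))

    merged = {}
    # Shallower and less specific sidecars first, so later ones override them.
    for _, _, metadata in sorted(applicable, key=lambda item: item[:2]):
        merged.update(metadata)
    return merged


# Hypothetical dataset content, for illustration only.
SIDECARS = {
    "task-rest_bold.json": {"RepetitionTime": 2.0, "TaskName": "rest"},
    "task-other_bold.json": {"RepetitionTime": 3.0},
    "sub-01/func/sub-01_task-rest_bold.json": {"RepetitionTime": 1.5},
}

print(resolve_sidecar_metadata("sub-01/func/sub-01_task-rest_bold.nii.gz", SIDECARS))
```

With one asset per file, a resolution step like this would have to look at sibling and parent assets rather than at the single uploaded file.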

@yarikoptic
Member

yeah, I guess we shouldn't postpone for too long. I do not think we should, at this point, amalgamate data + sidecar files into a single asset in any way, so we will keep it KISS and have one asset per file, be it a data, sidecar, or metadata file. dataset_description.json will also be a first-class citizen and have an asset.

Do you know of prospective datasets which would be uploaded and should be BIDS?


3 participants