Runprov extractor #433
Merged
This ports most functionality from `datalad_metalad.extractors.core` into a script that also translates the output into the catalog schema. The script receives the path to a datalad dataset as a parameter and outputs a metadata record that can immediately be added to a catalog. This removes the dependence on metalad, and the catalog's explicit Translator functionality (which also depends on jq bindings) no longer has to be used. The reason for doing this is to have a self-contained script that could in future simply be ripped out and replaced with whatever new functionality supersedes it.
This commit adds a script that runs a slightly refactored version of the `runprov` extractor in datalad-metalad. Additionally, it translates the output of that code into a metadata record that is compliant with datalad-catalog's dataset schema, so that the script's output can be added directly (via `catalog-add`) as an entry to an existing catalog. The main reason for porting this functionality is to have a self-contained script inside the package that makes dependence on metalad unnecessary.
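For illustration only, here is a minimal sketch of the translation step: it wraps arbitrary extractor output into a dictionary shaped like a catalog dataset record. The field names (`type`, `dataset_id`, `dataset_version`, `additional_display`, `metadata_sources`) are assumptions modeled on datalad-catalog's dataset schema and should be checked against the schema shipped with your installed version. The helper `to_catalog_record`, its default `source_version`, and the example provenance payload are hypothetical; this is not the actual `catalog_runprov` implementation.

```python
import json
from datetime import datetime, timezone


def to_catalog_record(dataset_id, dataset_version, extracted,
                      source_name="runprov", source_version="0.0.1"):
    """Hypothetical helper: wrap extractor output in a catalog-style record.

    Field names are assumed, not verified against datalad-catalog's schema.
    """
    return {
        "type": "dataset",
        "dataset_id": dataset_id,
        "dataset_version": dataset_version,
        # Assumed: expose the extracted provenance as an extra display tab
        "additional_display": [
            {"name": "Provenance", "content": extracted},
        ],
        # Assumed: record which extractor produced this metadata, and when
        "metadata_sources": {
            "sources": [
                {
                    "source_name": source_name,
                    "source_version": source_version,
                    "source_time": datetime.now(timezone.utc).timestamp(),
                }
            ]
        },
    }


if __name__ == "__main__":
    # Made-up provenance payload standing in for real runprov output
    extracted = {"@graph": [{"@id": "activity-1", "@type": "activity"}]}
    record = to_catalog_record("example-dataset-id", "example-version", extracted)
    print(json.dumps(record, indent=2))
```

Keeping the translation in a plain function like this makes it easy to test without a dataset on disk, which fits the goal of a self-contained, replaceable script.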
The following script:
```python
from argparse import ArgumentParser
import json

from datalad_catalog.extractors import (
    catalog_core,
    catalog_runprov,
)
from datalad_catalog.constraints import EnsureWebCatalog
from datalad_next.constraints.dataset import EnsureDataset


def get_metadata_records(dataset):
    """Extract core and runprov dataset-level metadata as catalog records."""
    # first get core dataset-level metadata
    core_record = catalog_core.get_catalog_metadata(dataset)
    # then get runprov dataset-level metadata
    runprov_record = catalog_runprov.get_catalog_metadata(
        source_dataset=dataset,
        process_type='dataset')
    # return both
    return core_record, runprov_record


def add_to_catalog(records, catalog):
    """Add each metadata record to the given catalog."""
    from datalad.api import catalog_add

    # Add metadata to the catalog, one record at a time
    for r in records:
        catalog_add(
            catalog=catalog,
            metadata=json.dumps(r),
        )


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "dataset_path", type=str, help="Path to the datalad dataset",
    )
    parser.add_argument(
        "catalog_path", type=str, help="Path to the catalog",
    )
    args = parser.parse_args()
    # Ensure the path is an installed dataset with an ID
    ds = EnsureDataset(
        installed=True, purpose="extract metadata", require_id=True
    )(args.dataset_path).ds
    # Ensure the path is a catalog
    catalog = EnsureWebCatalog()(args.catalog_path)
    core_record, runprov_record = get_metadata_records(ds)
    print(json.dumps(core_record))
    print("\n")
    print(json.dumps(runprov_record))
    # Add metadata to catalog
    add_to_catalog([core_record, runprov_record], catalog)
```
The script shows how core and runprov metadata can be extracted from a datalad dataset, translated into the catalog schema, and added to an existing catalog.
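The script prints each record separated by blank lines. If the records are meant to be saved or piped rather than added directly, one option is to emit them as JSON Lines (one compact JSON object per line). The sketch below uses only the standard library; the helper names `to_json_lines` and `from_json_lines` and the placeholder records are hypothetical. Whether `catalog-add` accepts a JSON Lines file as its `metadata` argument is an assumption to confirm against your datalad-catalog version.

```python
import json


def to_json_lines(records):
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_json_lines(text):
    """Parse JSON Lines text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Placeholder records standing in for core_record and runprov_record
    core_record = {"type": "dataset", "dataset_id": "abc", "dataset_version": "123"}
    runprov_record = {"type": "dataset", "dataset_id": "abc", "dataset_version": "123"}
    print(to_json_lines([core_record, runprov_record]), end="")
```

Because each line is a complete JSON document, the output can be redirected to a file and processed record by record without a custom separator.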
Closes #432
This ports functionality from two datalad-metalad extractors: `core` and `runprov`. The former was previously added to the `abcdj` branch and is now cherry-picked into this branch. The latter is newly added as the script `catalog_runprov`, which runs a slightly refactored version of the `runprov` extractor in datalad-metalad. Additionally, it translates the output of that code into a metadata record that is compliant with datalad-catalog's dataset schema, so that the script's output can be added directly (via `catalog-add`) as an entry to an existing catalog. The main reason for porting this functionality here is to have self-contained scripts inside the package that make dependence on metalad unnecessary.