Inputhons
"Inputhon" is our super-fancy name for a type of hackathon in which the people responsible for a centre's recommendations on data deposition formats meet for (say) an hour to prepare or update their centre's content for the SIS.
Please note: the content of this document is still being formed. A pilot inputhon at the IDS is going to be held in July 2023, and feedback from it will be incorporated here. Feedback is very welcome at any point, though -- open a new GitHub issue to let us know what's wrong or what needs improving.
The goal is to (ideally) end the event with a submission of a pull request against one of the files in https://github.com/clarin-eric/standards/tree/formats/SIS/clarin/data/recommendations (note that it's not the master branch).
Post-event, the centre can either
- point its users to the SIS (recommended, because of the data aggregation that happens there), or else
- re-use the same data (note: you don't want to maintain two copies of recommendations, do you?) by pulling them out of the SIS via its API (an example is supplied; essentially, you just need to style the data according to your site's make-up).
For CLARIN B-centres which need to undergo (re-)certification,
- storing format recommendations in the SIS satisfies the relevant CoreTrustSeal recommendation (see section 8 (R08, "Deposit & Appraisal") of the Extended Guidance), which checks, among other things, whether the repository offers a list of preferred formats.
- Incidentally, two bullets down, R08 asks about info on "the approach towards digital objects that are deposited in non-preferred formats" -- that information can also be provided by the SIS, in the general section describing the centre and/or in comments on formats, especially those labelled as "discouraged" (="non-preferred", in CTS lingo).
For other centres/repositories, storing the information is a way to:
- get that done in a uniform format, and along a tested route;
- be able to use a clean template and/or examples provided in the recommendations by other centres;
- obtain statistics based on the aggregated information;
- not bother about displaying the information...
  - at all, if the centre/repository points to the SIS for that purpose, or
  - much, if the centre takes the data via the SIS API and applies its own (or provided) CSS styling to it.
The following preparatory steps are optional but advisable. If they seem like too much of a time investment, skip them. But we would appreciate it if you could go via pull requests, also for the sake of keeping track of the project's history.
- tell us about the intention to hold an inputhon, so that we can make sure that the centre is represented in the system and that at least a skeletal recommendations file for it exists
  - we can then also at least try to make ourselves available for consultation over Zoom, etc.
- fork the SIS, clone your own repo instance, install eXist and the SIS
  - optionally, you might want to integrate that new DB instance with your oXygen editor (yes, there are a lot of assumptions here), because then you will be able to visualise your changes just by dragging the recommendations file from oXygen's project panel to the DB connection panel (and refreshing the local SIS instance in the browser). Please do not worry if this paragraph is not clear to you.
The recommended way is to look at the SIS/clarin/data/recommendations/ directory and locate your centre's data. For example, for the IDS, the document is IDS-recommendation.xml. Please bear in mind that the same centre may use different names across different RIs, so search for the alternatives as well. We're not yet sure how to handle that kind of variation, and your opinion on this matter may help.
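The search for a centre's file, including its alternative names, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes a local clone of the SIS repository, it assumes that other centres' files are named in a similar way to IDS-recommendation.xml, and both the function name `find_recommendation_files` and the alternative name "Leibniz" are made up for the example.

```python
from pathlib import Path


def find_recommendation_files(directory, names):
    """Return XML files whose file name mentions any of the given centre names.

    The match is a case-insensitive substring test, so it also catches
    spelling variants such as "ids" versus "IDS".
    """
    wanted = [name.lower() for name in names]
    matches = []
    for path in sorted(Path(directory).glob("*.xml")):
        stem = path.stem.lower()
        if any(name in stem for name in wanted):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Assumed location inside a local clone of the SIS repository.
    folder = "SIS/clarin/data/recommendations"
    # "Leibniz" is a hypothetical alternative name, used for illustration only.
    for hit in find_recommendation_files(folder, ["IDS", "Leibniz"]):
        print(hit.name)
```

A substring match is deliberately loose: it may return a few false positives, but it is less likely to miss a file that uses a different variant of the centre's name.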
If you can't locate your centre, please let us know, either by e-mail (see the "About" page of the SIS) or by posting an issue.
If you don't want to bother with cloning the SIS repository (oh please, do bother...), then locate your centre in the list of centres supported by the SIS. If you can't locate the centre, use the link above to post a GitHub issue.
Once you have located your centre, click on "download template" (if the page is empty) or "export table to XML" (if the table has already been populated). In the latter case, please note that, as a centre representative, you should not feel obligated to keep the content of the existing recommendations if you see a red notice saying "Warning: The recommendations have not been curated yet". In most cases, this means that we populated the recommendations ourselves at the testing stage. We either used information that one of the Standards Committee members obtained directly from the centre, or we (superficially and quickly) interpreted the recommendations posted by your centre, squeezing them into, and smearing them across, the functional domain system that the SIS uses, and more or less straightforwardly taking the recommendation levels (recommended, acceptable, discouraged) from your centre's documentation. You may want to thoroughly re-examine our choices -- we were only seeding the system.
Have a look at the data domains and see which of them correspond to the functions of the data that your centre is ready to receive. Please read through the descriptions of the particular domains. Treat the domains, together with the three levels of recommendation, as a scaffolding upon which your centre's recommendations will be placed.
- Have a look at the data domains and see which of them correspond to the functions of the data that your centre is ready to receive.
- For each of the selected domains, decide which formats to list and at what level of recommendation:
  - if the centre wishes to receive data in that format because it is going to be easy to curate, archive, etc., choose "recommended";
  - if it's an "if you really must" format, choose "acceptable";
  - if you want to discourage submissions in some format, choose "discouraged", and do consider providing a short explanation of what the preferred alternative is, if there is one, or of why the centre discourages submissions in that format.
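Before touching the XML, it may help to draft the decisions in a simple worksheet and check it for slips. The sketch below is not the SIS data format: the structure (a dictionary of domains, each with format/level/comment entries), the function name `check_plan`, and the domain and format names are all hypothetical placeholders. Only the three level names come from the text above.

```python
LEVELS = ("recommended", "acceptable", "discouraged")


def check_plan(plan):
    """Return a list of human-readable problems found in a draft plan.

    `plan` maps a domain name to a list of (format, level, comment) tuples.
    """
    problems = []
    for domain, entries in plan.items():
        for fmt, level, comment in entries:
            if level not in LEVELS:
                problems.append(f"{domain}: {fmt} has unknown level {level!r}")
            elif level == "discouraged" and not comment.strip():
                # A reminder only: an explanation is advisable, not mandatory.
                problems.append(f"{domain}: {fmt} is discouraged but has no explanation")
    return problems


# Placeholder domain and format names, for illustration only.
draft = {
    "Domain A": [
        ("Format X", "recommended", ""),
        ("Format Y", "acceptable", ""),
        ("Format Z", "discouraged", ""),
    ],
    "Domain B": [
        ("Format X", "prefered", ""),  # misspelt level, caught by the check
    ],
}

for problem in check_plan(draft):
    print(problem)
```

Going domain by domain with such a worksheet keeps the discussion during the inputhon separate from the mechanics of editing the recommendations file.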
We suggest that you go domain by domain, and that you work either in a fork of the SIS or in a branch created from the local "formats" branch -- and then make your pull requests against that branch, please. (There are alternatives, to be described later.)
If you take the path of editing the source with an XML editor, you will benefit from XML Schema and Schematron -- both are used to constrain the XML you're going to produce, and they often provide suggestions on the valid values and structures. You will then also be able to use the template provided in each empty recommendations document.
You can simply point your users to your centre's data by using a direct link. For the IDS, you would use https://clarin.ids-mannheim.de/standards/views/view-centre.xq?id=IDS (note the final ID).
You can also retrieve your data via the REST API offered by the SIS. Again, for the IDS, you would use, e.g.:
curl 'https://clarin.ids-mannheim.de/standards/rest/views/recommended-formats-with-search.xq?centre=IDS&domain=1&level=recommended&export=yes'
Have a look at the API documentation to see what parameters are possible, etc.
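If you would rather assemble that request in a script than type it by hand, a minimal Python sketch could look as follows. It uses only the endpoint and the four parameters that appear in the curl example above. The function names `build_export_url` and `fetch_export` are made up for this illustration, and the assumption that the response is UTF-8 text has not been checked against the API documentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE = "https://clarin.ids-mannheim.de/standards/rest/views/recommended-formats-with-search.xq"


def build_export_url(centre, domain, level):
    """Assemble the export URL for one centre, domain and recommendation level."""
    query = urlencode({"centre": centre, "domain": domain, "level": level, "export": "yes"})
    return f"{BASE}?{query}"


def fetch_export(centre, domain, level, timeout=30):
    """Download the export and return it as text (assumes a UTF-8 response)."""
    with urlopen(build_export_url(centre, domain, level), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the same URL as in the curl example; no network access needed.
    print(build_export_url("IDS", 1, "recommended"))
```

Keeping the URL construction separate from the download makes it easy to loop over several domains or levels, and to inspect the URLs before sending any requests.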
You can see a bare-bones result at https://github.com/IDS-Mannheim/IDS-Mannheim.github.io and the corresponding raw webpage is available at https://ids-mannheim.github.io/standards/ . If you would like to contribute a CSS (or XSL) stylesheet to render the info in a nicer way, please feel welcome to contact us.