A toolchain to export the mapping between two id's (e.g. PIDs and SAP--numbers) related to certain islandora entities (e.g. authors)
This toolchain assumes that you have the following installed on your server (it might work with different versions, but was not tested):
In the following, we shall assume that you want to install the toolchain on your drupal-substite "institute". Words in angle brackets indicate the respective values you have decided on (do not forget to insert the values that apply to your installation). Although you can install this differently, we explain only one easy version that suited our needs.
N.B.: This toolchain needs to be patched manually if you wish to use it for entities other than authors...
-
Decide on a uuid (you may generate a random one using
/usr/bin/uuidgen -r
) and a file path (these installation instructions assume/var/www/html/sites/<institute>/files/
) and have your IT department restrict access to$base_url/sites/<institute>/files/<uuid>
(allow only your personal workstations and those servers that need the exported information). Here,$base_url
refers to the url from which your subsite is accessed from the outside (say,https://<islandora.somewhere.com>/<institute>
). -
Go to
/var/www/html/sites/<institute>/files/
and clone this repository into the<uuid>
folder (you will need to let github know your ssh-key†!):cd /var/www/html/sites/<institute>/files/ if [ -z "$SSH_AUTH_SOCK" ]; then eval $(ssh-agent -s); fi ssh-add ~/.ssh/<your_id_rsa_github_key> /usr/bin/sudo SSH_AUTH_SOCK=$SSH_AUTH_SOCK /usr/bin/git clone git@github.com:/path/to/this/repo/idmapping.git ./<uuid>/
Note that you could clone the repository to any destination; the install script will copy the necessary files to the necessary destination. Watch out for filesystem permissions!
-
Enter the
<uuid>
-directory, change file ownerships and installcd <uuid> /usr/bin/sudo /bin/chown -R apache.apache {*,.g*} /usr/bin/sudo -u apache ./installidmapping.sh -i <institute> -u <uuid> -n <InstituteName> -p <port>
Here,
<InstituteName>
refers to the display name of<institute>
and<port>
to the port through which solr can be accessed on your server. Both parameters are optional and default to "DORA Institute" and "8080", respectively.The above command will create all the necessary files in
<uuid>
; the generated files all have the suffix-<institute>
, with the exception ofidmapping.py
(note that those files are copies of the unsuffixed files, but including the necessary modifications to reflect the choices of<institute>
,<uuid>
, etc...; be careful that this might not work for you --- the install script is intended to work on our installation). -
Go to
https://<islandora.somewhere.com>/<institute>/admin/config/workflow/rules/components
and import the componentcomponent_to_set_cronjob_for_author_idmapping_export-<institute>.rulecomponent
.Note (2020-June-05/FH): Should you get e-mail notifications finally that the
*.sh
scripts cannot be found resp. that the path is not valid, then you should return the installation this step and consider to replace$_SERVER['DOCUMENT_ROOT']
with the actual path (for example/var/www/html
) inside the imported code.
Specially in conjunction with a simultaniously running process specific to an institute (so, not for the main site), performed by Drush or triggered by a Crontab task, and treating author objects as well - then it may happen for intransparent reasons that some server variables are not available temporarily (this may affect PHP or Drupal varaibles perhaps too). -
Go to
https://<islandora.somewhere.com>/<institute>/admin/config/workflow/rules
and import the rulerule_to_set_cronjob_for_author_idmapping_export-<institute>.rule
. -
Test the installation as follows:
-
Inside
/var/www/html/sites/<institute>/files/<uuid>
executels -laFh
and verify that all files are owned by user
apache
and thatidmapping.py
andrunidmapping-<institute>.sh
are user-executable. -
Inside
/var/www/html/sites/<institute>/files/<uuid>
execute/usr/bin/sudo -u apache /usr/bin/crontab -l /usr/bin/sudo -u apache ./runidmapping-<institute>.sh -v -b -a /usr/bin/sudo -u apache /usr/bin/crontab -l
You should see an addition line in the crontab that executes
runidmapping-<institute>.sh
twice. -
Inside
/var/www/html/sites/<institute>/files/<uuid>
execute/usr/bin/sudo -u apache /usr/bin/crontab -l /usr/bin/sudo -u apache ./runidmapping-<institute>.sh -v -b -c /usr/bin/sudo -u apache /usr/bin/crontab -l
The addition line in the crontab should have disappeared.
-
Inside
/var/www/html/sites/<institute>/files/<uuid>
execute/usr/bin/sudo -u apache ./runidmapping-<institute>.sh -v -b ls
You should see the two files
<institute>-authors.xml
and<institute>-authors.xml.<timestamp>
, as well as the log-fileidmapping.log.<timestamp>
, where<timestamp>
is the UTC timestamp of execution (in the format "YYYYmmddTHHMMSSZ
"; note that the timestamps of the two files will differ by a few seconds). -
Add an author-object and check if a line in the crontab that executes
runidmapping-<institute>.sh
twice has been created (use/usr/bin/sudo -u apache /usr/bin/crontab -l
) --- you can delete apache's crontab with/usr/bin/sudo -u apache /usr/bin/crontab -r
and re-do the test for other scenarios like modifying or deleting an author-object. -
Trigger the creation of the abovementioned line in crontab (add, modify or delete an author-object, or run
/var/www/html/sites/<institute>/files/<uuid>/runidmapping-<institute>.sh -v -b -a
) and wait for 11:45pm to see if the file/var/www/html/sites/<institute>/files/<uuid>/<institute>-authors.xml
gets re-created.
-
†You can generate a new ssh-key by executing, e.g., ssh-keygen -t rsa -b 4096 "<your-github-username>@users.noreply.github.com -f ~/.ssh/id_rsa_github
. Afterwards, make sure to upload the public key file (~/.ssh/id_rsa_github.pub
) to https://github.com/settings/keys (log in first).
The file <institute>-authors.xml
, generated when following the above installation instructions, has the following format:
<?xml version="1.0" encoding="utf-8"?>
<institute-authors ts="YYYYmmddTHHMMSSZ">
<IDmapping>
<DORAid>institute-authors:NNNN</DORAid>
<SAPid>NNNN</SAPid>
</IDmapping>
.
.
.
</institute-authors>
Here, NNNN
stands for any number of digits and the literal "institute
" is replaced by your choice of <institute>
(including in the tag-name, which might generate a non-conforming xml
-file if <institute>
contains weird characters).
The file is such that the repository only contains the generic files (where <institute>
="institute
", <uuid>
="00000000-0000-0000-0000-000000000000
", etc...), as well as itself. Specifically, it contains
/*
!/.gitignore
!/README.md
!/LICENSE.md
!/idmapping.py
!/idmappingsetup.xml
!/runidmapping.sh
!/rule_to_set_cronjob_for_author_idmapping_export.rule
!/component_to_set_cronjob_for_author_idmapping_export.rulecomponent
!/installidmapping.sh
This file...
The license under which the toolchain is distributed:
Copyright (c) 2017 d-r-p <d-r-p@users.noreply.github.com>
Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The python-script responsible for the export.
Usage: idmapping.py [-v] [-b] [-s setup.xml]
-v
: Be verbose about what the script is doing-b
: Immediately create a backup copy of the exported file (the default filename is<institute>-authors.xml
; an additional copy with suffix.<timestamp>
will be created)-s setup.xml
: Provide an xml-file with various customisations (seeidmappingsetup.xml
below)
This is the dummy setup file that, after modification by the installer script installidmapping.sh
, will be given to idmapping.py
. It contains a short explanatory header, as well as the default values for each setting (meaning that there is no need to hand over this file as is to idmapping.py
):
<?xml version="1.0" encoding="utf8"?>
<IDMAPPINGSETUP>
<!--
REMARKS REGARDING THE TAGS:
- Each of the following tags is optional and defaults to (lowercase tags refer to the current setting):
- <INSTITUTE>: institute
- <NAMESPACE>: <institute>-authors
- <UUID>: 00000000-0000-0000-0000-000000000000
- <SUBSITE>: <institute>
- <PATH>: /var/www/html/sites/<subsite>/files/<uuid> (if <subsite> is blank, "default" will be substituted in the path)
- <FILENAME>: <namespace>.xml
- <SRCIDKEY>: DORAid
- <SOLRFIELDSRCID>: PID
- <DESIDKEY>: SAPid
- <SOLRFIELDDESID>: MADS_u1_ms
- <SOLRBASEURL>: http://localhost:8080/solr/collection1/select
- <ROOTKEY>: <namespace>
- <CONTAINERKEY>: IDmapping
EXAMPLES:
- <INSTITUTE>marsuniversity</INSTITUTE>
<UUID>45cf5181-b077-4e94-b021-e2ccf39508a0</UUID>
- <INSTITUTE>institute</INSTITUTE>
<SUBSITE></SUBSITE>
<NAMESPACE>people</NAMESPACE>
<FILENAME>out.xml</FILENAME>
-->
<INSTITUTE>institute</INSTITUTE>
<NAMESPACE>institute-authors</NAMESPACE>
<UUID>00000000-0000-0000-0000-000000000000</UUID>
<SUBSITE>institute</SUBSITE>
<PATH>/var/www/html/sites/institute/files/00000000-0000-0000-0000-000000000000</PATH>
<FILENAME>institute-authors.xml</FILENAME>
<SRCIDKEY>DORAid</SRCIDKEY>
<SOLRFIELDSRCID>PID</SOLRFIELDDORAID>
<DESIDKEY>SAPid</DESIDKEY>
<SOLRFIELDDESID>MADS_u1_ms</SOLRFIELDSAPID>
<SOLRBASEURL>http://localhost:8080/solr/collection1/select</SOLRBASEURL>
<ROOTKEY>institute-authors</ROOTKEY>
<CONTAINERKEY>IDmapping</CONTAINERKEY>
</IDMAPPINGSETUP>
This is the wrapper script that invokes idmapping.py
and modifies the crontab.
Usage: runidmapping.sh [-v] [-b] [-a|-c|-g]
-v
: Be verbose (transitively!)-b
: Make a backup of the generated file (transitively!)-a
: Schedule the cronjob next 11:45pm-c
: Clean-up crontab (remove all references to executingrunidmapping.sh
with the chosen options)-g
: Get the cron-command
Be careful that transitivity only works if the script is invoked in exactly the same way (also, mind the order of switches). E.g., runidmapping.sh -b -c
requires the cronjob to have been added via runidmapping.sh -b -a
(instead of, say, runidmapping.sh -v -a
).
This is the rule reacting to creation, modification or deletion of an entity object in the appropriate namespace (default: <institute-authors>
). It invokes component_to_set_cronjob_for_author_idmapping_export.rulecomponent
which sets the cronjob (see below).
This is the component invoked by rule_to_set_cronjob_for_author_idmapping_export.rule
. It is responsible for adding an appropriate cronjob (which is done by calling runidmapping.sh -v -b -a
).
This is the install script. It takes an insitute-name and a uuid (as well as some optional parameters) and copies the necessary files to the necessary location, making the necessary modifications.
Usage: installidmapping.sh -i institute -u uuid [-d directory] [-n name] [-p port]
-i institute
: Make all references to<institute>
beinstitute
-u uuid
: Make all references to<uuid>
be uuid (format: 8-4-4-4-12 digits, dash-separated)-d directory
: Install intodirectory
instead of/var/www/html/sites/<institute>/files
-n name
: Make all references to "DORA Institute
" bename
-p port
: Change the default solr-port from8080
toport
- Modify
idmapping.py
to restrict to collections and content models; make this configurable throughidmappingsetup.xml
- Modify
installidmapping.sh
to deal with collections and content models (the script will have to also modifyrule_to_set_cronjob_for_author_idmapping_export.rule
) - Add options to
installidmapping.sh
to remove backup-creation and verbosity - Modify
component_to_set_cronjob_for_author_idmapping_export.rulecomponent
to usedrupal_mail()
This document is Copyright © 2017 by d-r-p
<d-r-p@users.noreply.github.com>
and licensed under CC BY 4.0.