I need a method to add the wine label information from Eric into the label data.
Eric has a spreadsheet that includes producer, brand, wine type, date,
city/region, ABV, proof, and marginalia. This was saved as titles.csv.
I need to update the jq files first. The following tarql query (titles.rq) and pipeline create a titles.json file from the CSV file. It includes titles and other information.
PREFIX wine_label: <ark:/87287/d7794w/schema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ucdlib: <http://schema.library.ucdavis.edu/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
CONSTRUCT {
?ark a schema:label;
schema:name ?name;
rdfs:label ?labelValue;
wine_label:folder ?old_title;
wine_label:digitization_number ?label_number;
wine_label:producer ?producer;
wine_label:brand ?brand;
wine_label:wine_type ?wine_type;
wine_label:date ?date;
wine_label:city_region ?city_region;
wine_label:ABV ?abv;
wine_label:proof ?proof;
wine_label:marginalia ?marginalia;
wine_label:description ?description;
wine_label:questions ?questions;
.
}
WHERE {
BIND(replace(?url,"https://digital.ucdavis.edu/collection/amerine-wine-labels/labels/label_","") as ?label_number)
BIND('ark:/87287/d7794w' AS ?collection)
BIND(uri(concat("ark:/85140/", replace(?filename,".jpg",""))) AS ?ark)
BIND(coalesce(?producer, "") AS ?p)
BIND(coalesce(concat(" § ",?brand), "") AS ?b)
BIND(coalesce(concat(" § ",?wine_type), "") AS ?w)
BIND(coalesce(concat(" § ",?city_region), "") AS ?c)
BIND(concat(?p,?b,?w,?c) AS ?name)
}
tarql titles.rq titles.csv | \
riot --syntax=turtle --formatted=jsonld | \
jsonld compact -c $(pwd)/context.json > titles.json
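The schema:name built by the query is the producer followed by every other field that is present, joined with " § ". Here is a minimal shell sketch of that rule, handy for spot-checking a row of titles.csv by hand. The function name build_name and the sample values are made up for illustration, and an empty argument stands in for an unbound variable in the query.

```shell
# build_name PRODUCER BRAND WINE_TYPE CITY_REGION
# Mirrors the coalesce/concat chain in the query: the producer comes first,
# and each later field that is non-empty is appended with a " § " separator.
# (Hypothetical helper; plain POSIX sh, so the variables are global.)
build_name() {
  name=$1
  shift
  for field in "$@"; do
    if [ -n "$field" ]; then
      name="$name § $field"
    fi
  done
  printf '%s\n' "$name"
}

# Made-up row with no wine type.
build_name "Example Winery" "Example Brand" "" "Napa Valley"
```

As in the query, a row with no producer would start with the separator, which may or may not be what we want in the final titles.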
Then you can make individual title.json files with something like:
for i in $(cd items; echo ark:/85140/d4????); do \
echo $i;
jq --arg ark "$i" '.["@graph"][] | select(.["@id"]==$ark) | del(.["@id"]) | del(.["@type"])' titles.json > items/$i/title.json;
done
And you can join these together with the following jq conversion, add_titles.jq. I’m also removing the isPartOf, and adding a better publisher.
.[0]
+
(.[1] |
del(.["schema:isPartOf"]) |
del(.["schema:identifier"][] | select(contains("ark:") | not )))
+ .[2]
+ {
"schema:publisher":{
"@id":"http://id.loc.gov/authorities/names/no2008108707",
"schema:name":"University of California, Davis. General Library. Dept. of Special Collections"
}
}
c=context.json
for d in items/ark:/85140/d4????; do \
echo -n -e "$d\r";\
cp ${d}.jsonld.json ${d}.jsonld.json-;\
jq -s -f add_titles.jq ${c} ${d}.jsonld.json- ${d}/title.json > ${d}.jsonld.json;
done
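The loop above copies each file to a backup with a trailing "-" and then regenerates the original from that backup. A hedged sketch of the same pattern as a reusable function follows. The name rewrite_with_backup is hypothetical, and sed stands in for the jq call so the example runs without any project files.

```shell
# rewrite_with_backup FILE COMMAND [ARGS...]
# Copies FILE to FILE- and then rewrites FILE by running COMMAND with the
# backup on stdin. If COMMAND fails, the backup is copied back so that FILE
# is not left empty or half written.
rewrite_with_backup() {
  file=$1
  shift
  cp "$file" "$file-" || return 1
  if ! "$@" < "$file-" > "$file"; then
    cp "$file-" "$file"
    return 1
  fi
}

tmp=$(mktemp)
echo '{"name": "old"}' > "$tmp"
rewrite_with_backup "$tmp" sed -e 's/old/new/'
cat "$tmp"     # the rewritten file
cat "$tmp-"    # the untouched backup
```

The add_titles.jq step reads several input files rather than stdin, so it would need a small wrapper around jq to fit this shape.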
In the current metadata format, we tried to include information that the label describesWine. Here’s an example:
{
"@graph": [
{
"ucdlib:describesWine": {
"@id": "@base:#wine"
},
"schema:identifier": [
"label_0041",
"ark:/85140/d4001n"
]
},
{
"@id": "@base:#wine",
"@type": "ucdlib:Wine",
"ucdlib:WineType": {
"@id": "ucdlib:Still"
},
"http://www.wikidata.org/prop/direct/P297": "ES"
}
]
}
We are removing this, and as a result, removing the @graph component. We only do this to the records with a @graph node.
I’ll use this opportunity to add in a new context file.
{
"@context": {
"wine_label": "ark:/87287/d7794w/schema#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"ucdlib": "http://schema.library.ucdavis.edu/schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"schema": "http://schema.org/"
}
}
The following jq conversion, rm_graph.jq, merges the new context with the first @graph node and drops describesWine:
.[0]
+
(.[1]["@graph"][0] |
del(.["ucdlib:describesWine"])
)
c=context.json
for i in $(grep -l '@graph' items/ark:/85140/d4????.jsonld.json); do \
echo $i;
cp ${i} ${i}-;\
jq -s -f rm_graph.jq ${c} ${i}- > ${i};\
done
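To see what this does to a single record before running the loop, here is a small sketch that applies the same filter to a throwaway copy of a record shaped like the example above. The helper name rm_graph and the temporary files are assumptions for illustration, the filter is inlined instead of read from rm_graph.jq, and the context is cut down to one prefix.

```shell
# rm_graph CONTEXT RECORD
# Same filter as rm_graph.jq, inlined so the sketch is self-contained:
# keep the context, keep the first @graph node, drop describesWine.
rm_graph() {
  jq -s '.[0] + (.[1]["@graph"][0] | del(.["ucdlib:describesWine"]))' "$1" "$2"
}

demo_dir=$(mktemp -d)
cat > "$demo_dir/context.json" <<'EOF'
{"@context": {"schema": "http://schema.org/"}}
EOF
cat > "$demo_dir/record.json" <<'EOF'
{"@graph": [
  {"ucdlib:describesWine": {"@id": "@base:#wine"},
   "schema:identifier": ["label_0041", "ark:/85140/d4001n"]},
  {"@id": "@base:#wine", "@type": "ucdlib:Wine"}
]}
EOF

# Prints a flat record: the @context plus schema:identifier, with no @graph.
rm_graph "$demo_dir/context.json" "$demo_dir/record.json"
```

Note that the second @graph node, the wine itself, is discarded entirely, which is the intent here.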
Originally, there was no identifier for the collection; I have minted ark:/87287/d7794w for this collection.
Somehow I managed to mess up all the files that I had been adding to the amerine-wine-labels metadata file. I got the old data back from the sandbox export, and I’m trying to upload them again.
In the process I noticed that the form of the metadata changes midway through the data. Labels of:
for i in label_[0123]??? label_4[0-8]?? label_490? label_491[0123]; do echo $i; done
have two items within a ["@graph"], while
for i in label_491[456789] label_49[2-9]? label_[5-9]???; do echo $i; done
don’t have a graph, and have the information in the root. So, I need to replace what I had before.
for i in label_[0123]??? label_4[0-8]?? label_490? label_491[0123]; do ark=$(jq -r '.["@graph"][0]["schema:identifier"][] | select(.|match("^ark:"))' $i.jsonld.json); mkdir -p $(dirname $ark); mv $i $ark; mv $i.jsonld.json $ark.jsonld.json ; done
for i in label_491[456789] label_49[2-9]? label_[5-9]???; do ark=$(jq -r '.["schema:identifier"][] | select(.|match("^ark:"))' $i.jsonld.json); mkdir -p $(dirname $ark); mv $i $ark; mv $i.jsonld.json $ark.jsonld.json ; done
This script just changes the name of the metadata files. Now, I could also re-export the data, because that’s where this original data came from, but by looking at these data files, they are the same, so I’ll just rsync the ones I messed up.
for i in d4*.json; do echo $i; diff ../../../v1/items/ark\:/85140/$i $i; done | less
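The two groups of glob patterns used for the rename are meant to split the labels cleanly between the @graph layout and the root layout. A quick sketch to check that, assuming four-digit label numbers such as label_0041. The function names are made up, and the patterns are written as case patterns so no files are needed.

```shell
# Succeeds if the label name belongs to the group that has a @graph.
in_graph_group() {
  case $1 in
    label_[0123]???|label_4[0-8]??|label_490?|label_491[0123]) return 0 ;;
  esac
  return 1
}

# Succeeds if the label name belongs to the group with data in the root.
in_root_group() {
  case $1 in
    label_491[456789]|label_49[2-9]?|label_[5-9]???) return 0 ;;
  esac
  return 1
}

# Walk label_0000 .. label_9999 and count names matched by neither or both.
unmatched=0
both=0
digits="0 1 2 3 4 5 6 7 8 9"
for a in $digits; do for b in $digits; do for c in $digits; do for d in $digits; do
  n=label_$a$b$c$d
  g=0; r=0
  if in_graph_group "$n"; then g=1; fi
  if in_root_group "$n"; then r=1; fi
  if [ $((g + r)) -eq 0 ]; then unmatched=$((unmatched + 1)); fi
  if [ $((g + r)) -eq 2 ]; then both=$((both + 1)); fi
done; done; done; done
echo "unmatched=$unmatched both=$both"
```

With these patterns every number falls in exactly one group. A spelling such as label_49[2-9]??, with one character too many, would leave 4920 to 4999 unmatched.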
And now, I think I need to check in a version of the metadata, before I try this again.
I seem to have been doing two things. First, I went through and identified every image that is simply a card label, and not a wine label. I cleverly called the metadata for these label.json, which is pretty dumb. I will rename these as index-card-label.json, which is a bit more understandable.
The way that I would do this was by going back to the directory with the jpegs, and I’d rename the metadata.json file to label.json. Then, I’d remove the metadata.ttl data. Then, I would often copy the metadata from the next label and rewrite the label info. That’s probably to get the metadata for the upcoming labels.
l=3629; cd ../a$l; mv metadata.json label.json; rm metadata.ttl; cat label.json
cp ../a3630/metadata.json label.json; cat label.json
Also, for a few index cards, we only have the thumbnail, not the full image. These are cards a1044, a1070, and a1091. a1044 looks like it says K,L,M. a1070 says N,O,P. a1091 says Q,R,S,T. These all have an index-card-label card assigned to them.
The last index-card-label in the data is item a3659. After that, either there are no more labels, or else the labels are no longer cataloged.
There are no full images without a thumbnail. Note, there are no sequences missing from the list of items.
However, the labels alone do not seem to indicate where all the breaks exist. We can go through the data, and see where all the changes in metadata occur.
last_metadata=''
cur_folder='folder'
mkdir -p folder
for a in data/a*; do
  b=$(basename "$a")
  f=${b#a}    # item number without the leading "a"
  if [[ -f $a/metadata.json ]]; then
    # Compare the metadata with all whitespace removed; a change may mean a new folder.
    this_metadata=$(tr -d "\n" < "$a/metadata.json" | sed -e 's/\s//g')
    if [[ "$this_metadata" != "$last_metadata" ]]; then
      cur_folder=folder/$f
      # cur_dir is only reset here, so labels that follow an index card with
      # unchanged metadata stay in that card's directory.
      cur_dir=$cur_folder
      [[ -d $cur_folder ]] || mkdir "$cur_folder"
      jq . < "$a/metadata.json" > "$cur_folder/metadata.json"
      last_metadata=$this_metadata
    fi
    if [[ -f $a/full.jpg ]]; then
      cp "$a/full.jpg" "$cur_dir/label_$f.jpg"
    fi
  elif [[ -f $a/index-card-label.json ]]; then
    # Index cards get their own subdirectory inside the current folder.
    cur_dir=$cur_folder/index_card_$f
    [[ -d $cur_dir ]] || mkdir "$cur_dir"
    echo "$cur_dir/metadata.json"
    jq . < "$a/index-card-label.json" > "$cur_dir/metadata.json"
    if [[ -f $a/full.jpg ]]; then
      cp "$a/full.jpg" "$cur_dir/index_card_$f.jpg"
    fi
  fi
done
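The folder breaks above depend on comparing metadata with the whitespace stripped out, so that two files that differ only in layout count as the same folder. A small sketch of just that comparison, using a single tr call in place of the tr and sed pair. The function name normalize and the field names in the sample files are invented for the example.

```shell
# normalize FILE
# Prints the file with every whitespace character removed, so two JSON files
# that differ only in layout compare as equal.
# Caveat: whitespace inside string values is removed as well.
normalize() {
  tr -d '[:space:]' < "$1"
}

dir=$(mktemp -d)
printf '{"folder": "A",\n  "box": 1}\n' > "$dir/one.json"
printf '{"folder":"A","box":1}'         > "$dir/two.json"
printf '{"folder":"B","box":1}'         > "$dir/three.json"

# Same content, different layout: treated as the same folder.
[ "$(normalize "$dir/one.json")" = "$(normalize "$dir/two.json")" ] && echo "same folder"
# Different content: this is where a new folder would start.
[ "$(normalize "$dir/one.json")" != "$(normalize "$dir/three.json")" ] && echo "new folder"
```

This is a plain string comparison, so two files with the same keys in a different order would still count as a change.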
Once I had the JSON files, I sometimes needed to go back and create versions, since I changed things. For example, when switching to schema.org, I needed to change the language designation, since schema.org uses IETF BCP 47 language tags. `jq` is your friend in this case. For example, here’s that change.
for i in $(find folder -name metadata.json | xargs grep -l language_id ) ; do
mv $i $i.bak;
jq '. |= . + {inLanguage: (.language_id+(if has("country_id") then "-"+.country_id else "" end)),country:.country_id} | del(.language_id, .country_id) ' $i.bak > $i;
done
These ARKs were pointing to the labelthis project. They have been updated with the following command, which runs on the metadata.ttl files in the database.
for i in $(find . -name metadata.ttl); do
id=$(sparql -q --data=$i --results=CSV --query=- <<<"prefix : <http://schema.org/> select ?n WHERE { ?s :identifier ?n filter regex(?n,'^ark:') .}" | sed -e 's/\r//g' | tail -1);
http --session=ucd-library POST https://ezid.cdlib.org/id/$id Content-Type:text/plain <<<"_target:https://digital.ucdavis.edu/$id";
done
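Before pointing the loop at EZID, it can help to check the two pieces of text handling on their own: pulling the ARK out of the CSV query result, and building the _target line. A sketch with a canned result in place of the sparql call follows. The function name last_value is made up, the sample assumes the CSV output has a header row and CRLF line endings (which is presumably why the command strips the carriage returns), and no request is sent.

```shell
# last_value
# Reads CSV query results on stdin, removes carriage returns, and prints the
# last row. tr is used here in place of the sed call for portability.
last_value() {
  tr -d '\r' | tail -1
}

# Stand-in for the sparql output: a header row followed by one binding.
id=$(printf 'n\r\nark:/85140/d4001n\r\n' | last_value)

# The body that would be posted for this identifier.
echo "_target:https://digital.ucdavis.edu/$id"
```

If a file has no ARK identifier, the last row is the header itself, so it is worth checking that the value starts with ark: before posting.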