To run the unit tests:

```
pip install -r requirements.txt
export PYTHONPATH=.
py.test tests
```
There is also a set of integration tests that depend on:

- A Stanford CoreNLP Server with the BGS custom model for Lexicon and Geochronology named entity recognition. See the instructions in that project for running CoreNLP in docker using our docker image in the GitHub Container Registry.
- An RDF graph store with a SPARQL endpoint. We provide configuration for doing this with Apache Jena Fuseki running in docker (a sketch of a query against the endpoint follows this list).
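For orientation, this is roughly how a test can talk to the SPARQL endpoint; a minimal sketch, assuming the default local Fuseki port (3030) and a dataset named `stratigraph`:

```python
import requests

# Assumed local Fuseki SPARQL endpoint; "stratigraph" is the dataset
# created by the loading script described below.
FUSEKI_QUERY = "http://localhost:3030/stratigraph/query"

def run_select(query: str) -> dict:
    """POST a SPARQL SELECT query to Fuseki and return the JSON results."""
    response = requests.post(
        FUSEKI_QUERY,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = run_select("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
    print("Triples in store:", result["results"]["bindings"][0]["n"]["value"])
```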
You can see these tests run in a GitHub Actions workflow with Fuseki and CoreNLP Server running as docker services.
This will run two docker containers, the stratigraph API service and the Fuseki RDF triplestore:
```
docker-compose up -d
```
On first run this will build the API container (which you can also build with `docker build . -t bgs/strat`). Fuseki (the graph store) will look in `./data/fuseki` for its databases; the Fuseki container will create this directory if it doesn't exist.
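Before loading data it can help to confirm Fuseki is actually up; a minimal sketch using Fuseki's `/$/ping` admin endpoint, assuming the default port from the docker setup:

```python
import time
import requests

# Fuseki's admin ping endpoint returns 200 once the server is ready.
PING_URL = "http://localhost:3030/$/ping"

def wait_for_fuseki(timeout: int = 60) -> None:
    """Poll the ping endpoint until Fuseki responds or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(PING_URL, timeout=2).status_code == 200:
                return
        except requests.ConnectionError:
            pass
        time.sleep(1)
    raise RuntimeError("Fuseki did not become ready in time")

wait_for_fuseki()
```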
At present the easiest way to get running with sample data is to use the same script as the integration tests. This collects a subset of the BGS Lexicon of Named Rock Units and Geochronology data from the SPARQL endpoint at data.bgs.ac.uk, then loads the sample extract of the Jurassic graph included in this repository (requires Python 3). It creates the stratigraph database if it is not already present.
With Fuseki running locally in docker:
```
export PYTHONPATH=.
pip install -r requirements.txt
python integration/fuseki_load.py
```
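For a sense of what `integration/fuseki_load.py` does, dataset creation and Turtle loading can be driven over Fuseki's HTTP APIs. A minimal sketch, assuming an unsecured local Fuseki; the sample file name here is hypothetical:

```python
import requests

FUSEKI = "http://localhost:3030"
DATASET = "stratigraph"

# Create the dataset via the Fuseki admin API if it doesn't already exist
# (a 409 response means it is already there).
resp = requests.post(
    f"{FUSEKI}/$/datasets",
    data={"dbName": DATASET, "dbType": "tdb2"},
)
if resp.status_code not in (200, 409):
    resp.raise_for_status()

# Load a Turtle file into the default graph via the Graph Store Protocol.
with open("data/jurassic.ttl", "rb") as f:  # hypothetical file name
    requests.post(
        f"{FUSEKI}/{DATASET}/data",
        data=f,
        headers={"Content-Type": "text/turtle"},
    ).raise_for_status()
```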
The ideal is to improve the distribution so that Fuseki loads the whole Lexicon graph on startup, with the ability to reproduce the dataset from scratch using the custom CoreNLP server container as above.
For the full application, the triplestore should contain the full dataset for the BGS Lexicon and Geochronology Linked Open Data. Both vocabularies can be downloaded in n-triples format; data are made available under the UK Open Government Licence.
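The download step might look like the sketch below; the dump URL is a placeholder (the actual n-triples download locations are published alongside the BGS Linked Open Data), so substitute the real ones:

```python
import requests

# Placeholder URL: substitute the published n-triples dump location for
# the BGS Lexicon (and likewise for Geochronology).
DUMP_URL = "https://data.bgs.ac.uk/downloads/Lexicon.nt"  # hypothetical

with requests.get(DUMP_URL, stream=True) as response:
    response.raise_for_status()
    with open("data/Lexicon.nt", "wb") as outfile:
        for chunk in response.iter_content(chunk_size=1 << 16):
            outfile.write(chunk)
```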
With the custom Stanford CoreNLP server running in docker:

```
python scripts/all_triples.py
```
This will go through the following steps:

- Collect all BGS Lexicon entries for rock units that have text descriptions of their upper/lower boundary relations
- Extract Lexicon names from those text descriptions and link them, where possible, to Linked Data URLs (a sketch of the entity extraction call follows this list)
- Create a graph of upper/lower boundary links between rock units inferred from the text, and write it as a single file in Turtle format at `./data/all_lexicon.ttl`
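The name-extraction step talks to the CoreNLP server over its HTTP API. A minimal sketch, assuming the server is on the default port 9000 and that the custom model surfaces Lexicon entities through the standard `ner` annotator:

```python
import json
import requests

# Assumed local CoreNLP server address (default port 9000).
CORENLP_URL = "http://localhost:9000/"

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Annotate text and return (mention, entity-type) pairs."""
    properties = {"annotators": "tokenize,ssplit,ner", "outputFormat": "json"}
    response = requests.post(
        CORENLP_URL,
        params={"properties": json.dumps(properties)},
        data=text.encode("utf-8"),
    )
    response.raise_for_status()
    doc = response.json()
    return [
        (mention["text"], mention["ner"])
        for sentence in doc["sentences"]
        for mention in sentence.get("entitymentions", [])
    ]

print(extract_entities("The formation overlies the Lias Group."))
```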
The Stratigraph application requires this file, plus the n-triples for the whole of the BGS Lexicon and Geochronology, imported into the Fuseki triplestore (see `integration/fuseki_load.py` for methods for doing this programmatically, or import the dataset using Fuseki's admin UI).
Text and data are made available under the UK Open Government Licence. Associated code is available under the LGPL v3 open source licence.
- Tim Kearsey
- Jo Walsh
- Rehan Kaleem
- Rachel Heaven
- Vyron Christodoulou