This repository holds the files associated with Chapter 4 of E. Papoutsoglou's thesis.
This repository provides three Docker containers:
- A simple FAIR Data Point for the dataset in question (uses Python 3 / Flask)
- A SPARQL server hosting the phenotypic (meta)data and the weather datasets (uses Fuseki)
- Jupyter notebooks for:
  - the conversion of a MIAPPE spreadsheet file to RDF (phenotypic metadata)
  - the conversion of tabular phenotypic data files to RDF
  - the exploration of the produced RDF as pulled from the SPARQL server container, with some visualizations.
The contents of this repository are organized by container. The all_containers folder has 4 subdirectories.
- data-original: Files assembled manually, resembling the outputs that could be expected from data collection by researchers.
  - phenotypic: Files holding data from field phenotyping experiments with the CxE population.
    - data_1999NL.csv: Data from the 1999 experiment conducted in the Netherlands by B.C. Celis-Gamboa.
    - data_2003VE.csv: Data from the 2003 experiment conducted in Venezuela.
    - data_2004Fin.csv, data_2005Fin.csv: Data from the 2004 and 2005 experiments conducted in Finland by A. Zaban.
    - data_2010ET.csv: Data from the 2010 experiment conducted in Ethiopia by P.X. Hurtado-Lopez.
  - weather:
    - Tab-separated files with data about the photoperiod (time between sunrise and sunset) for the experiments. The contents have been retrieved from timeanddate.com.
    - Tab-separated files with data about the temperature (daily average) for the experiments. The temperature records have been copied from P.X. Hurtado-Lopez's files.
  - MIAPPE_CxE_v1.1.xlsx: MIAPPE spreadsheet holding the metadata from the 5 experiments, as retrieved from the literature.
  - station_metadata.ttl: TTL file with metadata about the 4 weather stations that would theoretically hold the weather data.
- data-generated: RDF (TTL) files derived from those in the data-original folder, to be imported into the SPARQL endpoint. All weather data has also been aggregated into a single file, all_weather.ttl.
- pheno_meta-data_excel_to_rdf.ipynb: Jupyter notebook that transforms the phenotypic data and metadata (from data-original) to RDF. Only the parts necessary for the present data have been implemented, i.e. empty MIAPPE sections and fields are not handled.
- Explore_data.ipynb: Jupyter notebook that explores the available phenotypic data and metadata, combines them and concludes with visualizations.
- pheno_setup.ttl and weather_setup.ttl: Configuration files used by the triple store (Fuseki) to create the datasets.
- requirements_jupyter.txt: Python libraries required for the Jupyter notebooks in this repository.
- server-fdp: Files required for the container hosting the FAIR Data Point (FDP).
  - fdp.ttl, catalog.ttl, dataset.ttl, distribution.ttl: Specifications (formatted as TTL) for each level of the FDP, based on the FDP recommendations.
  - templates folder: HTML files based on the TTL files, so that the visual syntax formatting is preserved and links are clickable [todo].
  - fdp.py: Simple Flask server (Python 3.4+) to host the FDP files.
  - requirements_fdp.txt and Dockerfile: Files to help set up the Docker container.
- server-fuseki: Files required for the container hosting the SPARQL endpoint(s).
  - Dockerfile: File to help set up the Docker container.
- server-jupyter: Files required for the container hosting the Jupyter notebooks.
  - requirements.txt and Dockerfile: Files to help set up the Docker container.
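As a rough illustration of what turning one of the tabular files in data-original into Turtle involves, the sketch below maps each CSV row to one resource with one triple per non-empty cell. It is not the implementation in pheno_meta-data_excel_to_rdf.ipynb and uses no MIAPPE vocabulary: the base IRI http://example.org/cxe/, the cxe:Observation class, the _obsN subject naming and the example column names are all hypothetical.

```python
import csv
import io
import re

# Hypothetical namespace; the IRIs actually used in data-generated may differ.
BASE = "http://example.org/cxe/"


def local_name(text: str) -> str:
    """Turn a column header or file stem into a safe Turtle local name."""
    name = re.sub(r"\W+", "_", text, flags=re.ASCII).strip("_")
    return name or "unnamed"


def turtle_literal(value: str) -> str:
    """Keep plain integers/decimals bare (Turtle types them); quote everything else."""
    if re.fullmatch(r"[+-]?(\d+|\d*\.\d+)", value):
        return value
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_turtle(text: str, dataset_id: str, delimiter: str = ",") -> str:
    """Map every row to one (hypothetical) cxe:Observation, one triple per non-empty cell."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = [f"@prefix cxe: <{BASE}> ."]
    for i, row in enumerate(reader, start=1):
        props = [
            f"cxe:{local_name(col)} {turtle_literal(val.strip())}"
            for col, val in row.items()
            # skip overflow columns (key None), missing cells (value None) and blanks
            if col is not None and isinstance(val, str) and val.strip()
        ]
        if not props:
            continue
        out.append("")
        out.append(f"cxe:{local_name(dataset_id)}_obs{i} a cxe:Observation ;")
        out.append("    " + " ;\n    ".join(props) + " .")
    return "\n".join(out) + "\n"
```

For the tab-separated weather files, pass delimiter="\t". When reading from disk, open the files with newline="" as the csv module recommends, and pass the file stem (e.g. data_1999NL) as dataset_id.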
These containers have been tested with Docker version 20.10.x. Make sure that a compatible version of Docker is installed on your computer.
You can start the relevant processes (Jupyter notebook, FDP server, Fuseki SPARQL endpoint) by running docker-compose up at the root of this repository.
The services will become available at:
- FDP server: localhost:43131/FDP
  A user can explore the three implemented endpoints:
  - Phenotypic data catalog: localhost:43131/FDP/catalog/phenotypic.ttl
  - Dataset 1 in the phenotypic data catalog: localhost:43131/FDP/dataset/Dataset_1.ttl
  - SPARQL distribution of Dataset 1: localhost:43131/FDP/distribution/Pheno_dataset_1_sparql.ttl
- Fuseki SPARQL server: localhost:43030
  Username / password: admin / pw123
- Jupyter notebooks: localhost:48888/?token=cxe
  The password for the notebooks is cxe, though it should be entered automatically (or skipped) when using the above link.
  Two notebooks are available:
  - Explore_data.ipynb: The notebook that pulls phenotypic and weather (meta)data from the SPARQL endpoint and creates visualizations.
  - pheno_meta-data_excel_to_rdf.ipynb: The notebook that converts the MIAPPE metadata from the spreadsheet, and the tabular data files, into RDF.
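The FDP files and the SPARQL endpoint can also be reached outside the notebooks with any HTTP client. The standard-library sketch below builds a SPARQL protocol GET request for JSON results, flattens the response into dictionaries, and fetches FDP metadata as plain Turtle text. The dataset name pheno and the /sparql service path are assumptions: the real dataset names are defined in pheno_setup.ttl and weather_setup.ttl and are listed in the Fuseki GUI.

```python
import json
import urllib.parse
import urllib.request

FUSEKI = "http://localhost:43030"
FDP = "http://localhost:43131/FDP"


def sparql_request(dataset: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request asking for JSON results."""
    # The dataset name (e.g. "pheno") and the /sparql path are assumptions
    url = f"{FUSEKI}/{dataset}/sparql?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list[dict]:
    """Flatten SPARQL JSON results into {variable: value} dicts (unbound variables are omitted)."""
    results = json.loads(results_json)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def fetch_text(request) -> str:
    """GET a URL or Request and return the body decoded as UTF-8."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With the containers running, fetch_text(FDP + "/catalog/phenotypic.ttl") returns the catalog Turtle, and rows(fetch_text(sparql_request("pheno", "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"))) returns up to five bindings from the (assumed) pheno dataset.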
Note that any changes made to the files on the docker containers are not persistent.
When the Docker containers are created, the following will happen:
- The .ttl files in data-generated will be loaded into the SPARQL endpoint.
- The FDP will become accessible.
- The Jupyter notebooks will become executable. They use the data in data-original to produce the files in data-generated.
New data can be uploaded to the SPARQL endpoint through its GUI.
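Besides the GUI, Fuseki can typically accept uploads over HTTP through the SPARQL 1.1 Graph Store Protocol. The sketch below only builds such a request: it POSTs Turtle to the default graph of a dataset, authenticating with the admin credentials of the Fuseki server. The dataset name pheno and the /data service path are assumptions that match Fuseki's usual dataset layout but depend on the configuration in pheno_setup.ttl and weather_setup.ttl.

```python
import base64
import urllib.request


def upload_request(turtle: bytes, dataset: str = "pheno",
                   base: str = "http://localhost:43030",
                   user: str = "admin", password: str = "pw123") -> urllib.request.Request:
    """Build a Graph Store Protocol POST that adds Turtle triples to the default graph."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base}/{dataset}/data?default",  # dataset name and /data path are assumptions
        data=turtle,
        method="POST",
        headers={"Content-Type": "text/turtle", "Authorization": f"Basic {token}"},
    )
```

Sending it is a matter of urllib.request.urlopen(upload_request(Path("my_data.ttl").read_bytes())), where my_data.ttl is a placeholder file name. As with GUI uploads, the data does not persist once the container is removed.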
The containers can be stopped by pressing Ctrl + C twice, and then removed with docker-compose down.
The images created can be deleted with:
- on Windows: FOR /f "tokens=*" %i IN ('docker images -q cxe') DO docker rmi %i
- otherwise: docker rmi $(docker images -q cxe)
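The same cleanup can also be scripted once for all platforms. The sketch below mirrors the commands above with Python's subprocess module: it lists the image IDs of the cxe repository and passes them to docker rmi in a single call. The run parameter is only there so the logic can be exercised without Docker; it defaults to subprocess.run.

```python
import subprocess


def image_ids(repository: str = "cxe", run=subprocess.run) -> list[str]:
    """Return the unique image IDs that `docker images -q <repository>` prints."""
    result = run(["docker", "images", "-q", repository],
                 capture_output=True, text=True, check=True)
    # One ID per line; an image with several tags is listed more than once
    return list(dict.fromkeys(result.stdout.split()))


def remove_images(repository: str = "cxe", run=subprocess.run) -> list[str]:
    """Remove all images of the repository and return the IDs that were passed to docker rmi."""
    ids = image_ids(repository, run)
    if ids:
        run(["docker", "rmi", *ids], check=True)
    return ids
```

Calling remove_images() after docker-compose down has the same effect as either shell command, and does nothing when no cxe images are left.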