- Introduction
- Installation and setup
- Directory structure
- Content overview
- Citation
- Acknowledgements
- License
This repository provides underlying code and materials for the paper 'Station to Station: Linking and Enriching Historical British Railway Data'.
The StopsGB dataset is available on the British Library research repository.
- We recommend installation via Anaconda. Refer to the Anaconda website and follow the instructions.
- Create a new environment:
conda create -n py39station python=3.9
- Activate the environment:
conda activate py39station
- Clone the repository:
git clone https://github.com/Living-with-machines/station-to-station.git
- Install the requirements:
cd /path/to/my/station-to-station
pip install -r requirements.txt
- Install python-levenshtein separately with conda:
conda install -c conda-forge python-levenshtein
- To allow the newly created py39station environment to show up in the notebooks, run:
python -m ipykernel install --user --name py39station --display-name "Python (py39station)"
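Once the steps above are done, a quick sanity check of the environment can save time later. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the module list is an assumption (Levenshtein is the import name of the python-levenshtein package, and ipykernel is the package used in the kernel registration step). Run it inside the activated py39station environment.

```python
import importlib.util
import sys

# Assumed module list, for illustration only: "Levenshtein" is the import
# name of python-levenshtein; "ipykernel" is used to register the kernel.
REQUIRED_MODULES = ["Levenshtein", "ipykernel"]


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(expected=(3, 9), actual=None):
    """Check that the interpreter's (major, minor) version equals `expected`."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    return actual == tuple(expected)


if __name__ == "__main__":
    print("Python 3.9:", python_version_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules suggests the pip and conda installation steps completed; it does not verify package versions.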
Our code assumes the following directory structure:
station-to-station/
├── processed/
│ ├── deezymatch/
│ ├── quicks/
│ ├── ranklib/
│ ├── resolution/
│ └── wikidata/
├── resources/
│ ├── deezymatch/
│ ├── geonames/
│ ├── geoshapefiles/
│ ├── quicks/
│ ├── ranklib/
│ ├── wikidata/
│ ├── wikigaz/
│ └── wikipedia/
├── quicks/
├── wikidata/
├── deezymatch/
└── linking/
    └── tools/
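Because the code assumes this layout, it can help to check it before running anything. The sketch below is an illustration rather than a script shipped with the repository: the function name missing_dirs is hypothetical, the folder list is transcribed from the tree above, and it assumes that tools/ sits inside linking/.

```python
from pathlib import Path

# Folders transcribed from the directory tree above.
# Assumption: tools/ is nested inside linking/.
EXPECTED_DIRS = [
    "processed/deezymatch",
    "processed/quicks",
    "processed/ranklib",
    "processed/resolution",
    "processed/wikidata",
    "resources/deezymatch",
    "resources/geonames",
    "resources/geoshapefiles",
    "resources/quicks",
    "resources/ranklib",
    "resources/wikidata",
    "resources/wikigaz",
    "resources/wikipedia",
    "quicks",
    "wikidata",
    "deezymatch",
    "linking/tools",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected relative paths that are not directories under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the repository root, or replace "." with the path to the clone.
    for rel in missing_dirs("."):
        print("missing:", rel)
```

Folders under processed/ and resources/ may not exist in a fresh clone, so a non-empty result is not necessarily an error at that stage.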
This is a summary of the contents of each folder:
- Resources, inputs and outputs:
  - resources/: folder where resources required to run the experiments are stored.
  - processed/: folder where processed data, resources, and results are stored.
- Processing code:
  - quicks/: code for parsing and processing Quick's Chronology.
  - wikidata/: code for processing Wikidata, to be used in the linking experiments.
  - deezymatch/: code to create the DeezyMatch datasets and models used for linking.
- Linking code:
  - linking/: code for reproducing the experiments and for linking StopsGB to Wikidata.
To run the linking experiments, follow the instructions in this order:
- Prepare the resources → resources readme.
- Process Wikidata → Wikidata readme.
- Create DeezyMatch datasets and models → DeezyMatch readme.
- Reproduce the linking experiments → Readme: reproduce linking experiments.
To create the full StopsGB, follow the instructions in this order:
- Prepare the resources folder → resources readme.
- Process Wikidata → Wikidata readme.
- Create DeezyMatch datasets and models → DeezyMatch readme.
- Process Quick's Chronology into StopsGB → Quicks readme.
- Resolve and georeference StopsGB → Readme: create StopsGB.
Please acknowledge our work if you use the code or derived data, by citing:
Mariona Coll Ardanuy, Kaspar Beelen, Jon Lawrence, Katherine McDonough, Federico Nanni, Joshua Rhodes, Giorgia Tolfo, and Daniel C.S. Wilson. "Station to Station: Linking and Enriching Historical British Railway Data." In Computational Humanities Research (CHR2021). 2021.
@inproceedings{lwm-station-to-station-2021,
title = "Station to Station: Linking and Enriching Historical British Railway Data",
author = "Coll Ardanuy, Mariona and
Beelen, Kaspar and
Lawrence, Jon and
McDonough, Katherine and
Nanni, Federico and
Rhodes, Joshua and
Tolfo, Giorgia and
Wilson, Daniel CS",
booktitle = "Computational Humanities Research",
year = "2021",
}
- Conceptualization: Katherine McDonough, Jon Lawrence and Daniel C. S. Wilson.
- Methodology: Mariona Coll Ardanuy, Federico Nanni and Kaspar Beelen.
- Implementation: Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen and Giorgia Tolfo.
- Reproducibility: Federico Nanni and Mariona Coll Ardanuy.
- Historical Analysis: Kaspar Beelen, Katherine McDonough, Jon Lawrence, Joshua Rhodes and Daniel C. S. Wilson.
- Data Acquisition and Curation: Daniel C. S. Wilson, Mariona Coll Ardanuy, Giorgia Tolfo and Federico Nanni.
- Annotation: Jon Lawrence and Katherine McDonough.
- Project Management: Mariona Coll Ardanuy.
- Writing and Editing: all authors.
Original data from Railway Passenger Stations in Great Britain: a Chronology by Michael Quick. Used with permission from The Railway and Canal Historical Society ©.
Work for this paper was produced as part of Living with Machines. This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.
The source code is licensed under the MIT License.
Copyright © 2021 The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.