A light-weight, fast geocoder for Python using DuckDB. Try it out online at Huggingface
Whereabouts is an open-source geocoding library for Python, allowing you to geocode and standardize address data all within your own environment:
Features:
- Two line installation
- No additional database setup required. Uses DuckDB to run all queries
- No need to send data to an external geocoding API
- Fast (Geocode 1000s / sec depending on your setup)
- Robust to typographical errors
- Python 3.8+
- requirements.txt (found in repo)
whereabouts can be installed either from this repo using pip / uv / conda
pip install whereabouts
You will need a geocoding database to match addresses against. You can either download a pre-built database or create your own using a dataset of high quality reference addresses for a given country, state or other geographic region.
Pre-built geocoding database are available from Huggingface. The list of available databases can be found here
As an example, to install the small size geocoder database for all of Australia:
python -m whereabouts download au_all_sm
Rather than using a pre-built database, you can create your own geocoder database if you have your own address file. This file should be a single csv or parquet file with the following columns:
Column name | Description | Data type |
---|---|---|
ADDRESS_DETAIL_PID | Unique identifier for address | int |
ADDRESS_LABEL | The full address | str |
ADDRESS_SITE_NAME | Name of the site. This is usually null | str |
LOCALITY_NAME | Name of the suburb or locality | str |
POSTCODE | Postcode of address | int |
STATE | State | str |
LATITUDE | Latitude of geocoded address | float |
LONGITUDE | Longitude of geocoded address | float |
These fields should be specified in a setup.yml
file. Once the setup.yml
is created and a reference dataset is available, the geocoding database can be created:
python -m whereabouts setup_geocoder setup.yml
Geocode a list of addresses
from whereabouts.Matcher import Matcher
matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')
For more accurate geocoding you can use trigram phrases rather than token phrases. Note you will need one of the large databases to use trigram geocoding.
matcher.geocode(addresslist, how='trigram')
The algorithm employs simple record linkage techniques, making it suitable for implementation in around 10 lines of SQL. It is based on the following papers
Work in progress: https://whereabouts.readthedocs.io/en/latest/
- Additional countries (US, NZ, France, UK)
- Geocode street corners
- Geocode individual suburb, street name pairs (without house numbers)