Merge pull request #475 from pelias/feat/download-form-s3
Allow download from s3
orangejulius authored Sep 28, 2020
2 parents 62fe236 + 498c75f commit d89f586
Showing 4 changed files with 28 additions and 4 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -3,7 +3,7 @@ FROM pelias/baseimage

# downloader apt dependencies
# note: this is done in one command in order to keep down the size of intermediate containers
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y unzip awscli && rm -rf /var/lib/apt/lists/*

# change working dir
ENV WORKDIR /code/pelias/openaddresses
17 changes: 17 additions & 0 deletions README.md
@@ -145,6 +145,23 @@ downloading customized data. Paths are supported (for example,
`https://yourhost.com/path/to/your/data`), but must not end with a trailing
slash.

S3 buckets are supported. Files will be downloaded using aws-cli.

For example: `s3://data.openaddresses.io`.

Note: When using s3, you might need authentication (IAM instance role, env vars, etc.).

### `imports.openaddresses.s3Options`

* Required: no

If `imports.openaddresses.dataHost` is an s3 bucket, the value of this setting is appended to the `aws s3 cp` command as extra options.
For example: `--profile my-profile`

This is useful, for example, when downloading from `s3://data.openaddresses.io`,
as they require the requester to pay for data transfer.
You can then use the following option: `--request-payer`
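
As a minimal sketch of how these two settings combine, the snippet below mirrors the `imports.openaddresses` section of a Pelias config as a plain JavaScript object and prints the `aws s3 cp` command that would result. The `datapath` value, the `/tmp/example.zip` target and the `describeCommand` helper are hypothetical and serve illustration only; the bundle name is one of those requested by `utils/download_all.js`.

```javascript
// Hypothetical config object mirroring the `imports.openaddresses` section.
const config = {
  imports: {
    openaddresses: {
      datapath: '/data/openaddresses',        // assumed path, for illustration only
      dataHost: 's3://data.openaddresses.io', // no trailing slash
      s3Options: '--request-payer'            // appended to the aws-cli command
    }
  }
};

// Illustrative helper (not part of the importer): shows the command string
// that the s3 branch would build for one bundle.
function describeCommand(config, bundleName, targetFile) {
  const { dataHost, s3Options = '' } = config.imports.openaddresses;
  return `aws s3 cp ${dataHost}/${bundleName} ${targetFile} ${s3Options}`.trim();
}

console.log(describeCommand(config, 'openaddr-collected-global-sa.zip', '/tmp/example.zip'));
```

Leaving `s3Options` unset produces the same command without any trailing options.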

## Parallel Importing

Because OpenAddresses consists of many small files, this importer can be configured to run several instances in parallel that coordinate to import all the data.
1 change: 1 addition & 0 deletions schema.js
@@ -10,6 +10,7 @@ module.exports = Joi.object().keys({
files: Joi.array().items(Joi.string()),
datapath: Joi.string().required(true),
dataHost: Joi.string(),
s3Options: Joi.string(),
adminLookup: Joi.boolean(),
missingFilesAreFatal: Joi.boolean().default(false).truthy('yes').falsy('no')
}).unknown(false)
12 changes: 9 additions & 3 deletions utils/download_all.js
@@ -3,6 +3,7 @@ const async = require('async');
const fs = require('fs-extra');
const temp = require('temp');
const logger = require('pelias-logger').get('openaddresses-download');
const _ = require('lodash');

function downloadAll(config, callback) {
logger.info('Attempting to download all data');
@@ -25,12 +26,12 @@ function downloadAll(config, callback) {
// all share-alike data
`${dataHost}/openaddr-collected-global-sa.zip`
],
downloadBundle.bind(null, targetDir),
downloadBundle.bind(null, targetDir, config),
callback);
});
}

function downloadBundle(targetDir, sourceUrl, callback) {
function downloadBundle(targetDir, config, sourceUrl, callback) {

const tmpZipFile = temp.path({suffix: '.zip'});

@@ -39,7 +40,12 @@ function downloadBundle(targetDir, sourceUrl, callback) {
// download the zip file into the temp directory
(callback) => {
logger.debug(`downloading ${sourceUrl}`);
child_process.exec(`curl -s -L -X GET -o ${tmpZipFile} ${sourceUrl}`, callback);
if (_.startsWith(sourceUrl, 's3://')) {
const s3Options = config.imports.openaddresses.s3Options || '';
child_process.exec(`aws s3 cp ${sourceUrl} ${tmpZipFile} ${s3Options}`, callback);
} else {
child_process.exec(`curl -s -L -X GET -o ${tmpZipFile} ${sourceUrl}`, callback);
}
},
// unzip file into target directory
(callback) => {
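
Both branches above pass an interpolated string to `child_process.exec`, which runs it through a shell, so any shell metacharacters in `dataHost` or `s3Options` would be interpreted. A minimal sketch of an alternative, which is not what this commit does, builds an argument array and hands it to `child_process.execFile` instead. The names `buildDownloadCommand` and `download` are hypothetical, and splitting `s3Options` on whitespace assumes that no option value contains spaces.

```javascript
const child_process = require('child_process');

// Hypothetical helper: pick the program and argument list for one bundle.
function buildDownloadCommand(sourceUrl, tmpZipFile, s3Options) {
  if (sourceUrl.startsWith('s3://')) {
    // Assumption: options are whitespace-separated and contain no quoted spaces.
    const extraArgs = (s3Options || '').split(/\s+/).filter(Boolean);
    return { file: 'aws', args: ['s3', 'cp', sourceUrl, tmpZipFile, ...extraArgs] };
  }
  return { file: 'curl', args: ['-s', '-L', '-X', 'GET', '-o', tmpZipFile, sourceUrl] };
}

// Hypothetical wrapper: execFile does not spawn a shell, so the arguments
// reach the program unmodified.
function download(sourceUrl, tmpZipFile, s3Options, callback) {
  const { file, args } = buildDownloadCommand(sourceUrl, tmpZipFile, s3Options);
  child_process.execFile(file, args, callback);
}
```

Keeping the command construction in a pure function also makes it possible to unit-test the s3 and HTTP branches without running `aws` or `curl`.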
