Exporters provide a flexible way to export data from multiple sources to multiple destinations, with support for filtering and transforming the data along the way.
This GitHub repository serves as the central repository for the project.
Full documentation is available at http://exporters.readthedocs.io/en/latest/
First of all, we recommend creating a virtualenv:
virtualenv exporters
source exporters/bin/activate
Installing:
pip install exporters
- Then, we can create our first configuration object and store it in a file called config.json.
- This configuration will read from an S3 bucket and write to our local filesystem, exporting only the records whose country field is "United States":
{
    "reader": {
        "name": "exporters.readers.s3_reader.S3Reader",
        "options": {
            "bucket": "YOUR_BUCKET",
            "aws_access_key_id": "YOUR_ACCESS_KEY",
            "aws_secret_access_key": "YOUR_SECRET_KEY",
            "prefix": "exporters-tutorial/sample-dataset"
        }
    },
    "filter": {
        "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
        "options": {
            "keys": [
                {"name": "country", "value": "United States"}
            ]
        }
    },
    "writer": {
        "name": "exporters.writers.fs_writer.FSWriter",
        "options": {
            "filebase": "/tmp/output/"
        }
    }
}
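Before running the export, it can help to check that config.json is valid JSON and contains the three sections the pipeline above uses. This is a minimal sketch using only the Python standard library; the section names simply mirror the configuration shown above:

import json

# Load the configuration we just wrote and make sure it parses.
with open('config.json') as f:
    config = json.load(f)

# The pipeline above is built from a reader, a filter and a writer,
# so check that each section is present before launching the export.
for section in ('reader', 'filter', 'writer'):
    if section not in config:
        raise ValueError('config.json is missing the "%s" section' % section)

print('config.json looks well-formed')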
We can use the provided script to run this export:
python bin/export.py --config config.json
Alternatively, the export can be run using exporters as a library:
from exporters import BasicExporter
exporter = BasicExporter.from_file_configuration('config.json')
exporter.export()
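If you prefer to build the configuration in code rather than in a file, the same pipeline can be expressed as a Python dict. This sketch assumes a from_configuration classmethod that accepts an in-memory dict, analogous to from_file_configuration; check the documentation linked above to confirm it is available in your version:

from exporters import BasicExporter

# Same pipeline as config.json, expressed as an in-memory dict.
config = {
    "reader": {
        "name": "exporters.readers.s3_reader.S3Reader",
        "options": {
            "bucket": "YOUR_BUCKET",
            "aws_access_key_id": "YOUR_ACCESS_KEY",
            "aws_secret_access_key": "YOUR_SECRET_KEY",
            "prefix": "exporters-tutorial/sample-dataset"
        }
    },
    "filter": {
        "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
        "options": {"keys": [{"name": "country", "value": "United States"}]}
    },
    "writer": {
        "name": "exporters.writers.fs_writer.FSWriter",
        "options": {"filebase": "/tmp/output/"}
    }
}

# from_configuration is assumed here by analogy with from_file_configuration;
# verify it against the library documentation.
exporter = BasicExporter.from_configuration(config)
exporter.export()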
Let's suppose we have a pickle file from a previously failed export job. To resume it, we run the export script with the --resume option:
python bin/export.py --resume pickle://pickle-file.pickle