Utilities for ETL scripts for Gobierto
Copy .env.example to .env (or .rbenv-vars) and edit it with the expected values.
This gem relies heavily on gobierto_budgets_data.
Downloads a file from an external URL
Usage:
ruby gobierto-etl-utils/operations/download/run.rb "https://input.json" /tmp/output.json [--compatible]
Output:
- A file with the content of the input URL
Downloads content of an endpoint and saves it into a local file
Usage:
ruby $DEV_DIR/gobierto-etl-utils/api-download/run.rb [options]
--source-url SOURCE_URL Url of the resource to be downloaded. This field is required
--output-file OUTPUT_FILE Path of the file to save the downloaded data. This field is required
--bearer-token BEARER_TOKEN Bearer token to be sent in the request header. Ignored if blank
--compatible COMPATIBLE Use an old cipher, necessary for some connections. False by default
-h, --help Prints this help
Output:
- File with the endpoint content
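To illustrate how --source-url and --bearer-token fit together, here is a minimal sketch using Ruby's standard net/http library. It assumes a plain GET request. The helper names build_request and download are hypothetical, this is not the script's actual code, and it does not implement --compatible.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not the script's actual code): builds the GET request
# for --source-url and adds the bearer token only when it is not blank.
def build_request(source_url, bearer_token = nil)
  uri = URI.parse(source_url)
  request = Net::HTTP::Get.new(uri)
  token = bearer_token.to_s.strip
  request["Authorization"] = "Bearer #{token}" unless token.empty?
  request
end

# Hypothetical helper: sends the request and writes the body to --output-file.
def download(source_url, output_file, bearer_token = nil)
  request = build_request(source_url, bearer_token)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "Download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  File.binwrite(output_file, response.body)
end
```

Skipping the Authorization header when the token is blank mirrors the documented "Ignored if blank" behavior.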
Downloads the files from an S3 folder
Usage:
ruby gobierto-etl-utils/operations/download-s3/run.rb "dir1/dir2" /tmp/output_folder
Output:
- Files in the output folder
Uploads a file to the gobierto-data S3 bucket
Usage:
ruby gobierto-etl-utils/operations/upload-s3/run.rb /tmp/foo/execution_status.yml gobierto-etl-gencat/status/last_execution.yml
Output:
- Path to the uploaded file
Checks if a JSON file is valid JSON
Usage:
ruby gobierto-etl-utils/operations/check-json/run.rb /tmp/input.json
Output:
- Returns exit code 0 if the file is valid
- Returns exit code -1 if the file is invalid
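A minimal sketch of the same check with Ruby's standard json library. The helper valid_json_file? is hypothetical, not the operation's actual code. A script would map its result to the documented codes with exit(valid ? 0 : -1); note that a shell reports exit(-1) as status 255, because exit statuses are truncated to 8 bits.

```ruby
require "json"

# Hypothetical helper: returns true when the file parses as JSON, false otherwise.
def valid_json_file?(path)
  JSON.parse(File.read(path))
  true
rescue JSON::ParserError
  false
end
```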
Checks if a CSV file is valid CSV
Usage:
ruby gobierto-etl-utils/operations/check-csv/run.rb /tmp/input.csv
Output:
- Returns exit code 0 if the file is valid
- Returns exit code -1 if the file is invalid
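The CSV check can be sketched the same way with Ruby's standard csv library. This sketch assumes that "valid" means the whole file parses without a CSV::MalformedCSVError; the actual operation may apply stricter rules (for example, a consistent column count). The helper valid_csv_file? and its col_sep: keyword are hypothetical.

```ruby
require "csv"

# Hypothetical helper: streams every row and reports whether parsing succeeded.
def valid_csv_file?(path, col_sep: ",")
  CSV.foreach(path, col_sep: col_sep) { |_row| } # raises on malformed input
  true
rescue CSV::MalformedCSVError
  false
end
```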
Converts a file into UTF-8. By default, it expects the input encoding to be ISO-8859-1.
Usage:
ruby gobierto-etl-utils/operations/convert-to-utf8/run.rb input_file.json output_file.json
Output:
- The input file in UTF-8 encoding
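A minimal sketch of the conversion, assuming the whole file fits in memory. The helper convert_to_utf8 is hypothetical, and its source_encoding: keyword is an illustrative stand-in for a non-default input encoding, not a documented option of the operation.

```ruby
# Hypothetical helper: reads the input in the given encoding
# (ISO-8859-1 by default) and writes it back out as UTF-8.
def convert_to_utf8(input_path, output_path, source_encoding: "ISO-8859-1")
  content = File.read(input_path, encoding: source_encoding)
  File.write(output_path, content.encode("UTF-8"))
end
```

Every byte value has a mapping from ISO-8859-1 to UTF-8, so with the default encoding the encode call cannot fail on unmappable characters.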
Prepares a working directory for the ETL: removes the directory and creates it again, empty.
Usage:
ruby gobierto-etl-utils/operations/prepare-working-directory/run.rb /tmp/foo
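The remove-and-recreate step maps directly onto Ruby's FileUtils. This is a sketch with a hypothetical helper name, not the operation's actual code. Because rm_rf deletes recursively without prompting, it should only ever be pointed at a dedicated working directory.

```ruby
require "fileutils"

# Hypothetical helper: deletes the directory (if present) and recreates it empty.
def prepare_working_directory(path)
  FileUtils.rm_rf(path)
  FileUtils.mkdir_p(path)
end
```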
Runs a query in Oracle using the sqlplus utility
Usage:
$DEV_DIR/gobierto-etl-utils/operations/run-oracle-query/run.rb "CONNECTION STRING" input.sql $WORKING_DIR/output.csv
Where:
- first argument: the connection string
- second argument: a file with the query to execute
- third argument: the output file (in CSV format)
Calculates the CSV and JSON files for the open data section of the sites configured with the given organization IDs.
This operation is a Gobierto runner.
Usage:
/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto_budgets/annual_data/run.rb "2011 2012" organization_ids.txt
Where:
- <years list> is a list of years separated by a space
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- Files are generated in Gobierto
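Several operations in this section take the same two arguments: a space-separated list of years and a file with one organization ID per row. A sketch of how they could be parsed, assuming blank lines and surrounding whitespace in the IDs file are ignored; the helper names are hypothetical. IDs are kept as strings so that any leading zeros are preserved.

```ruby
# Hypothetical helper: parses a space-separated list of years such as "2011 2012".
def parse_years(years_arg)
  # Base 10 so that a value like "08" is not rejected as an invalid octal literal.
  years_arg.split.map { |year| Integer(year, 10) }
end

# Hypothetical helper: reads one organization ID per row,
# skipping blank lines and trimming surrounding whitespace.
def read_organization_ids(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end
```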
Calculates the bubbles JSON file for a set of organization IDs.
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/bubles/run.rb organization_ids.txt
Where:
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- JSON files with the data required for the bubbles, uploaded to S3.
Clears all the budgets data of the given organizations
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/clear-budgets/run.rb organization_ids.txt
Where:
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- No output is expected. Data is removed from Elasticsearch
Deletes total budgets data for a given set of years and a list of organizations.
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/delete_total_budget/run.rb "2010 2012" organization_ids.txt
Where:
- <years list> is a list of years separated by a space
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- No output is expected. Data is removed from Elasticsearch
Calculates total budgets data for a given set of years and a list of organizations.
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/update_total_budget/run.rb "2010 2012" organization_ids.txt
Where:
- <years list> is a list of years separated by a space
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- No output is expected. Data is created/updated in Elasticsearch
Deletes the providers of an organization (location) from Populate Data
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/clear-previous-providers/run.rb 8019
Where:
- 8019 is the provider ID
Output:
- No output is expected. Data is removed from Elasticsearch
Imports planned budgets from a JSON file
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/import-planned-budgets/run.rb input.json <year>
Where:
- input.json is a JSON file with budgets data
- <year> is the year of the data
Output:
- No output is expected. Data is created/updated in Elasticsearch
Imports updated planned budgets from a JSON file
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/import-planned-budgets-updated/run.rb input.json <year>
Where:
- input.json is a JSON file with budgets data
- <year> is the year of the data
Output:
- No output is expected. Data is created/updated in Elasticsearch
Imports executed budgets from a JSON file
Usage:
ruby gobierto-etl-utils/operations/gobierto_budgets/import-executed-budgets/run.rb input.json <year>
Where:
- input.json is a JSON file with budgets data
- <year> is the year of the data
Output:
- No output is expected. Data is created/updated in Elasticsearch
Publishes an activity in the sites configured with the organization ID.
This operation is a Gobierto runner.
Usage:
/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto/publish-activity/run.rb budgets_updated organization_ids.txt
Where:
- budgets_updated is a valid Activity from Gobierto
- organization_ids.txt is a plain text file with an organization ID per row
Output:
- Activity is published in Gobierto
Clears the Rails cache of a site for a given module namespace.
This operation is a Gobierto runner.
Usage:
/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto/clear-cache/run.rb --site-organization-id "INE_CODE" --namespace "GobiertoBudgets"
Output:
- Cache is cleared
Uploads (creates or updates) a dataset in Gobierto Data
Usage:
ruby $DEV_DIR/gobierto-etl-utils/operations/gobierto_data/upload-dataset/run.rb [options]
(*) all parameters are required except those with default value
--api-token API_TOKEN Gobierto Data API Token
--gobierto-url GOBIERTO_URL Gobierto Data URL (protocol + host, e.g. http://datos.gobierto.es/)
--name DATASET_NAME Dataset name
--slug DATASET_SLUG Dataset slug
--table-name TABLE_NAME Dataset table-name
--file-path FILE_PATH Data file path
--append APPEND Append to the existing dataset (true or false). By default false
--visibility-level VISIBILITY_LEVEL
Dataset visibility level (draft or active). By default active
--csv-separator SEPARATOR CSV separator. By default ','
-d, --debug Run with debug mode enabled
-h, --help Prints this help
Output:
The command outputs a debug log with the HTTP responses
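The documented options and defaults can be mirrored with Ruby's standard OptionParser. This is a hypothetical sketch, not the script's actual parser: it treats every option without a default as required (as the help text states) and restricts --append and --visibility-level to their documented values.

```ruby
require "optparse"

# Hypothetical parser mirroring the documented upload-dataset options.
def parse_upload_options(argv)
  # Documented defaults: append false, visibility active, separator ','.
  options = { append: false, visibility_level: "active", csv_separator: ",", debug: false }

  OptionParser.new do |opts|
    opts.on("--api-token API_TOKEN") { |v| options[:api_token] = v }
    opts.on("--gobierto-url GOBIERTO_URL") { |v| options[:gobierto_url] = v }
    opts.on("--name DATASET_NAME") { |v| options[:name] = v }
    opts.on("--slug DATASET_SLUG") { |v| options[:slug] = v }
    opts.on("--table-name TABLE_NAME") { |v| options[:table_name] = v }
    opts.on("--file-path FILE_PATH") { |v| options[:file_path] = v }
    # Only the documented values are accepted; anything else raises InvalidArgument.
    opts.on("--append APPEND", %w[true false]) { |v| options[:append] = (v == "true") }
    opts.on("--visibility-level VISIBILITY_LEVEL", %w[draft active]) { |v| options[:visibility_level] = v }
    opts.on("--csv-separator SEPARATOR") { |v| options[:csv_separator] = v }
    opts.on("-d", "--debug") { options[:debug] = true }
  end.parse!(argv)

  # Every option without a default is required.
  required = %i[api_token gobierto_url name slug table_name file_path]
  missing = required.reject { |key| options[key] }
  raise OptionParser::MissingArgument, missing.join(", ") unless missing.empty?

  options
end
```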
This is not an operation but the query to extract contracts from Gobierto Data. It contains a <PLACE_ID> variable that must be replaced with the corresponding INE code. The resulting query can then be URI-encoded to be passed as a parameter in the import. Example:
QUERY=`sed "s/<PLACE_ID>/${ALCOBENDAS_INE_CODE}/g" ${ETL_UTILS}/operations/gobierto_data/extract-tenders/query.sql | jq -s -R -r @uri`
Where sed replaces <PLACE_ID> with the value of the environment variable ${ALCOBENDAS_INE_CODE}, and jq -s -R -r @uri reads the whole result as a single raw string and URI-encodes it.
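For callers written in Ruby, the same two steps can be sketched with the standard erb library: ERB::Util.url_encode percent-encodes every character except the unreserved ones (letters, digits, -, _, . and ~), which is comparable to jq's @uri filter. The helper build_encoded_query is hypothetical.

```ruby
require "erb"

# Hypothetical helper: substitutes the <PLACE_ID> placeholder and
# percent-encodes the resulting query, like the sed | jq pipeline.
def build_encoded_query(sql_template, place_id)
  query = sql_template.gsub("<PLACE_ID>", place_id.to_s)
  ERB::Util.url_encode(query)
end
```

As with jq -s -R, a trailing newline in the query file is kept and encoded as %0A.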
This is not an operation but the query to extract tenders from Gobierto Data. Please refer to extract-contracts for a detailed example.