Gobierto ETL utils

Utilities for Gobierto ETL scripts.

Setup

Copy .env.example to .env or .rbenv-vars and fill in the expected values.

This gem relies heavily on gobierto_budgets_data.

Available operations

common/download

Downloads a file from an external URL

Usage:

ruby gobierto-etl-utils/operations/download/run.rb "https://input.json" /tmp/output.json [--compatible]

Output:

File with the content of the input URL
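A minimal Ruby sketch of what a download step involves, assuming plain Net::HTTP from the standard library. The download helper name is hypothetical and this is not the code of operations/download/run.rb; in particular the --compatible (old cipher) option is not modelled. The body is streamed to disk in chunks so large files do not have to fit in memory.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: streams the body of +url+ into +output_path+.
# Raises if the server does not answer with a 2xx status.
# The --compatible (old cipher) option is intentionally not modelled here.
def download(url, output_path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      unless response.is_a?(Net::HTTPSuccess)
        raise "Download failed: HTTP #{response.code}"
      end

      # Write chunks as they arrive instead of loading the whole body in memory.
      File.open(output_path, "wb") do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  output_path
end
```

Usage: download("https://example.com/input.json", "/tmp/output.json").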

common/api-download

Downloads the content of an endpoint and saves it into a local file

Usage:

ruby $DEV_DIR/gobierto-etl-utils/api-download/run.rb [options]

        --source-url SOURCE_URL      URL of the resource to be downloaded. This field is required
        --output-file OUTPUT_FILE    Path of the file to save the downloaded data. This field is required
        --bearer-token BEARER_TOKEN  Bearer token to be sent in the request header. Ignored if blank
        --compatible COMPATIBLE      Use an old cipher, necessary for some connections. False by default
    -h, --help                       Prints this help

Output:

File with the endpoint content
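The bearer token rule above ("Ignored if blank") can be illustrated with a short Ruby sketch, assuming Net::HTTP from the standard library. build_api_request and api_download are hypothetical names, not the script's actual code: the Authorization header is only added when the token contains something other than whitespace.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the GET request for --source-url.
# The Authorization header is only set when the token is not blank.
def build_api_request(source_url, bearer_token = nil)
  request = Net::HTTP::Get.new(URI.parse(source_url))
  token = bearer_token.to_s.strip
  request["Authorization"] = "Bearer #{token}" unless token.empty?
  request
end

# Hypothetical helper: performs the request and saves the body to +output_file+.
def api_download(source_url, output_file, bearer_token: nil)
  uri = URI.parse(source_url)
  request = build_api_request(source_url, bearer_token)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "Request failed with HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  File.binwrite(output_file, response.body)
  output_file
end
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting a real endpoint.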

common/download S3

Downloads the files from an S3 folder

Usage:

ruby gobierto-etl-utils/operations/download-s3/run.rb "dir1/dir2" /tmp/output_folder

Output:

Files in the output folder

common/upload S3

Uploads a file to the gobierto-data S3 bucket

Usage:

ruby gobierto-etl-utils/operations/upload-s3/run.rb /tmp/foo/execution_status.yml gobierto-etl-gencat/status/last_execution.yml

Output:

Path to the uploaded file

common/check-json

Checks if a JSON file is valid JSON

Usage:

ruby gobierto-etl-utils/operations/check-json/run.rb /tmp/input.json

Output:

Returns exit code 0 if the file is valid
Returns exit code -1 if the file is invalid
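A minimal Ruby sketch of the validity check, assuming "valid" simply means that the standard json library can parse the whole file. The valid_json_file? name is hypothetical and not taken from check-json/run.rb.

```ruby
require "json"

# Hypothetical helper: true when the file at +path+ parses as JSON.
def valid_json_file?(path)
  JSON.parse(File.read(path))
  true
rescue JSON::ParserError
  false
end
```

A run.rb wrapper could then end with exit(valid_json_file?(ARGV[0]) ? 0 : -1). On POSIX systems an exit status of -1 is reported to the shell as 255.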

common/check-csv

Checks if a CSV file is valid CSV

Usage:

ruby gobierto-etl-utils/operations/check-csv/run.rb /tmp/input.csv

Output:

Returns exit code 0 if the file is valid
Returns exit code -1 if the file is invalid
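A comparable Ruby sketch for CSV, assuming "valid" means that the standard csv library can read every row without raising CSV::MalformedCSVError (for example because of an unclosed quote). It does not check that rows have a consistent number of columns; valid_csv_file? is a hypothetical name.

```ruby
require "csv"

# Hypothetical helper: true when every row of the file at +path+ can be parsed.
# Only syntax is checked; column counts per row are not compared.
def valid_csv_file?(path, col_sep: ",")
  CSV.foreach(path, col_sep: col_sep) { |_row| }
  true
rescue CSV::MalformedCSVError
  false
end
```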

common/convert to UTF-8

Converts a file to UTF-8. By default it expects the input encoding to be ISO-8859-1.

Usage:

ruby gobierto-etl-utils/operations/convert-to-utf8/run.rb input_file.json output_file.json

Output:

The input file in UTF-8 encoding
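The conversion amounts to reading the raw bytes, labelling them with the source encoding and transcoding them. A minimal Ruby sketch under that assumption; convert_to_utf8 and its from: keyword are hypothetical, with ISO-8859-1 as the default to match the description above.

```ruby
# Hypothetical helper: re-encodes +input_path+ from +from+ into UTF-8.
def convert_to_utf8(input_path, output_path, from: "ISO-8859-1")
  # Read raw bytes, then declare which encoding they are in.
  content = File.binread(input_path).force_encoding(from)
  # Transcode and write the resulting UTF-8 bytes unchanged.
  File.binwrite(output_path, content.encode("UTF-8"))
  output_path
end
```

In ISO-8859-1 every byte maps to a character, so this default never raises; other source encodings may raise Encoding::UndefinedConversionError for unmapped bytes.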

common/prepare working directory

Prepares a directory to be used during the ETL: removes it if it exists and creates it again, empty.

Usage:

ruby gobierto-etl-utils/operations/prepare-working-directory/run.rb /tmp/foo
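The remove-then-create behaviour maps directly onto two FileUtils calls from the Ruby standard library. A minimal sketch, with prepare_working_directory as a hypothetical name:

```ruby
require "fileutils"

# Hypothetical helper: deletes +path+ recursively (if present) and recreates it empty.
# rm_rf does not ask for confirmation, so +path+ must never point at real data.
def prepare_working_directory(path)
  FileUtils.rm_rf(path)
  FileUtils.mkdir_p(path)
  path
end
```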

common/run Oracle query

Runs a query in Oracle using the sqlplus utility

Usage:

$DEV_DIR/gobierto-etl-utils/operations/run-oracle-query/run.rb "CONNECTION STRING" input.sql $WORKING_DIR/output.csv

Where:

first argument: the connection string
second argument: a file with the query to execute
third argument: the output file (in CSV format)

gobierto-budgets/annual data

Generates the CSV and JSON files for the open data section of the sites configured with the given organization IDs.

This operation is a Gobierto runner.

Usage:

/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto_budgets/annual_data/run.rb "2011 2012" organization_ids.txt

Where:

<years list> ("2011 2012" in the example) is a list of years separated by a space
organization_ids.txt is a plain text file with an organization ID per row

Output:

Files are generated in Gobierto
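Several gobierto-budgets operations take the same two kinds of arguments: a space-separated list of years and a plain text file with one organization ID per row. A minimal Ruby sketch of how those arguments could be read; parse_years and read_organization_ids are hypothetical helpers, not functions from this repository.

```ruby
# Hypothetical helper: "2011 2012" -> [2011, 2012]. Raises on non-numeric input.
def parse_years(arg)
  arg.split.map { |year| Integer(year, 10) }
end

# Hypothetical helper: one organization ID per row; blank rows and surrounding
# whitespace are ignored. IDs stay strings so leading zeros are preserved.
def read_organization_ids(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end
```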

gobierto-budgets/bubbles

Calculates the bubbles JSON file for a set of organization IDs.

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/bubles/run.rb organization_ids.txt

Where:

organization_ids.txt is a plain text file with an organization ID per row

Output:

JSON files with the data required for the bubbles, uploaded to S3.

gobierto-budgets/clear budgets

Clears all the budget data of the given organizations

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/clear-budgets/run.rb organization_ids.txt

Where:

organization_ids.txt is a plain text file with an organization ID per row

Output:

No output is expected. Data is removed from Elasticsearch

gobierto-budgets/delete total budgets

Deletes total budgets data for a given set of years and a list of organizations.

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/delete_total_budget/run.rb "2010 2012" organization_ids.txt

Where:

<years list> is a list of years separated by a space
organization_ids.txt is a plain text file with an organization ID per row

Output:

No output is expected. Data is removed from Elasticsearch

gobierto-budgets/calculate total budget

Calculates total budgets data for a given set of years and a list of organizations.

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/update_total_budget/run.rb "2010 2012" organization_ids.txt

Where:

<years list> is a list of years separated by a space
organization_ids.txt is a plain text file with an organization ID per row

Output:

No output is expected. Data is created/updated in Elasticsearch

gobierto-budgets/clear previous providers

Deletes the providers of an organization/location from Populate Data

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/clear-previous-providers/run.rb 8019

Where:

8019 is the provider ID

Output:

No output is expected. Data is removed from Elasticsearch

gobierto-budgets/import planned budgets

Imports planned budgets from a JSON file

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/import-planned-budgets/run.rb input.json

Where:

input.json is a JSON file with budgets data
<year> is the year of the data

Output:

No output is expected. Data is created/updated in Elasticsearch

gobierto-budgets/import planned budgets updated

Imports updated planned budgets from a JSON file

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/import-planned-budgets-updated/run.rb input.json

Where:

input.json is a JSON file with budgets data
<year> is the year of the data

Output:

No output is expected. Data is created/updated in Elasticsearch

gobierto-budgets/import executed budgets

Imports executed budgets from a JSON file

Usage:

ruby gobierto-etl-utils/operations/gobierto_budgets/import-executed-budgets/run.rb input.json

Where:

input.json is a JSON file with budgets data
<year> is the year of the data

Output:

No output is expected. Data is created/updated in Elasticsearch

gobierto/publish activity

Publishes an activity in the sites configured with the given organization IDs.

This operation is a Gobierto runner.

Usage:

/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto/publish-activity/run.rb budgets_updated organization_ids.txt

Where:

budgets_updated is a valid Activity from Gobierto
organization_ids.txt is a plain text file with an organization ID per row

Output:

Activity is published in Gobierto

gobierto/clear cache

Clears the Rails cache of a site and module namespace.

This operation is a Gobierto runner.

Usage:

/path/to/gobierto bin runner ruby gobierto-etl-utils/operations/gobierto/clear-cache/run.rb --site-organization-id "INE_CODE" --namespace "GobiertoBudgets"

Output:

Cache is cleared

gobierto-data/upload dataset

Uploads (creates or updates) a dataset in Gobierto Data

Usage:

ruby $DEV_DIR/gobierto-etl-utils/operations/gobierto_data/upload-dataset/run.rb [options]

       (*) all parameters are required except those with a default value
        --api-token API_TOKEN        Gobierto Data API Token
        --gobierto-url GOBIERTO_URL  Gobierto Data URL (protocol + host, e.g. http://datos.gobierto.es/)
        --name DATASET_NAME          Dataset name
        --slug DATASET_SLUG          Dataset slug
        --table-name TABLE_NAME      Dataset table name
        --file-path FILE_PATH        Data file path
        --append APPEND              Append to the existing dataset (true or false). By default false
        --visibility-level VISIBILITY_LEVEL
                                     Dataset visibility level (draft or active). By default active
        --csv-separator SEPARATOR    CSV separator. By default ','
    -d, --debug                      Run with debug mode enabled
    -h, --help                       Prints this help

Output:

The command outputs a debug log with the HTTP responses.
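The option list above can be mirrored with Ruby's standard OptionParser to show how the documented defaults and required parameters fit together. This is a hypothetical sketch (parse_upload_options is not the script's code); treating any --append value other than "true" as false is an assumption.

```ruby
require "optparse"

# Hypothetical parser for the documented upload-dataset options.
# Defaults follow the help text: append false, visibility active, separator ','.
def parse_upload_options(argv)
  options = { append: false, visibility_level: "active", csv_separator: ",", debug: false }
  OptionParser.new do |opts|
    opts.on("--api-token API_TOKEN") { |v| options[:api_token] = v }
    opts.on("--gobierto-url GOBIERTO_URL") { |v| options[:gobierto_url] = v }
    opts.on("--name DATASET_NAME") { |v| options[:name] = v }
    opts.on("--slug DATASET_SLUG") { |v| options[:slug] = v }
    opts.on("--table-name TABLE_NAME") { |v| options[:table_name] = v }
    opts.on("--file-path FILE_PATH") { |v| options[:file_path] = v }
    # Assumption: only the literal string "true" enables append mode.
    opts.on("--append APPEND") { |v| options[:append] = v == "true" }
    # Only the two documented visibility levels are accepted.
    opts.on("--visibility-level VISIBILITY_LEVEL", %w[draft active]) { |v| options[:visibility_level] = v }
    opts.on("--csv-separator SEPARATOR") { |v| options[:csv_separator] = v }
    opts.on("-d", "--debug") { options[:debug] = true }
  end.parse!(argv)

  # Every parameter without a default value is required.
  required = %i[api_token gobierto_url name slug table_name file_path]
  missing = required.reject { |key| options[key] }
  raise OptionParser::MissingArgument, missing.join(", ") unless missing.empty?

  options
end
```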

gobierto-data/extract-contracts

This is not an operation but the query to extract contracts from Gobierto Data. It contains a variable <PLACE_ID> that needs to be replaced by the corresponding INE code. Then it can be converted to URI params to be used in the import. Example:

QUERY=`sed "s/<PLACE_ID>/${ALCOBENDAS_INE_CODE}/g" ${ETL_UTILS}/operations/gobierto_data/extract-tenders/query.sql | jq -s -R -r @uri`

Where sed replaces <PLACE_ID> with the value of the environment variable ${ALCOBENDAS_INE_CODE}, and jq reads the whole result as a single string and percent-encodes it.
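The same substitution and encoding can be done in Ruby when a shell pipeline is inconvenient. A minimal sketch, assuming ERB::Util.url_encode from the standard library, which percent-encodes every character except letters, digits, "-", "_", "." and "~". Some jq versions leave a few extra characters such as "*" unescaped, so the output may differ slightly from the shell example. build_query_param is a hypothetical name.

```ruby
require "erb"

# Hypothetical helper: replaces every <PLACE_ID> placeholder with +place_id+
# and percent-encodes the whole query so it can be used as a URI parameter.
def build_query_param(sql, place_id)
  ERB::Util.url_encode(sql.gsub("<PLACE_ID>", place_id.to_s))
end
```

Usage: build_query_param(File.read("query.sql"), ENV.fetch("ALCOBENDAS_INE_CODE")).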

gobierto-data/extract-tenders

This is not an operation but the query to extract tenders from Gobierto Data. Please refer to extract-contracts for a detailed example.
