CsvToEs

This is a proof of concept (POC) that parses a CSV file in a streaming fashion and stores one JSON object per CSV line in Elasticsearch, using the Elasticsearch Bulk API.
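
The flow described above can be illustrated with a minimal sketch. It is written in Python rather than the project's Elixir and is not the tool's actual code. It assumes a ;-separated input with a header line, the index name elixir-csv, and a batch size of 1_000, all taken from the Limitations section; the helper names batches and bulk_bodies are hypothetical. Each yielded string is one newline-delimited body in the shape the Bulk API expects: an action line followed by the document itself.

```python
import csv
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List, TextIO

INDEX = "elixir-csv"  # index name taken from the Limitations section
BATCH_SIZE = 1_000    # batch size taken from the Limitations section


def batches(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a lazy stream of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_bodies(lines: TextIO, index: str = INDEX,
                size: int = BATCH_SIZE) -> Iterator[str]:
    """Yield one NDJSON bulk body per batch of CSV rows.

    The header line supplies the field names of each document.
    Rows are read lazily, so the whole file is never held in memory.
    """
    reader = csv.DictReader(lines, delimiter=";")
    for batch in batches(reader, size):
        out = []
        for row in batch:
            # Action line: no _id is given, so Elasticsearch generates one.
            out.append(json.dumps({"index": {"_index": index}}))
            # Source line: the CSV row as a JSON object.
            out.append(json.dumps(row))
        # A bulk body must end with a newline.
        yield "\n".join(out) + "\n"
```

Passing an open file object (or an io.StringIO holding a few lines with made-up columns such as straat;huisnummer) to bulk_bodies yields the bodies one batch at a time; each body would then be sent to the _bulk endpoint with the Content-Type application/x-ndjson.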

Build

mix escript.build

Load a CSV

./csv_to_es <FILENAME>

Limitations

Currently only a ;-separated file is supported, and it must have a header line, which is used to name the fields of the ES documents
This project has only been tested with a bagadres-full.csv file downloaded from NLExtract.nl
Elasticsearch is expected to run at localhost
As we don't create an _id field explicitly, multiple runs of the tool will create duplicates
The batch size is fixed at 1_000; this figure was made up without any testing or prior knowledge
The timeout of 60s has been chosen as "large enough" to avoid timeouts
No error-handling is implemented
The target index is hardcoded to elixir-csv
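
To illustrate the duplicate limitation listed above: one common way to make repeated runs overwrite documents instead of duplicating them is to put a deterministic _id in each bulk action line. The sketch below is in Python, is not something this tool does, and rests on an assumption: that hashing the full row content is an acceptable identity for a document. The function name index_action is hypothetical.

```python
import hashlib
import json


def index_action(row: dict, index: str = "elixir-csv") -> str:
    """Build a bulk action line whose _id is derived from the row content.

    Assumption: two rows with identical fields are the same document.
    """
    # Sort keys so the hash does not depend on column order.
    canonical = json.dumps(row, sort_keys=True, ensure_ascii=False)
    doc_id = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return json.dumps({"index": {"_index": index, "_id": doc_id}})
```

With such an action line, loading the same file twice would index each row under the same _id, so the second run would replace the existing documents rather than add new ones.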

Tip

Before running this tool, it is best to set number_of_replicas to 0 and refresh_interval to -1 on the target Elasticsearch index, elixir-csv
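
As a rough sketch of this tip, the settings change could be prepared as follows. The example is in Python and assumes Elasticsearch listens on port 9200 (the README only says localhost) and that the elixir-csv index already exists. The helper names settings_request and send are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed port; the README only states localhost
INDEX = "elixir-csv"


def settings_request(number_of_replicas: int, refresh_interval: str,
                     base_url: str = ES_URL,
                     index: str = INDEX) -> urllib.request.Request:
    """Prepare a PUT <index>/_settings request without sending it."""
    body = {
        "index": {
            "number_of_replicas": number_of_replicas,
            "refresh_interval": refresh_interval,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Settings suggested by the tip: no replicas, refresh disabled.
before_load = settings_request(0, "-1")
```

Calling send(before_load) applies the settings before a load. After the load, the same helper could restore other values, for example settings_request(1, "1s"), which correspond to the Elasticsearch defaults.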

Can I use this in Production?

Probably not as is.
