
Add Data UI - CSV #6541

Closed
Bargs opened this issue Mar 15, 2016 · 4 comments
Assignees
Labels
Feature:Add Data Add Data and sample data feature on Home release_note:enhancement

Comments

Bargs (Contributor) commented Mar 15, 2016

Related to #5974

The goal is to create a UI that makes it easy to upload CSV files to Elasticsearch and start playing with them in Kibana without having to use any external tools like Logstash.

Questions

  • How many different escape/quote schemes do we want to support? It's easy enough to switch out quote characters, but parsing quotes vs an escape character like \ will require slightly different code, and I'm not sure what other schemes are common in CSVs. Perhaps there's a pre-existing library I can use that will cover all the common cases.
  • Given we're dealing with a finite amount of data, should the pattern review step bother with index rotation and wildcard patterns? For a single CSV it seems needlessly complicated, but it might be useful if the user is going to upload additional data on a continuing basis.
  • As a follow up to that question, do we need a screen for uploading CSVs to existing ingest configs (pattern + pipeline + template)? Perhaps that's suited to a separate issue and PR.
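
On the escape/quote question, the two schemes described above can be illustrated with Python's stdlib csv module (used here purely as a sketch of the parsing problem, not the library the wizard would use; the sample records are made up):

```python
import csv
import io

# The same logical record under two common CSV quoting conventions.
rfc4180 = '1,"said ""hi"", left"\n'    # embedded quotes doubled (RFC 4180)
escaped = '1,"said \\"hi\\", left"\n'  # embedded quotes backslash-escaped

rows_rfc = list(csv.reader(io.StringIO(rfc4180)))
rows_esc = list(csv.reader(io.StringIO(escaped),
                           doublequote=False, escapechar="\\"))

# Both conventions decode to the same record.
print(rows_rfc[0])  # ['1', 'said "hi", left']
print(rows_esc[0])  # ['1', 'said "hi", left']
```

As the `doublequote`/`escapechar` switch shows, the two schemes need slightly different parser configuration but not fundamentally different code.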

Mockups and Workflow

Step 0 - New option on the Add Data landing page

The CSV wizard will appear as an additional option on the Add Data landing page.

[Mockup: Step 0]

Step 1 - Upload

The first screen will allow the user to select a CSV file for upload and show them a preview in tabular format. The preview will initially show a limited number of rows from the file, potentially with the option to page through. The user will have an option to pick a delimiter and quote/escape character. We might try to auto-detect these.
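
The auto-detection idea can be sketched with Python's stdlib `csv.Sniffer`, shown here only as an illustration of the heuristic (the sample below is made-up upload data):

```python
import csv

# Guess the dialect of an uploaded sample: Sniffer inspects the text and
# infers the delimiter and quote character from a candidate set.
sample = 'name;city\n"Ada";London\n"Linus";Helsinki\n'
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

print(dialect.delimiter)   # ';'
print(dialect.quotechar)   # '"'

rows = list(csv.reader(sample.splitlines(), dialect))
print(rows[1])             # ['Ada', 'London']
```

A detected dialect would only pre-fill the delimiter/quote controls; the user could still override it before continuing.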

[Mockup: Step 1]

Step 2 - Create a pipeline

This step allows the user to build an ingest pipeline to manipulate their data before it gets indexed in Elasticsearch. It works just like the pipeline step in the "tail a file" wizard (#5974).
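
An ingest pipeline is just a named list of processors stored in Elasticsearch. As a hypothetical sketch of what the wizard might generate for a two-column CSV, built in Python for illustration (the field names, grok pattern, and pipeline id are all made up, not the wizard's actual output):

```python
import json

# Hypothetical pipeline for a "name,bytes" CSV: split the raw line with
# grok, then convert the numeric column to an integer.
pipeline = {
    "description": "pipeline generated by the CSV wizard (sketch)",
    "processors": [
        {"grok": {
            "field": "message",
            "patterns": ["%{DATA:name},%{NUMBER:bytes}"],
        }},
        {"convert": {"field": "bytes", "type": "integer"}},
    ],
}

# This body would be stored with PUT /_ingest/pipeline/<pipeline-id>.
print(json.dumps(pipeline, indent=2))
```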

[Mockup: Step 2]

Step 3 - Creating a Kibana Index Pattern

This step creates a Kibana index pattern based on the sample output from the ingest pipeline. It allows the user to specify an index pattern name, a timestamp field, and the mapping type of each of their fields. It works just like the matching step in the "tail a file" wizard (#5974).

[Mockup: Step 3]

Step 4 - Done

Once the user clicks save in the previous step, Kibana will do a number of things in the background:

  • Create an index template
  • Create the ingest pipeline
  • Create a Kibana index pattern
  • Send the CSV data to Elasticsearch using the given index name and pipeline

This page will probably need some sort of progress indicator in case it's working with a large file. Once all the operations are complete we'll enable a button that takes the user to the discover page for their new index pattern.
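
The last bullet above (sending the data through the pipeline) can be sketched as building an `_bulk` payload; the index and pipeline names here are illustrative:

```python
import csv
import io
import json

# Turn parsed CSV rows into an Elasticsearch _bulk request body that will
# be routed through the wizard-created pipeline (names are hypothetical).
raw = "name,bytes\nhome,120\nlogin,80\n"

lines = []
for row in csv.DictReader(io.StringIO(raw)):
    lines.append(json.dumps({"index": {"_index": "csv-upload"}}))
    lines.append(json.dumps(row))
payload = "\n".join(lines) + "\n"   # _bulk requires a trailing newline

# POST payload to /_bulk?pipeline=<pipeline-id> with an HTTP client. For a
# large file, send it in chunks and drive the progress indicator off the
# number of chunks completed.
print(payload)
```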

[Mockup: Step 4]

tbragin (Contributor) commented Mar 15, 2016

My 2c

  • I think for Phase 1 it'd be OK to just dump the contents of the CSV into a single index and not allow the user to add documents to existing indices.
  • Longer term, the ability to upload multiple CSV files into the same index could definitely be useful.
  • As far as time-based indices go, honestly, I'm not sure. My sense is that datasets kept in CSVs are relatively small, and even if there is a time-based component to them, they might do fine in a single index.
  • Regarding quoting, if I understood the question correctly, I've seen single and double quotes as commonly supported config options during an upload. Some screenshots from other vendors below, if it helps.

This might be a good dataset to play with for the advanced use case... multiple files per logical index, date/time stamps, geo coordinates:
http://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/

[Four screenshots of other vendors' CSV upload options, taken Mar 15, 2016]

Bargs added the Feature:Add Data label Mar 23, 2016
Bargs (Contributor, Author) commented Apr 5, 2016

@tbragin how crucial do you think the customizable quote character is? The library I'm using right now only supports double quotes, and RFC 4180 specifies double quotes should be used, so I'm wondering if it's worth waiting to see if there's any demand for customization before adding it.
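
For reference, the RFC 4180 convention is that a field containing the delimiter or quote character is wrapped in double quotes, with embedded quotes doubled. Python's stdlib csv writer defaults happen to follow the same convention, so it serves as a quick illustration (sample values are made up):

```python
import csv
import io

# RFC 4180 style output: quote only fields that need it, double embedded quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(['he said "hi"', "a,b", "plain"])
print(buf.getvalue())  # "he said ""hi""","a,b",plain
```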

tbragin (Contributor) commented Apr 5, 2016

@Bargs I don't think it's necessary for phase 1 - I'd be fine taking the wait-and-see approach on that one.

RedCloudDC commented

There is a demand to upload large CSV files into Kibana without using Logstash.
