Feature/2108 - csv parser #4439
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ Telegraf is able to parse the following input data formats into metrics: | |
1. [Collectd](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#collectd) | ||
1. [Dropwizard](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#dropwizard) | ||
1. [Grok](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#grok) | ||
1. [CSV](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#csv) | ||
|
||
Telegraf metrics, like InfluxDB | ||
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/), | ||
|
@@ -761,4 +762,77 @@ HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG} | |
## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones | ||
## 3. UTC -- or blank/unspecified, will return timestamp in UTC | ||
grok_timezone = "Canada/Eastern" | ||
``` | ||
``` | ||
|
||
# CSV | ||
Parse metrics from a CSV-formatted table. By default, the parser assumes there is no header and | ||
will read data from the first line. If `csv_header` is true, the parser will extract column names from | ||
the first row and will begin parsing data on the second row. | ||
|
||
To assign custom column names, the `csv_data_columns` config is available. If the `csv_data_columns` | ||
config is used, all columns must be named or an error will be thrown. If `csv_header` is set to false, | ||
`csv_data_columns` must be specified. Names listed in `csv_data_columns` will override names extracted | ||
from the header. | ||
|
||
The `csv_tag_columns` and `csv_field_columns` configs are available to add the column data to the metric. | ||
The name used to specify the column is the name in the header, or if specified, the corresponding | ||
name assigned in `csv_data_columns`. If neither config is specified, no data will be added to the metric. | ||
|
||
Additional configs are available to dynamically name metrics and set custom timestamps. If the | ||
`csv_name_column` config is specified, the parser will assign the metric name to the value found | ||
in that column. If the `csv_timestamp_column` is specified, the parser will extract the timestamp from | ||
that column. If `csv_timestamp_column` is specified, the `csv_timestamp_format` must also be specified | ||
or an error will be thrown. | ||
|
||
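The rules above (optional header row, explicit column names overriding the header, and tag/field selection by column name) can be sketched with Go's standard `encoding/csv` package. This is a minimal illustration, not the parser from this PR: `row`, `parseCSV`, and `toSet` are hypothetical names, and all values are kept as strings.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// row is a hypothetical stand-in for a metric: it only carries tags and fields.
type row struct {
	Tags   map[string]string
	Fields map[string]string
}

// toSet turns a list of column names into a lookup set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// parseCSV takes column names from the first row when header is true
// (explicit names in columns take precedence) and sorts each value into
// tags or fields by column name. Columns in neither list are skipped.
func parseCSV(data string, header bool, columns, tagCols, fieldCols []string) ([]row, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	names := columns
	if header {
		if len(records) == 0 {
			return nil, fmt.Errorf("header expected but input is empty")
		}
		if len(names) == 0 {
			names = records[0]
		}
		records = records[1:] // data starts on the second row
	}
	if len(names) == 0 {
		return nil, fmt.Errorf("no column names: use a header or name the columns")
	}
	isTag, isField := toSet(tagCols), toSet(fieldCols)
	var out []row
	for _, rec := range records {
		if len(rec) != len(names) {
			return nil, fmt.Errorf("expected %d columns, got %d", len(names), len(rec))
		}
		r := row{Tags: map[string]string{}, Fields: map[string]string{}}
		for i, v := range rec {
			switch {
			case isTag[names[i]]:
				r.Tags[names[i]] = v
			case isField[names[i]]:
				r.Fields[names[i]] = v
			}
		}
		out = append(out, r)
	}
	return out, nil
}

func main() {
	data := "host,region,usage\nweb01,east,42\nweb02,west,17\n"
	rows, err := parseCSV(data, true, nil, []string{"host"}, []string{"usage"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rows {
		fmt.Println(r.Tags, r.Fields)
	}
}
```

In this sketch the `region` column is dropped because it appears in neither list, which mirrors the statement that no data is added when a column is not selected.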
#### CSV Configuration | ||
```toml | ||
data_format = "csv" | ||
|
||
## Whether or not to treat the first row of data as a header | ||
## By default, the parser assumes there is no header and will parse the | ||
## first row as data. If set to true the parser will treat the first row | ||
## as a header, extract the list of column names, and begin parsing data | ||
## on the second line. If `csv_data_columns` is specified, the column | ||
## names in header will be overridden. | ||
# csv_header = false | ||
|
||
## The separator between csv fields | ||
## By default, the parser assumes a comma (",") | ||
# csv_delimiter = "," | ||
|
||
## The character reserved for marking a row as a comment row | ||
## Commented rows are skipped and not parsed | ||
# csv_comment = "" | ||
|
||
## If set to true, the parser will remove leading whitespace from fields | ||
## By default, this is false | ||
# csv_trim_space = false | ||
|
||
## For assigning custom names to columns | ||
## If this is specified, all columns must have a name | ||
## i.e. there should be the same number of names listed | ||
## as there are columns of data | ||
## If `csv_header` is set to false, this config must be used | ||
csv_data_columns = [] | ||
|
||
## Columns listed here will be added as tags | ||
csv_tag_columns = [] | ||
|
||
## Columns listed here will be added as fields | ||
## the field type is inferred from the value of the field | ||
csv_field_columns = [] | ||
I think we should add all non-tag columns as fields. If someone wants to skip a field, they can use fieldpass/fielddrop. |
||
|
||
## The column to extract the name of the metric from | ||
## By default, this is the name of the plugin | ||
## the `name_override` config overrides this | ||
# csv_name_column = "" | ||
Call this |
||
|
||
## The column to extract time information for the metric | ||
## `csv_timestamp_format` must be specified if this is used | ||
# csv_timestamp_column = "" | ||
|
||
## The format of time data extracted from `csv_timestamp_column` | ||
## this must be specified if `csv_timestamp_column` is specified | ||
# csv_timestamp_format = "" | ||
``` | ||
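The comment on `csv_field_columns` says the field type is inferred from the value. One way such inference could work is to try progressively looser conversions with `strconv`. The helper `inferValue` and the order of attempts are assumptions for illustration; the source does not specify how the parser decides.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferValue is a hypothetical helper: it tries integer, then float, then
// boolean, and falls back to the original string. The order matters, since
// "1" would otherwise also parse as a boolean.
func inferValue(s string) interface{} {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	if b, err := strconv.ParseBool(s); err == nil {
		return b
	}
	return s
}

func main() {
	for _, s := range []string{"42", "3.5", "true", "web01"} {
		v := inferValue(s)
		fmt.Printf("%q -> %T\n", s, v)
	}
}
```

Note that `strconv.ParseBool` also accepts short forms such as `t` and `F`, and `strconv.ParseFloat` accepts `NaN` and `Inf`, so a real parser may want stricter rules than this sketch.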
|
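The documentation above requires `csv_timestamp_format` whenever `csv_timestamp_column` is set. A small sketch of that rule follows, assuming the format is a Go reference-time layout as understood by `time.Parse`; the source does not state which format syntax is expected, and `parseTimestamp` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// parseTimestamp is a hypothetical helper. It rejects an empty layout,
// mirroring the rule that a format must accompany a timestamp column, and
// otherwise delegates to time.Parse. Treating the format as a Go layout
// string is an assumption.
func parseTimestamp(value, layout string) (time.Time, error) {
	if layout == "" {
		return time.Time{}, fmt.Errorf("a timestamp format is required when a timestamp column is set")
	}
	return time.Parse(layout, value)
}

func main() {
	ts, err := parseTimestamp("2018-07-24T15:04:05Z", time.RFC3339)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.UTC())

	// A missing format is reported instead of being guessed.
	_, err = parseTimestamp("2018-07-24T15:04:05Z", "")
	fmt.Println("error:", err)
}
```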
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1399,6 +1399,121 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) { | |
} | ||
} | ||
|
||
//for csv parser | ||
if node, ok := tbl.Fields["csv_data_columns"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if ary, ok := kv.Value.(*ast.Array); ok { | ||
for _, elem := range ary.Value { | ||
if str, ok := elem.(*ast.String); ok { | ||
c.CSVDataColumns = append(c.CSVDataColumns, str.Value) | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_tag_columns"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if ary, ok := kv.Value.(*ast.Array); ok { | ||
for _, elem := range ary.Value { | ||
if str, ok := elem.(*ast.String); ok { | ||
c.CSVTagColumns = append(c.CSVTagColumns, str.Value) | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_field_columns"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if ary, ok := kv.Value.(*ast.Array); ok { | ||
for _, elem := range ary.Value { | ||
if str, ok := elem.(*ast.String); ok { | ||
c.CSVFieldColumns = append(c.CSVFieldColumns, str.Value) | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_delimiter"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.String); ok { | ||
c.CSVDelimiter = str.Value | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_comment"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.String); ok { | ||
c.CSVComment = str.Value | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_name_column"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.String); ok { | ||
c.CSVNameColumn = str.Value | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_timestamp_column"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.String); ok { | ||
c.CSVTimestampColumn = str.Value | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_timestamp_format"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.String); ok { | ||
c.CSVTimestampFormat = str.Value | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_header"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.Boolean); ok { | ||
//for config with no quotes | ||
val, _ := strconv.ParseBool(str.Value) | ||
c.CSVHeader = val | ||
} else if strVal, ok := kv.Value.(*ast.String); ok { | ||
//for config with quotes | ||
val, err := strconv.ParseBool(strVal.Value) | ||
if err != nil { | ||
log.Printf("E! parsing to bool: %v", err) | ||
} else { | ||
c.CSVHeader = val | ||
} | ||
} | ||
} | ||
} | ||
|
||
if node, ok := tbl.Fields["csv_trim_space"]; ok { | ||
if kv, ok := node.(*ast.KeyValue); ok { | ||
if str, ok := kv.Value.(*ast.Boolean); ok { | ||
//for config with no quotes | ||
val, _ := strconv.ParseBool(str.Value) | ||
c.CSVTrimSpace = val | ||
} else if strVal, ok := kv.Value.(*ast.String); ok { | ||
//for config with quotes | ||
No need to have these else clauses; if it's not a bool then it should be an error. This is actually a bug throughout this function: when the type is wrong for the field name, it looks like we currently delete the field, when we should return an error and refuse to start Telegraf. |
||
val, err := strconv.ParseBool(strVal.Value) | ||
if err != nil { | ||
log.Printf("E! parsing to bool: %v", err) | ||
} else { | ||
c.CSVTrimSpace = val | ||
} | ||
} | ||
} | ||
} | ||
|
||
c.MetricName = name | ||
|
||
delete(tbl.Fields, "data_format") | ||
|
@@ -1420,6 +1535,14 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) { | |
delete(tbl.Fields, "grok_custom_patterns") | ||
delete(tbl.Fields, "grok_custom_pattern_files") | ||
delete(tbl.Fields, "grok_timezone") | ||
delete(tbl.Fields, "csv_data_columns") | ||
delete(tbl.Fields, "csv_tag_columns") | ||
delete(tbl.Fields, "csv_field_columns") | ||
delete(tbl.Fields, "csv_name_column") | ||
delete(tbl.Fields, "csv_timestamp_column") | ||
delete(tbl.Fields, "csv_timestamp_format") | ||
delete(tbl.Fields, "csv_delimiter") | ||
delete(tbl.Fields, "csv_header") | ||
delete(tbl.Fields, "csv_comment") | ||
delete(tbl.Fields, "csv_trim_space") | ||
|
||
return parsers.NewParser(c) | ||
} | ||
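The review comment in this function asks for a type mismatch to be an error rather than a silently ignored setting. A sketch of what that could look like follows, using small typed getters that also remove the repeated nesting. The node types and `table` here are hypothetical stand-ins for the TOML AST types in the diff, and `getString`, `getBool`, and `getStringArray` are illustrative names, not existing Telegraf functions.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical stand-ins for the TOML AST nodes seen in the diff
// (*ast.String, *ast.Boolean, *ast.Array); not the real package.
type stringNode struct{ Value string }
type boolNode struct{ Value string } // holds the literal text, e.g. "true"
type arrayNode struct{ Value []interface{} }

// table maps a config key to its value node.
type table map[string]interface{}

// getString copies a string setting into dst. A missing key is not an
// error; a key holding any other type is.
func getString(tbl table, key string, dst *string) error {
	node, ok := tbl[key]
	if !ok {
		return nil
	}
	s, ok := node.(*stringNode)
	if !ok {
		return fmt.Errorf("%s: expected a string, got %T", key, node)
	}
	*dst = s.Value
	return nil
}

// getBool accepts only an unquoted boolean; a quoted "true" is rejected.
func getBool(tbl table, key string, dst *bool) error {
	node, ok := tbl[key]
	if !ok {
		return nil
	}
	b, ok := node.(*boolNode)
	if !ok {
		return fmt.Errorf("%s: expected a boolean, got %T", key, node)
	}
	val, err := strconv.ParseBool(b.Value)
	if err != nil {
		return fmt.Errorf("%s: %v", key, err)
	}
	*dst = val
	return nil
}

// getStringArray requires an array whose elements are all strings.
func getStringArray(tbl table, key string, dst *[]string) error {
	node, ok := tbl[key]
	if !ok {
		return nil
	}
	arr, ok := node.(*arrayNode)
	if !ok {
		return fmt.Errorf("%s: expected an array, got %T", key, node)
	}
	vals := make([]string, 0, len(arr.Value))
	for i, elem := range arr.Value {
		s, ok := elem.(*stringNode)
		if !ok {
			return fmt.Errorf("%s[%d]: expected a string, got %T", key, i, elem)
		}
		vals = append(vals, s.Value)
	}
	*dst = vals
	return nil
}

func main() {
	tbl := table{
		"csv_header":      &boolNode{Value: "true"},
		"csv_tag_columns": &arrayNode{Value: []interface{}{&stringNode{Value: "host"}}},
		"csv_trim_space":  &stringNode{Value: "true"}, // wrong type on purpose
	}
	var header, trim bool
	var tags []string
	fmt.Println(getBool(tbl, "csv_header", &header), header)
	fmt.Println(getStringArray(tbl, "csv_tag_columns", &tags), tags)
	fmt.Println(getBool(tbl, "csv_trim_space", &trim))
}
```

With getters like these, the calling function could return the first error it sees, so a mistyped setting would stop startup instead of being dropped.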
|
Call this csv_column_names