New parse_aws_elb_log Remap function #5365

Closed
binarylogic opened this issue Dec 3, 2020 · 0 comments · Fixed by #5489
Labels: domain: parsing, domain: vrl, provider: aws, type: feature

binarylogic commented Dec 3, 2020

Add a Remap function that parses AWS ELB logs. This is needed when using the new aws_s3 source: the logs it emits are raw, unparsed lines.

Example

Given the following ELB log line:

http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 
192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 
"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - 
arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067
"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" 
0 2018-07-02T22:22:48.364000Z "forward" "-" "-" 10.0.0.1:80 200 "-" "-"

A user should be able to parse this with:

. = parse_aws_elb_log(.message)

This would result in:

{
	"type": "http",
	"timestamp": "2018-07-02T22:23:00.186641Z", // should be a native timestamp
	"elb": "app/my-loadbalancer/50dc6c495c0c9188",
	"client_host": "192.168.131.39:2817",
	"target_host": "10.0.0.1:80",
	"request_processing_time": 0.000,
	"target_processing_time": 0.001,
	"response_processing_time": 0.000,
	"elb_status_code": 200,
	"target_status_code": 200,
	"received_bytes": 34,
	"sent_bytes": 366,
	"request_method": "GET",
	"request_url": "http://www.example.com:80/",
	"request_protocol": "HTTP/1.1",
	"user_agent": "curl/7.46.0",
	"ssl_cipher": null,
	"ssl_protocol": null,
	"target_group_arn": "arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067",
	"trace_id": "Root=1-58337262-36d228ad5d99923122bbe354",
	"domain_name": null,
	"chosen_cert_arn": null,
	"matched_rule_priority": 0,
	"request_creation_time": "2018-07-02T22:22:48.364000Z", // should be a native timestamp
	"actions_executed": "forward",
	"redirect_url": null,
	"error_reason": null
}
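To make the expected output concrete, here is a minimal Python sketch of the same transformation. It is not the Remap implementation: `parse_elb_line`, `RAW_FIELDS`, and the type sets are hypothetical names, the column order is taken from the field list above, and it assumes fields are space-separated with double quotes around values that contain spaces. Trailing columns beyond `error_reason` are ignored, as in the example result.

```python
import shlex
from datetime import datetime, timezone

# Raw column order, taken from the field list in the example above.
RAW_FIELDS = [
    "type", "timestamp", "elb", "client_host", "target_host",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol", "target_group_arn", "trace_id",
    "domain_name", "chosen_cert_arn", "matched_rule_priority",
    "request_creation_time", "actions_executed", "redirect_url",
    "error_reason",
]
FLOATS = {"request_processing_time", "target_processing_time",
          "response_processing_time"}
INTS = {"elb_status_code", "target_status_code", "received_bytes",
        "sent_bytes", "matched_rule_priority"}
TIMES = {"timestamp", "request_creation_time"}


def parse_elb_line(line):
    """Split one log line into named, typed fields ("-" becomes None)."""
    tokens = shlex.split(line)  # honours the double-quoted fields
    if len(tokens) < len(RAW_FIELDS):
        raise ValueError("not enough fields for an ELB log line")
    rec = dict(zip(RAW_FIELDS, tokens))  # extra trailing tokens are dropped
    # The quoted request is "METHOD URL PROTOCOL"; assumes all three parts exist.
    method, url, protocol = rec.pop("request").split(" ", 2)
    rec.update(request_method=method, request_url=url,
               request_protocol=protocol)
    out = {}
    for key, value in rec.items():
        if value == "-":
            out[key] = None
        elif key in FLOATS:
            out[key] = float(value)
        elif key in INTS:
            out[key] = int(value)
        elif key in TIMES:
            out[key] = datetime.strptime(
                value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
        else:
            out[key] = value
    return out


SAMPLE = (
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" '
    '0 2018-07-02T22:22:48.364000Z "forward" "-" "-" 10.0.0.1:80 200 "-" "-"'
)
record = parse_elb_line(SAMPLE)
```

Tokenizing on quotes rather than using one large regex keeps the sketch short; a real implementation would also need to decide how to handle malformed requests and the trailing columns.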

Regex

@jszwedko wrote the following in our docs, which might be helpful:

To parse AWS load balancer logs, the regex_parser transform can be used:

[transforms.elasticloadbalancing_fields_parsed]
  type = "regex_parser"
  inputs = ["s3"]
  # Multi-line literal string: TOML does not process escapes inside '''...''',
  # so regex escapes are written with a single backslash.
  regex = '''(?x)^
      (?P<type>[\w]+)[ ]
      (?P<timestamp>[\w:.-]+)[ ]
      (?P<elb>[^\s]+)[ ]
      (?P<client_host>[\d.:-]+)[ ]
      (?P<target_host>[\d.:-]+)[ ]
      (?P<request_processing_time>[\d.-]+)[ ]
      (?P<target_processing_time>[\d.-]+)[ ]
      (?P<response_processing_time>[\d.-]+)[ ]
      (?P<elb_status_code>[\d-]+)[ ]
      (?P<target_status_code>[\d-]+)[ ]
      (?P<received_bytes>[\d-]+)[ ]
      (?P<sent_bytes>[\d-]+)[ ]
      "(?P<request_method>[\w-]+)[ ]
      (?P<request_url>[^\s]+)[ ]
      (?P<request_protocol>[^"\s]+)"[ ]
      "(?P<user_agent>[^"]+)"[ ]
      (?P<ssl_cipher>[^\s]+)[ ]
      (?P<ssl_protocol>[^\s]+)[ ]
      (?P<target_group_arn>[\w.:/-]+)[ ]
      "(?P<trace_id>[^\s"]+)"[ ]
      "(?P<domain_name>[^\s"]+)"[ ]
      "(?P<chosen_cert_arn>[\w:./-]+)"[ ]
      (?P<matched_rule_priority>[\d-]+)[ ]
      (?P<request_creation_time>[\w.:-]+)[ ]
      "(?P<actions_executed>[\w,-]+)"[ ]
      "(?P<redirect_url>[^"]+)"[ ]
      "(?P<error_reason>[^"]+)"'''
  field = "message"
  drop_failed = false

  types.received_bytes = "int"
  types.request_processing_time = "float"
  types.sent_bytes = "int"
  types.target_processing_time = "float"
  types.response_processing_time = "float"

[transforms.elasticloadbalancing_url_parsed]
  type = "regex_parser"
  inputs = ["elasticloadbalancing_fields_parsed"]
  regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
  field = "request_url"
  drop_failed = false
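
As a quick way to see what the second transform extracts, the URL regex above can be exercised in Python, whose `re` module accepts the same `(?P<name>...)` group syntax. This is only an illustrative sketch: `URL_RE` and `split_request_url` are hypothetical names, and the pattern is written with single backslashes in a raw string.

```python
import re

# Same pattern as the elasticloadbalancing_url_parsed transform,
# split across adjacent raw strings for readability.
URL_RE = re.compile(
    r"^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)"
    r"(?::(?P<request_port>[\d-]+))?-?"
    r"(?:/(?P<url_path>[^\s?#]*))?"
    r"(?P<request_url_query>\?[^\s#]+)?"
)


def split_request_url(url):
    """Return the named URL parts, or None if the URL does not match."""
    match = URL_RE.match(url)
    return match.groupdict() if match else None


# The request_url from the example log line.
parts = split_request_url("http://www.example.com:80/")
# A made-up URL with a path and query string, for comparison.
other = split_request_url("https://example.com:443/a/b?x=1")
```

Note that the captured `url_path` excludes the leading slash, while `request_url_query` keeps its leading `?`; optional groups that do not participate come back as `None`.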
binarylogic changed the title from "New parse_aws_elb Remap function" to "New parse_aws_elb_log Remap function" on Dec 3, 2020