Che Logging #10290

yarivlifchuk · 2018-07-05T11:39:46Z

Summary

Logging provides system administrators with information useful for diagnostics and auditing. We propose a logging mechanism that does not require changes to existing Che code. However, we do recommend standardizing the format in which log events are written.
In addition, we propose an option to enable providing additional parameters to log entries in a standard way, to improve supportability.

Technically, the decoupling of the logging mechanism from the code is done by reading standard output on the K8S Pod level. To support this, additional industry-accepted open source components must be deployed to the K8S cluster with special focus on security aspect.

Description

Che epics [Complementary]:
Tracing - #10298, #10288
Monitoring - #10329

Che epics [to be reevaluated]:
Logging - #5483
Logstash - #6537, #7566

Background

Access to the logs of Che agents and applications running within the workspace(aka WS) is required for supportability (analysis, app behavior, monitor), also after the WS was evicted.
Logs should have separate storage and lifecycle independent of nodes and pods.
This concept is called cluster-level-logging which has several common approaches:

Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application.

Using a node level logging agent is the most common and encouraged approach for K8S cluster because it creates only one agent per node and it doesn’t require any installation on each pod (where logged applications are running). It is based on application’s standard output and standard error.
https://kubernetes.io/docs/concepts/cluster-administration/logging

Logging agents (not refer to Che agent)

Common K8S logging agent options:

Stackdriver Logging
Elasticsearch

They both use fluentd as an agent on the node.

In the open source world, the two most-popular data collectors are Logstash and Fluentd. Logstash is most known for being part of the ELK Stack while Fluentd has become increasingly used by communities of users of software such as Docker, GCP, and Elasticsearch.

Logstash and Fluentd are data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it.
Main differences:

Event Routing – Logstash use algorithmic statements while Fluentd uses Tags
Performance – Logstash uses more memory in the node agents

However, the similarities between Logstash and Fluentd are greater than their differences
https://logz.io/blog/fluentd-logstash
https://www.elastic.co/guide/en/logstash/current/introduction.html
https://docs.fluentd.org/v0.12/articles/quickstart
http://larmog.github.io/2016/03/13/elk-cluster-on-kubernetes-on-arm---part-1
http://larmog.github.io/2016/05/02/efk-cluster-on-kubernetes-on-arm---part-2

Common node level agents available for Fluentd

Fluentd
DaemonSet which spawns a pod on each node that reads logs, generated by kubelet, container runtime and containers and sends them to Elasticsearch.
Fluentd is a log collector, processor, and aggregator.
https://logz.io/blog/kubernetes-log-analysis
Fluent-bit (replace the Logstash-forwarder)
Newer agent fully based in the design of Fluentd architecture uses less resources. It is log collector and processor without strong aggregations features such as Fluentd.
https://gist.github.com/StevenACoffman/4e267f0f60c8e7fcb3f77b9e504f3bd7
https://akomljen.com/get-kubernetes-logs-with-efk-stack-in-5-minutes/

Common node level agents available for Logstash

Filebeat
Lightweight way to forward and centralize logs and files. More common outside K8S, but can be used inside K8S to produce to Elasticsearch.
https://www.elastic.co/guide/en/beats/filebeat/current/running-on-kubernetes.html
https://www.elastic.co/blog/shipping-kubernetes-logs-to-elasticsearch-with-filebeat

Container Logs Collection

Cluster level logging collects the standard output and error of the applications running in the containers.
K8S logs the content of the stdout and stderr streams of a pod to a file. It creates one file for each container in a pod. The default location for these files is /var/log/containers. The filename contains: pod name, pod namespace, container name, and container id. The file contains one JSON object per line of the two streams stout and stderr. K8S exposes the content of the log file to clients via its API.

The collection process in Fluentd as an example is done in the following way:
The Fluentd parses the filename of the log file and uses this information to fetch additional metadata from the K8S API. The metadata like labels and annotations are attached to the log event as additional fields so it can be used for search and filter.

The fluentd pod mounts the /var/lib/containers/ host volume to access the logs of all pods scheduled to that Kubelets as well as a host volume for a fluentd position file. This position file saves which log lines are already shipped to the central log store.

Implementation recommendation

Write logs to stdout and stderr by the Che agents and any relevant applications.
Add custom environment params to the log’s records.
There are two kind of custom environment params.
- Mandatory param added to each log’s record, e.g. user’s tenant id.
- Optional param added only to specific log’s record, e.g. API name if relevant or trace_id.
  If the log format uses CSV like, based on delimiter, instead of enhanced JSON or XML format then
  this param need to be added to each log record (with empty value if not relevant).

ibuziuk · 2018-07-05T12:11:54Z

Opentracing epic has been created - #10288

che-bot · 2019-09-07T13:32:04Z

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

slemeur added the kind/epic A long-lived, PM-driven feature request. Must include a checklist of items that must be completed. label Jul 5, 2018

This was referenced Jul 5, 2018

Che Tracing #10298

Closed

Opentracing support for k8s / OpenShift infrastructures in Che #10288

Closed

Che Monitoring #10329

Closed

skabashnyuk changed the title ~~K8S Che6 Logging~~ Che Logging Jan 28, 2019

skabashnyuk mentioned this issue Jan 28, 2019

Generic telemetry events infrastructure #5483

Closed

3 tasks

che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2019

che-bot closed this as completed Sep 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Che Logging #10290

Che Logging #10290

yarivlifchuk commented Jul 5, 2018 •

edited

Loading

ibuziuk commented Jul 5, 2018 •

edited

Loading

che-bot commented Sep 7, 2019

Che Logging #10290

Che Logging #10290

Comments

yarivlifchuk commented Jul 5, 2018 • edited Loading

Summary

Description

Background

Logging agents (not refer to Che agent)

Common node level agents available for Fluentd

Common node level agents available for Logstash

Container Logs Collection

Implementation recommendation

ibuziuk commented Jul 5, 2018 • edited Loading

che-bot commented Sep 7, 2019

yarivlifchuk commented Jul 5, 2018 •

edited

Loading

ibuziuk commented Jul 5, 2018 •

edited

Loading