Che Tracing #10298

Closed
9 of 20 tasks
yarivlifchuk opened this issue Jul 5, 2018 · 1 comment
Labels
kind/epic: A long-lived, PM-driven feature request. Must include a checklist of items that must be completed.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

yarivlifchuk commented Jul 5, 2018

Summary

We propose a tracing mechanism that does not require changes to existing Che code. However, we do recommend standardizing the format in which traces are written.
In addition, we propose an option to attach additional parameters to trace entries in a standard way, to improve supportability.
To support this, industry-accepted open source components must be deployed to the K8S cluster, with a special focus on security.

Description

Che epics [Complementary]:
Logging - #10290
Monitoring - #10329
Tracing - #10288
Complementary epic to the OpenTracing and Jaeger work.

Che epics [to be reevaluated]:
redhat-developer/rh-che#718
It gives a view of K8S tracing best practices and their implications for Che agents, and should be a complementary epic.

Background

In a distributed system, a trace encapsulates the transaction’s state as it propagates through the system. Tracing helps gather timing data needed to troubleshoot latency problems in microservice architectures, and tells the story of a transaction or workflow as it propagates through a distributed system.
Any transaction may reveal performance anomalies at an early phase, when new services are being introduced by independent teams.

Tracers live in the applications. They record timing and metadata about the operations that took place. They often instrument libraries, so that their use is transparent to users. For example, an instrumented web server records when it received a request and when it sent a response. The trace data collected for one such operation is called a Span.
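
To make the idea of a span concrete, here is a minimal sketch in plain Python. It is not the OpenTracing API or any tracer's client: the `span` helper, the `finished_spans` list, and the tag names are all hypothetical, chosen only to show timing and metadata being recorded around an operation without the caller noticing.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real tracer would report spans to a backend.
finished_spans = []


@contextmanager
def span(operation_name, **tags):
    """Record when an operation started, how long it took, and its metadata."""
    record = {"operation": operation_name, "tags": dict(tags), "start": time.time()}
    began = time.monotonic()
    try:
        yield record
    finally:
        # Always close the span, even if the wrapped operation raised.
        record["duration_s"] = time.monotonic() - began
        finished_spans.append(record)


def handle_request(path):
    # Callers of handle_request do not see the instrumentation.
    with span("http.request", path=path) as current:
        body = "hello from " + path
        current["tags"]["response.size"] = len(body)
        return body


handle_request("/workspace")
```

After the call, `finished_spans` holds one record with the operation name, its tags, a start time, and a duration.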
Main benefits of distributed tracing:

  • Distributed context propagation: illuminates the request’s path all the way to its final destination.
  • Out-of-the-box infrastructure overview: shows how services interact and which services depend on each other.
  • Efficient and fast detection of latency and bottleneck issues.

https://sematext.com/blog/opentracing-distributed-tracing-emerging-industry-standard
http://opentracing.io/documentation/pages/instrumentation
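
Distributed context propagation, the first benefit listed above, can be sketched as follows. This is an illustration in plain Python under stated assumptions: the header names `x-trace-id` and `x-parent-span-id` and the helpers `start_span`, `inject`, and `extract` are hypothetical and do not correspond to any specific tracer's wire format.

```python
import uuid

# Hypothetical header names; each real tracer defines its own.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


def start_span(operation, parent_context=None):
    """Start a span, reusing the caller's trace id when a context is given."""
    trace_id = parent_context["trace_id"] if parent_context else uuid.uuid4().hex
    parent_id = parent_context["span_id"] if parent_context else None
    return {
        "operation": operation,
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
    }


def inject(span, headers):
    """Copy the span's context into outgoing request headers."""
    headers[TRACE_HEADER] = span["trace_id"]
    headers[PARENT_HEADER] = span["span_id"]


def extract(headers):
    """Rebuild the caller's context from incoming headers, if present."""
    if TRACE_HEADER not in headers or PARENT_HEADER not in headers:
        return None
    return {"trace_id": headers[TRACE_HEADER], "span_id": headers[PARENT_HEADER]}


# Service A starts the trace and calls service B.
root = start_span("service-a.request")
outgoing_headers = {}
inject(root, outgoing_headers)

# Service B continues the same trace as a child span.
child = start_span("service-b.handle", extract(outgoing_headers))
```

Because the child shares the root's trace id and points at the root's span id, a tracing backend can reassemble the request's path across services.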

Standard API and Common Tracers

OpenTracing API - offers a consistent, unified and tracer-agnostic instrumentation API for a wide range of frameworks, platforms and programming languages. It abstracts away the differences among the numerous tracer implementations, so moving from an existing tracer to a new one requires only configuration changes specific to the new tracer.
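
The value of a tracer-agnostic API can be illustrated with a small sketch. The classes below are hypothetical stand-ins written in plain Python, not the OpenTracing interfaces or real Zipkin or Jaeger clients; the point is only that instrumented code depends on one abstract interface while the concrete backend is chosen by configuration.

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """Hypothetical minimal tracer interface used by instrumented code."""

    @abstractmethod
    def report(self, operation, duration_ms):
        ...


class BackendATracer(Tracer):
    """Stand-in for one tracer implementation."""

    def __init__(self):
        self.sent = []

    def report(self, operation, duration_ms):
        self.sent.append(("backend-a", operation, duration_ms))


class BackendBTracer(Tracer):
    """Stand-in for a different tracer implementation."""

    def __init__(self):
        self.sent = []

    def report(self, operation, duration_ms):
        self.sent.append(("backend-b", operation, duration_ms))


BACKENDS = {"backend-a": BackendATracer, "backend-b": BackendBTracer}


def make_tracer(config):
    """Pick the backend from configuration; instrumented code never changes."""
    return BACKENDS[config["tracer"]]()


def start_workspace(tracer):
    # Instrumented application code: it only knows the Tracer interface.
    tracer.report("workspace.start", 42)


tracer_a = make_tracer({"tracer": "backend-a"})
tracer_b = make_tracer({"tracer": "backend-b"})
start_workspace(tracer_a)
start_workspace(tracer_b)
```

Switching backends changes only the value passed to `make_tracer`; `start_workspace` stays untouched.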

Common distributed tracers are Zipkin and Jaeger, both inspired by Dapper, Google’s large-scale distributed tracing platform.

  1. Zipkin
    Zipkin is an open-source distributed tracing solution implemented in Java, with an OpenTracing-compatible API. It manages both the collection and the lookup of tracing data.
    The Zipkin component is installed per cluster. It collects and displays tracing data. Linkerd is installed on each node as a DaemonSet that exports tracing data.
    https://sematext.com/blog/opentracing-zipkin-as-distributed-tracer

  2. Jaeger
    Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies. It has a more complex architecture, designed for a larger scale of requests and higher performance.
    Compared to Zipkin, Jaeger has better language coverage of OpenTracing-compatible clients, a low memory footprint, and a modern, scalable design.
    The Jaeger Collector and Query components are installed per cluster and require backing storage. The Jaeger Agent is deployed as a DaemonSet and receives traces via UDP. All pods running on a given node send data to the same agent. If that is not suitable for the workload, an alternative is to deploy the agent as a sidecar.
    https://github.com/jaegertracing/jaeger
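
The "pods send spans to a node-local agent over UDP" pattern described for Jaeger can be sketched in plain Python. This is an illustration only: the JSON payload, the `send_span` helper, and the stand-in agent socket are assumptions made to keep the example short. Real Jaeger clients use a Thrift-based encoding, not JSON.

```python
import json
import socket


def send_span(span, agent_addr):
    """Fire-and-forget: serialize a span and send it as one UDP datagram."""
    payload = json.dumps(span).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, agent_addr)


# Stand-in for the node-local agent: a UDP socket on an OS-assigned port.
agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent.bind(("127.0.0.1", 0))
agent.settimeout(2.0)

# Application side: report a finished span to the agent's address.
send_span({"operation": "workspace.start", "duration_ms": 12}, agent.getsockname())

# Agent side: receive the datagram and decode it.
data, _sender = agent.recvfrom(65535)
received = json.loads(data.decode("utf-8"))
agent.close()
```

UDP keeps the application's reporting path cheap and non-blocking, at the cost of possible span loss, which is one reason the agent is placed on the same node or in the same pod as the workload.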

Implementation recommendation

  1. Have the Che agents and any relevant applications write traces using the relevant tracing library.
  2. Add custom environment parameters to the trace records, e.g. the user’s tenant ID.
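
The second recommendation could look roughly like the sketch below, which copies selected environment variables into a span's tags. The variable names (`CHE_TENANT_ID`, `CHE_WORKSPACE_ID`), the tag keys, and the dictionary-based span are all hypothetical; this issue does not define them.

```python
import os

# Hypothetical mapping from tag key to environment variable name.
ENV_TAGS = {
    "tenant.id": "CHE_TENANT_ID",
    "workspace.id": "CHE_WORKSPACE_ID",
}


def environment_tags(environ=os.environ):
    """Collect the configured environment parameters that are actually set."""
    return {tag: environ[var] for tag, var in ENV_TAGS.items() if var in environ}


def tag_span(span, environ=os.environ):
    """Attach environment parameters to a span record in a uniform way."""
    span.setdefault("tags", {}).update(environment_tags(environ))
    return span


# Usage with an explicit environment, so the example does not depend on the host.
example_env = {"CHE_TENANT_ID": "tenant-123"}
tagged = tag_span({"operation": "workspace.start"}, example_env)
```

Keeping the mapping in one place means every agent tags its spans with the same keys, which supports the standard trace format proposed in the summary.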

Implementation

This was referenced Jul 5, 2018
@slemeur slemeur added the kind/epic A long-lived, PM-driven feature request. Must include a checklist of items that must be completed. label Jul 9, 2018
@skabashnyuk skabashnyuk changed the title K8S Che6 Tracing Che Tracing Jan 28, 2019
@l0rd l0rd mentioned this issue Mar 19, 2019

che-bot commented Sep 7, 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2019
@che-bot che-bot closed this as completed Sep 17, 2019