OpenTelemetry for Kitex
OpenTelemetry is an open source observability framework from CNCF that consists of a collection of tools, APIs, and SDKs. It enables IT teams to instrument, generate, collect, and export telemetry data for analyzing and understanding software performance and behavior.
The obs-opentelemetry extension is available in kitex-contrib and lets Kitex integrate OpenTelemetry with a simple setup.
- Out-of-the-box default OpenTelemetry provider
- Support configuration via environment variables
- Support server and client Kitex RPC tracing
- Support automatic transparent transmission of the peer service through meta info
- Support Kitex RPC metrics [R.E.D]
- Support service topology map metrics [Service Topology Map]
- Support Go runtime metrics
- Extend the Kitex logger based on logrus and zap
- Automatically associate logs with traces
import (
...
"github.com/kitex-contrib/obs-opentelemetry/provider"
"github.com/kitex-contrib/obs-opentelemetry/tracing"
)
func main() {
serviceName := "echo"
p := provider.NewOpenTelemetryProvider(
provider.WithServiceName(serviceName),
provider.WithExportEndpoint("localhost:4317"),
provider.WithInsecure(),
)
defer p.Shutdown(context.Background())
svr := echo.NewServer(
new(EchoImpl),
server.WithSuite(tracing.NewServerSuite()),
// Please keep the same as provider.WithServiceName
server.WithServerBasicInfo(&rpcinfo.EndpointBasicInfo{ServiceName: serviceName}),
)
if err := svr.Run(); err != nil {
klog.Fatalf("server stopped with error: %v", err)
}
}
import (
...
"github.com/kitex-contrib/obs-opentelemetry/provider"
"github.com/kitex-contrib/obs-opentelemetry/tracing"
)
func main() {
serviceName := "echo-client"
p := provider.NewOpenTelemetryProvider(
provider.WithServiceName(serviceName),
provider.WithExportEndpoint("localhost:4317"),
provider.WithInsecure(),
)
defer p.Shutdown(context.Background())
c, err := echo.NewClient(
"echo",
client.WithSuite(tracing.NewClientSuite()),
// Please keep the same as provider.WithServiceName
client.WithClientBasicInfo(&rpcinfo.EndpointBasicInfo{ServiceName: serviceName}),
)
if err != nil {
klog.Fatal(err)
}
_ = c // use c to issue RPC calls to the echo service
}
import (
"github.com/cloudwego/kitex/pkg/klog"
kitexlogrus "github.com/kitex-contrib/obs-opentelemetry/logging/logrus"
)
func init() {
klog.SetLogger(kitexlogrus.NewLogger())
klog.SetLevel(klog.LevelDebug)
}
// Echo implements the Echo interface.
func (s *EchoImpl) Echo(ctx context.Context, req *api.Request) (resp *api.Response, err error) {
klog.CtxDebugf(ctx, "echo called: %s", req.GetMessage())
return &api.Response{Message: req.Message}, nil
}
{"level":"debug","msg":"echo called: my request","span_id":"056e0cf9a8b2cec3","time":"2022-03-09T02:47:28+08:00","trace_flags":"01","trace_id":"33bdd3c81c9eb6cbc0fbb59c57ce088b"}
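Because each log line carries `trace_id` and `span_id`, logs can be joined back to the trace they belong to. The following is a minimal sketch, using only the Go standard library, of decoding such a line. The `logRecord` type and `parseLogLine` helper are hypothetical names introduced for illustration; they are not part of obs-opentelemetry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logRecord mirrors the fields visible in the sample log line.
// It is an illustrative type, not one exported by the library.
type logRecord struct {
	Level      string `json:"level"`
	Msg        string `json:"msg"`
	SpanID     string `json:"span_id"`
	TraceID    string `json:"trace_id"`
	TraceFlags string `json:"trace_flags"`
}

// parseLogLine decodes one JSON-formatted log line into a logRecord.
func parseLogLine(line string) (logRecord, error) {
	var rec logRecord
	err := json.Unmarshal([]byte(line), &rec)
	return rec, err
}

func main() {
	line := `{"level":"debug","msg":"echo called: my request","span_id":"056e0cf9a8b2cec3","time":"2022-03-09T02:47:28+08:00","trace_flags":"01","trace_id":"33bdd3c81c9eb6cbc0fbb59c57ce088b"}`
	rec, err := parseLogLine(line)
	if err != nil {
		panic(err)
	}
	// The trace ID is the key to look up the matching trace in a tracing backend.
	fmt.Printf("trace=%s span=%s msg=%q\n", rec.TraceID, rec.SpanID, rec.Msg)
}
```

A log pipeline could index on `trace_id` in the same way, so that a trace view can link to all log lines emitted while that trace was active.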
Below is a table of RPC server metric instruments.
Name | Instrument | Unit | Unit (UCUM) | Description | Status | Streaming |
---|---|---|---|---|---|---|
rpc.server.duration | Histogram | milliseconds | ms | measures duration of inbound RPC | Recommended | N/A. While streaming RPCs may record this metric as start-of-batch to end-of-batch, it's hard to interpret in practice. |
Below is a table of RPC client metric instruments. These apply to traditional RPC usage, not streaming RPCs.
Name | Instrument | Unit | Unit (UCUM) | Description | Status | Streaming |
---|---|---|---|---|---|---|
rpc.client.duration | Histogram | milliseconds | ms | measures duration of outbound RPC | Recommended | N/A. While streaming RPCs may record this metric as start-of-batch to end-of-batch, it's hard to interpret in practice. |
The RED Method defines the three key metrics you should measure for every microservice in your architecture. We can calculate RED based on rpc.server.duration.
Rate: the number of requests per second your services are serving.
e.g. QPS
sum(rate(rpc_server_duration_count{}[5m])) by (service_name, rpc_method)
Errors: the number of failed requests per second.
e.g. error ratio
sum(rate(rpc_server_duration_count{status_code="Error"}[5m])) by (service_name, rpc_method) / sum(rate(rpc_server_duration_count{}[5m])) by (service_name, rpc_method)
Duration: the distribution of the amount of time each request takes.
e.g. P99 latency
histogram_quantile(0.99, sum(rate(rpc_server_duration_bucket{}[5m])) by (le, service_name, rpc_method))
rpc.server.duration records both the peer service and the current service as dimensions. Aggregating over these dimensions yields the service topology map.
sum(rate(rpc_server_duration_count{}[5m])) by (service_name, peer_service)
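The result of this query is one series per (service_name, peer_service) pair, which can be read as the edges of a call graph. The sketch below builds such an edge list in Go from made-up samples. The `sample` and `edge` types are hypothetical, and the edge direction assumes that `peer_service` names the caller and `service_name` the service that handled the request.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one illustrative data point: the request rate seen by
// ServiceName for calls coming from PeerService on a given method.
type sample struct {
	ServiceName string
	PeerService string
	Method      string
	Rate        float64
}

// edge is a directed call relationship in the topology map.
type edge struct {
	From string // assumed caller (peer_service)
	To   string // assumed callee (service_name)
}

// topology sums rates across methods so that each caller/callee pair
// becomes a single weighted edge.
func topology(samples []sample) map[edge]float64 {
	edges := make(map[edge]float64)
	for _, s := range samples {
		edges[edge{From: s.PeerService, To: s.ServiceName}] += s.Rate
	}
	return edges
}

func main() {
	samples := []sample{
		{ServiceName: "echo", PeerService: "echo-client", Method: "Echo", Rate: 2},
		{ServiceName: "echo", PeerService: "echo-client", Method: "Ping", Rate: 3},
		{ServiceName: "echo", PeerService: "gateway", Method: "Echo", Rate: 1},
	}
	edges := topology(samples)

	// Sort the edges for stable output.
	keys := make([]edge, 0, len(edges))
	for e := range edges {
		keys = append(keys, e)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].From < keys[j].From })
	for _, e := range keys {
		fmt.Printf("%s -> %s: %.1f req/s\n", e.From, e.To, edges[e])
	}
}
```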
Name | Instrument | Unit | Unit (UCUM) | Description |
---|---|---|---|---|
process.runtime.go.cgo.calls | Sum | - | - | Number of cgo calls made by the current process. |
process.runtime.go.gc.count | Sum | - | - | Number of completed garbage collection cycles. |
process.runtime.go.gc.pause_ns | Histogram | nanosecond | ns | Amount of nanoseconds in GC stop-the-world pauses. |
process.runtime.go.gc.pause_total_ns | Histogram | nanosecond | ns | Cumulative nanoseconds in GC stop-the-world pauses since the program started. |
process.runtime.go.goroutines | Gauge | - | - | Number of goroutines that currently exist. |
process.runtime.go.lookups | Sum | - | - | Number of pointer lookups performed by the runtime. |
process.runtime.go.mem.heap_alloc | Gauge | bytes | bytes | Bytes of allocated heap objects. |
process.runtime.go.mem.heap_idle | Gauge | bytes | bytes | Bytes in idle (unused) spans. |
process.runtime.go.mem.heap_inuse | Gauge | bytes | bytes | Bytes in in-use spans. |
process.runtime.go.mem.heap_objects | Gauge | - | - | Number of allocated heap objects. |
process.runtime.go.mem.live_objects | Gauge | - | - | Number of live objects, computed as cumulative Mallocs minus Frees. |
process.runtime.go.mem.heap_released | Gauge | bytes | bytes | Bytes of idle spans whose physical memory has been returned to the OS. |
process.runtime.go.mem.heap_sys | Gauge | bytes | bytes | Bytes of heap memory obtained from the OS. |
runtime.uptime | Sum | ms | ms | Milliseconds since application was initialized. |
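The descriptions in this table closely follow the documentation of Go's `runtime.MemStats` fields. As a rough sketch of where such values come from, and assuming the instrumentation reads them from the standard `runtime` package, the following program collects a comparable snapshot. The `runtimeSnapshot` type and `snapshot` function are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot holds a subset of the values listed in the table,
// read directly from the Go runtime.
type runtimeSnapshot struct {
	Goroutines   int    // process.runtime.go.goroutines
	CgoCalls     int64  // process.runtime.go.cgo.calls
	GCCount      uint32 // process.runtime.go.gc.count
	HeapAlloc    uint64 // process.runtime.go.mem.heap_alloc
	HeapIdle     uint64 // process.runtime.go.mem.heap_idle
	HeapInuse    uint64 // process.runtime.go.mem.heap_inuse
	HeapObjects  uint64 // process.runtime.go.mem.heap_objects
	HeapReleased uint64 // process.runtime.go.mem.heap_released
	HeapSys      uint64 // process.runtime.go.mem.heap_sys
	LiveObjects  uint64 // process.runtime.go.mem.live_objects
}

// snapshot reads the current runtime statistics once.
func snapshot() runtimeSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeSnapshot{
		Goroutines:   runtime.NumGoroutine(),
		CgoCalls:     runtime.NumCgoCall(),
		GCCount:      m.NumGC,
		HeapAlloc:    m.HeapAlloc,
		HeapIdle:     m.HeapIdle,
		HeapInuse:    m.HeapInuse,
		HeapObjects:  m.HeapObjects,
		HeapReleased: m.HeapReleased,
		HeapSys:      m.HeapSys,
		LiveObjects:  m.Mallocs - m.Frees, // cumulative mallocs minus frees
	}
}

func main() {
	s := snapshot()
	fmt.Printf("goroutines=%d gc_count=%d heap_alloc=%dB heap_sys=%dB live_objects=%d\n",
		s.Goroutines, s.GCCount, s.HeapAlloc, s.HeapSys, s.LiveObjects)
}
```

Printing such a snapshot locally can help sanity-check the values a dashboard reports for the same process.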
The OpenTelemetry SDK is fully compatible with opentelemetry-go 1.x.
Maintained by: CoderPoet
Library/Framework | Versions | Notes |
---|---|---|
go.opentelemetry.io/otel | v1.19.0 | |
go.opentelemetry.io/otel/trace | v1.19.0 | |
go.opentelemetry.io/otel/metric | v1.19.0 | |
go.opentelemetry.io/contrib/instrumentation/runtime | v0.45.0 | |
kitex | v0.7.3 | |