Make logs more grepable #1008

d-uzlov · 2021-07-06T17:32:44Z

Overview

Right now we have automatic trace logs for network services and registries.
They have some annoying issues:

They are logged at info level.
Lines from different requests are interleaved.

These issues make the logs difficult to read. It is next to impossible to tell what is going on even with just a few dozen requests.

Solution

Obviously, we need to change the level for trace logs.
The obvious choice would be trace. However, our "trace" logs are actually not that detailed. If we were to add a lot of logs inside some chain elements, we would probably want to distinguish our current trace logs from them, so it may be better to log them at debug level.
We can add the request ID to the logger fields, so that interleaved log lines can be told apart by ID.
We would still have issues with multiline logs, since they only carry logger fields on their first line.
Most of our log entries are single lines, and the vast majority of multiline entries are stack traces from errors, so this is not a big problem.

The size increase from adding the ID to each line is minor compared to our current log size. In return, we would be able to use, for example, grep 9763b41a-2ff6-40e3-8475-7d9dae986f13 to reduce a big log file to only the relevant part.
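To illustrate the idea, here is a minimal sketch that uses the standard library's log/slog as a stand-in for the logrus-based logger in the sdk. The helper names (newRequestLogger, interleavedLogs, linesWithID) and the second request ID are made up for this example, and the filter only mimics what grep would do on a real log file.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// newRequestLogger is a hypothetical helper: it returns a debug-level logger
// that writes to w and attaches the request ID to every entry it emits.
func newRequestLogger(w *bytes.Buffer, requestID string) *slog.Logger {
	h := slog.NewTextHandler(w, &slog.HandlerOptions{Level: slog.LevelDebug})
	return slog.New(h).With("id", requestID)
}

// interleavedLogs simulates two requests whose log lines are interleaved.
func interleavedLogs() string {
	var buf bytes.Buffer
	a := newRequestLogger(&buf, "9763b41a-2ff6-40e3-8475-7d9dae986f13")
	b := newRequestLogger(&buf, "11111111-2222-3333-4444-555555555555") // made-up ID
	a.Debug("request started")
	b.Debug("request started")
	a.Debug("request finished")
	b.Debug("request finished")
	return buf.String()
}

// linesWithID mimics `grep <id>`: it keeps only the lines that contain id.
func linesWithID(logs, id string) []string {
	var out []string
	for _, line := range strings.Split(logs, "\n") {
		if strings.Contains(line, id) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Only the two lines of the first request are printed.
	for _, line := range linesWithID(interleavedLogs(), "9763b41a-2ff6-40e3-8475-7d9dae986f13") {
		fmt.Println(line)
	}
}
```

Because the ID is a logger field rather than part of the message, every single-line entry of a request carries it without any change to the individual log calls.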

Implementation

The log level for traces is set here:

sdk/pkg/tools/log/logruslogger/logruslogger.go

Line 217 in d4734b5

s.entry.Infof("%v%s⎆ %v()%v", s.info.incInfo(), prefix, s.operation, s.getSpan())
We can add the field to the context here, using log.WithFields (same for the other trace elements):

sdk/pkg/networkservice/core/trace/server.go

Line 75 in d4734b5

ctx, finish := withLog(ctx, operation)
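As a rough sketch of what carrying a field in the context could look like, the following uses only the standard library. It is not the sdk's log.WithFields; withFields, fieldsFrom, formatLine and traceLine are hypothetical names, and the "op" field is an assumption added only to show that fields accumulate.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strings"
)

// fieldsKey is the private context key under which the fields are stored.
type fieldsKey struct{}

// withFields is a hypothetical stand-in for a WithFields-style helper: it
// returns a context carrying the existing fields merged with the new ones.
func withFields(ctx context.Context, fields map[string]string) context.Context {
	merged := map[string]string{}
	for k, v := range fieldsFrom(ctx) {
		merged[k] = v
	}
	for k, v := range fields {
		merged[k] = v
	}
	return context.WithValue(ctx, fieldsKey{}, merged)
}

// fieldsFrom returns the fields stored in ctx, or nil if there are none.
func fieldsFrom(ctx context.Context) map[string]string {
	f, _ := ctx.Value(fieldsKey{}).(map[string]string)
	return f
}

// formatLine renders msg followed by the context fields in sorted key order.
func formatLine(ctx context.Context, msg string) string {
	f := fieldsFrom(ctx)
	keys := make([]string, 0, len(f))
	for k := range f {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(msg)
	for _, k := range keys {
		fmt.Fprintf(&b, " %s=%s", k, f[k])
	}
	return b.String()
}

// traceLine shows the intended flow: the trace element adds the connection ID
// once, and every later log line built from the context includes it.
func traceLine(connID, operation string) string {
	ctx := withFields(context.Background(), map[string]string{"id": connID})
	ctx = withFields(ctx, map[string]string{"op": operation}) // assumed extra field
	return formatLine(ctx, "started")
}

func main() {
	fmt.Println(traceLine("9763b41a-2ff6-40e3-8475-7d9dae986f13", "Request"))
}
```

Setting the field once in the trace element means chain elements further down do not need to know about the ID at all.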


d-uzlov · 2021-07-06T17:35:23Z

I have actually already implemented it locally for debugging of heal issues found during scalability testing. It helps a lot.

denis-tingaikin · 2021-07-06T17:38:38Z

@d-uzlov Feel free to open a PR!

edwarnicke · 2021-07-21T14:00:39Z

Obviously, we need to change the level for trace logs.
The obvious choice would be trace. However, our "trace" logs are actually not so detailed. If we were to add a lot of logs inside some chain elements, then we would probably want to be able to distinguish our current trace logs from them, so maybe it would be better to log them at debug level.

As a side note, in the sdk-vpp code, I do my internal logs at debug level. That isn't a commentary on what they should be ... just what they are. It can be changed if need be.

edwarnicke · 2021-07-21T14:01:45Z

We can add request ID into logger fields, so interleaving lines of logs can be distinguished by IDs.
We would still have issues with multiline logs, since they only have logger fields in their first line.
Most of our log entries are single lines, and the vast majority of multiline entries are stack traces from errors, so this is not a big problem.

I like this idea very much... but there's one practical corner case... what do we do in the case where we haven't got a Connection ID yet? I suspect this is quite solvable... but will need to be solved :)

d-uzlov · 2021-07-21T14:13:15Z

I like this idea very much... but there's one practical corner case... what do we do in the case where we haven't got a Connection ID yet? I suspect this is quite solvable... but will need to be solved :)

When I was doing my testing, I was getting connection ID in 2 ways:

When I know that something went wrong on a certain pod, I can search for the pod name and find the connection path information containing it. The easiest place to look is the diff of the first request, where we change the connection path and that change is logged.
When there is an error in the logs (or any other message we are concerned about), the error line contains the connection ID, and I can then use grep to see the full history of what happened with that connection.

denis-tingaikin · 2021-09-08T07:42:14Z

PR #1012 does not look great for us, so we want to consider a list of log improvements as part of this issue.

Mixaster995 · 2021-10-08T10:34:41Z

List of PRs to all cmd, allowing use of a log level env variable
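For readers unfamiliar with the pattern, here is a minimal sketch of choosing the log level from an environment variable, again using the standard library's log/slog rather than the sdk's logger. The variable name EXAMPLE_LOG_LEVEL and the parseLevel helper are assumptions made for this example; the comment above does not name the actual variable.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"strings"
)

// levelEnv is a hypothetical variable name used only in this sketch.
const levelEnv = "EXAMPLE_LOG_LEVEL"

// parseLevel converts a value such as "debug" or "WARN" into a slog.Level.
// Empty or unrecognized values fall back to info.
func parseLevel(value string) slog.Level {
	var level slog.Level
	if err := level.UnmarshalText([]byte(strings.TrimSpace(value))); err != nil {
		return slog.LevelInfo
	}
	return level
}

func main() {
	level := parseLevel(os.Getenv(levelEnv))
	logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: level}))

	logger.Debug("shown only when the level is debug or lower")
	logger.Info("shown at the default level")
	fmt.Println("effective level:", level)
}
```

With a default of info, the detailed per-request lines stay out of the way unless debug output is explicitly requested.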

denis-tingaikin · 2021-11-02T10:13:45Z

@edwarnicke Finally, all PRs are merged!

@Mixaster995 Thanks!

d-uzlov added the enhancement New feature or request label Jul 6, 2021

d-uzlov self-assigned this Jul 6, 2021

This was referenced Jul 8, 2021

improve logs #1012

Closed

Scalability Testing: System NSM testing networkservicemesh/deployments-k8s#1015

Open

denis-tingaikin added the stability The problem is related to system stability label Sep 8, 2021

Mixaster995 self-assigned this Sep 29, 2021

Mixaster995 mentioned this issue Oct 5, 2021

Changes to logs #1096

Merged

9 tasks

Mixaster995 unassigned d-uzlov Oct 7, 2021

Mixaster995 mentioned this issue Oct 11, 2021

Default log level networkservicemesh/deployments-k8s#3095

Merged

Mixaster995 mentioned this issue Oct 12, 2021

Readme for logs #1102

Merged

5 tasks

denis-tingaikin closed this as completed Nov 2, 2021
