Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make logs more grepable #1008

Closed
d-uzlov opened this issue Jul 6, 2021 · 8 comments
Closed

Make logs more grepable #1008

d-uzlov opened this issue Jul 6, 2021 · 8 comments
Assignees
Labels
enhancement New feature or request stability The problem is related to system stability

Comments

@d-uzlov
Copy link
Contributor

d-uzlov commented Jul 6, 2021

Overview

Right now we have automatic trace logs for network services and registries.
They have some annoying issues:

  1. They use info level
  2. Different requests are interleaved.

These issues make it difficult to read the logs. It's next to impossible to tell what is going on when there are even mere dozens of requests.

Solution

  1. Obviously, we need to change the level for trace logs.
    The obvious choice would be trace. However, our "trace" logs are actually not so detailed. If we were to add a lot of logs inside some chain elements, then we would probably want to be able to distinguish out current trace logs from them, so maybe it would be better to log them at debug level.
  2. We can add request ID into logger fields, so interleaving lines of logs can be distinguished by IDs.
    We would still have issues with multiline logs, since they only have logger fields in their first line.
    Most of our log entries are single lines, and the vast majority of multiline entries are stack traces from errors, so this is not a big problem.

The size increase due to adding ID to each line is minor, in comparison to our current logs size. However, we would be able to use, for example, grep 9763b41a-2ff6-40e3-8475-7d9dae986f13, to reduce the big log file to only the important part.

Implementation

  1. The log level for traces is set here:
    s.entry.Infof("%v%s⎆ %v()%v", s.info.incInfo(), prefix, s.operation, s.getSpan())
  2. We can add field into the context here, using log.WithFields (same for other trace elements):
    ctx, finish := withLog(ctx, operation)
@d-uzlov d-uzlov added the enhancement New feature or request label Jul 6, 2021
@d-uzlov d-uzlov self-assigned this Jul 6, 2021
@d-uzlov
Copy link
Contributor Author

d-uzlov commented Jul 6, 2021

I have actually already implemented it locally for debugging of heal issues found during scalability testing. It helps a lot.

@denis-tingaikin
Copy link
Member

@d-uzlov Be free to open PR!

@edwarnicke
Copy link
Member

Obviously, we need to change the level for trace logs.
The obvious choice would be trace. However, our "trace" logs are actually not so detailed. If we were to add a lot of logs inside some chain elements, then we would probably want to be able to distinguish out current trace logs from them, so maybe it would be better to log them at debug level.

As a side note, in the sdk-vpp code, I do my internal logs at debug level. That isn't a commentary on what they should be ... just what they are. It can be changed if need be.

@edwarnicke
Copy link
Member

We can add request ID into logger fields, so interleaving lines of logs can be distinguished by IDs.
We would still have issues with multiline logs, since they only have logger fields in their first line.
Most of our log entries are single lines, and the vast majority of multiline entries are stack traces from errors, so this is not a big problem.

I like this idea very much... but there's one practical corner case... what do we do in the case where we haven't got a Connection ID yet? I suspect this is quite solvable... but will need to be solved :)

@d-uzlov
Copy link
Contributor Author

d-uzlov commented Jul 21, 2021

I like this idea very much... but there's one practical corner case... what do we do in the case where we haven't got a Connection ID yet? I suspect this is quite solvable... but will need to be solved :)

When I was doing my testing, I was getting connection ID in 2 ways:

  1. When I know that something went wrong on a certain pod, I can search for the pod name and find connection path information containing this pod name. The easiest place for it would actually be the diff of the first request, where we change connection path, and that change is logged.
  2. When there is an error in the logs (or just any message we are concerned of), the error line contains connection id, and then I can use grep to see the full history of what happened with the related connection.

@denis-tingaikin denis-tingaikin added the stability The problem is related to system stability label Sep 8, 2021
@denis-tingaikin
Copy link
Member

PR #1012 is not looking great for us and we want to consider a list of improvements for logs in term of the issue.

@denis-tingaikin
Copy link
Member

@edwarnicke Finally, all PRs are merged!

@Mixaster995 Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stability The problem is related to system stability
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants