Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve logs #1012

Closed
wants to merge 9 commits into from
Closed

improve logs #1012

wants to merge 9 commits into from

Conversation

d-uzlov
Copy link
Contributor

@d-uzlov d-uzlov commented Jul 8, 2021

Description

  • Changes log level for chain traces from info to debug
  • Adds connection id to each line where connection id is applicable
  • Changes diff function to google's go-cmp

Here is how nsmgr logs look after these changes: nsmgr-2021-07-08T210938+0700.log
Few examples:

  1. Command grep <id>, for example grep 017302e6-00bb-40e3-b60d-48c157d59e9f, will give you full information about one of the connections
  2. Here is an example of how the new diff function result looks like:
Jul  8 14:07:41.644�[37m [DEBU] [id:nsc-kernel-5c64846f8b-vvbk2-0] [name:nsmgr-42n5j] �[0m(1) ⎆ cmd-nsmgr/pkg/networkservice/common/updatepath/updatePathServer.Request() span=3b6b58f676e54ae0:2016e43e717b762c:07cf4a3b3994e904:1
Jul  8 14:07:41.644�[37m [DEBU] [id:nsc-kernel-5c64846f8b-vvbk2-0] [name:nsmgr-42n5j] �[0m(1.1)   request: connection:{id:"nsc-kernel-5c64846f8b-vvbk2-0"  network_service:"scalability-local-ns-0"  path:{path_segments:{name:"nsc-kernel-5c64846f8b-vvbk2"  id:"nsc-kernel-5c64846f8b-vvbk2-0"}}}  mechanism_preferences:{cls:"LOCAL"  type:"KERNEL"  parameters:{key:"inodeURL"  value:"inode://4/4026533747"}  parameters:{key:"name"  value:"nsm-0"}} span=3b6b58f676e54ae0:2016e43e717b762c:07cf4a3b3994e904:1
Jul  8 14:07:41.645�[37m [DEBU] [id:nsc-kernel-5c64846f8b-vvbk2-0] [name:nsmgr-42n5j] �[0m(1.2)   request-diff:   (*networkservice.NetworkServiceRequest)(Inverse(protocmp.Transform, protocmp.Message{
  	"@type": s"networkservice.NetworkServiceRequest",
  	"connection": protocmp.Message{
  		"@type":           s"connection.Connection",
- 		"id":              string("nsc-kernel-5c64846f8b-vvbk2-0"),
+ 		"id":              string("811ddd8f-2fa6-431b-9c68-6ea9787a916b"),
  		"network_service": string("scalability-local-ns-0"),
  		"path": protocmp.Message{
  			"@type": s"connection.Path",
+ 			"index": uint32(1),
  			"path_segments": []protocmp.Message{
  				{
  					"@type":   s"connection.PathSegment",
+ 					"expires": s"2021-07-08T14:43:44Z",
  					"id":      string("nsc-kernel-5c64846f8b-vvbk2-0"),
  					"name":    string("nsc-kernel-5c64846f8b-vvbk2"),
+ 					"token":   string("eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjU3NTY4NjEsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLXB2bGduL3NhL2RlZmF1bHQifQ.OdKQvFASPnko6ajIihkOu3w4vSMrM5HRoHKglvUdnF0M9hTS_-3E"...),
  				},
+ 				s`{name:"nsmgr-42n5j"  id:"811ddd8f-2fa6-431b-9c68-6ea9787a916b"}`,
  			},
  		},
  	},
  	"mechanism_preferences": []protocmp.Message{{"@type": s"connection.Mechanism", "cls": string("LOCAL"), "parameters": map[string]string{"inodeURL": "inode://4/4026533747", "name": "nsm-0"}, "type": string("KERNEL")}},
  }))
 span=3b6b58f676e54ae0:2016e43e717b762c:07cf4a3b3994e904:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(2)  ⎆ cmd-nsmgr/pkg/networkservice/common/authorize/authorizeServer.Request() span=3b6b58f676e54ae0:73546bd0f17acfc7:2016e43e717b762c:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(3)   ⎆ cmd-nsmgr/pkg/networkservice/common/serialize/serializeServer.Request() span=3b6b58f676e54ae0:101643fe9105ba5b:73546bd0f17acfc7:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(4)    ⎆ cmd-nsmgr/pkg/networkservice/common/timeout/timeoutServer.Request() span=3b6b58f676e54ae0:041ff490c5ab8fcf:101643fe9105ba5b:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(5)     ⎆ cmd-nsmgr/pkg/networkservice/utils/metadata/metadataServer.Request() span=3b6b58f676e54ae0:22ffe0748ef93866:041ff490c5ab8fcf:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(6)      ⎆ cmd-nsmgr/pkg/networkservice/common/monitor/monitorServer.Request() span=3b6b58f676e54ae0:3fe8953d481790d8:22ffe0748ef93866:1
Jul  8 14:07:41.646�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(7)       ⎆ cmd-nsmgr/pkg/networkservice/common/updatetoken/updateTokenServer.Request() span=3b6b58f676e54ae0:21d564863d5f7e2e:3fe8953d481790d8:1
Jul  8 14:07:41.648�[37m [DEBU] [id:811ddd8f-2fa6-431b-9c68-6ea9787a916b] [name:nsmgr-42n5j] �[0m(7.1)         request-diff:   (*networkservice.NetworkServiceRequest)(Inverse(protocmp.Transform, protocmp.Message{
  	"@type": s"networkservice.NetworkServiceRequest",
  	"connection": protocmp.Message{
  		"@type":           s"connection.Connection",
  		"id":              string("811ddd8f-2fa6-431b-9c68-6ea9787a916b"),
  		"network_service": string("scalability-local-ns-0"),
  		"path": protocmp.Message{
  			"@type": s"connection.Path",
  			"index": uint32(1),
  			"path_segments": []protocmp.Message{
  				{"@type": s"connection.PathSegment", "expires": protocmp.Message{"@type": s"google.protobuf.Timestamp", "seconds": int64(1625755424)}, "id": string("nsc-kernel-5c64846f8b-vvbk2-0"), "name": string("nsc-kernel-5c64846f8b-vvbk2"), ...},
  				{
  					"@type":   s"connection.PathSegment",
+ 					"expires": s"2021-07-08T14:43:44Z",
  					"id":      string("811ddd8f-2fa6-431b-9c68-6ea9787a916b"),
  					"name":    string("nsmgr-42n5j"),
+ 					"token":   string("eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ucy1wdmxnbi9zYS9kZWZhdWx0IiwiZXhwIjoxNjI1NzU1NDI0LCJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQifQ.k2ePQyI-2ADQIAYp1RZTelveqwaDurwS32sFaVGDJGuxeJsteVPw"...),
  				},
  			},
  		},
  	},
  	"mechanism_preferences": []protocmp.Message{{"@type": s"connection.Mechanism", "cls": string("LOCAL"), "parameters": map[string]string{"inodeURL": "inode://4/4026533747", "name": "nsm-0"}, "type": string("KERNEL")}},
  }))
New diff function actually doesn't work that well with IDs in each line, because it produces multiline output, only first line of which contains logger fields.

Here is how nsmgr logs with log level set to info look like: nsmgr-2021-07-08T212330+0700.log
These logs are actually not as useful as I had hoped.
We have few issues with them:

  1. Log pollution by "Reporting span" messages
  2. Registry trace chain elements use Object log method, which still uses info level.
    We can probably add log level as an argument for this method.
  3. There are a lot of errors that are not really errors. EOF, "the client connection is closing" next part of the connection was deleted before current one, etc.
  4. We don't do much logging, except by trace element, so it's hard to find anything really useful here.

I'm really not sure if we need to do anything with these issues in context of this task, though. Probably not.

Issue link

Closes #1008

How Has This Been Tested?

  • Added unit testing to cover
  • Tested manually
  • Tested by integration testing
  • Have not tested

Types of changes

  • Bug fix
  • New functionality
  • Documentation
  • Refactoring
  • CI

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
@d-uzlov
Copy link
Contributor Author

d-uzlov commented Aug 2, 2021

Example of final version of logs:

Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] (12.1)              request: {
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	"connection": {
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"id": "1",
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"network_service": "ns-7bce0604-11d1-4b9b-acf5-32239f30458c",
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"context": {},
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"path": {
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			"path_segments": [
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				{
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"name": "client-9a26d9b2-208b-4eff-9216-7ed287260171",
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"id": "1"
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				}
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			]
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		}
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	},
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	"mechanism_preferences": [
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		{
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			"cls": "LOCAL",
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			"type": "KERNEL"
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		}
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	]
Aug  2 17:27:05.890 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] }
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] (12.2)              request-diff: {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	"connection": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"id": "bee5a07b-fc79-42a7-806f-ccbbed213d42",
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		"path": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			"index": 1,
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			"path_segments": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				"+1": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"name": "nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09",
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"id": "bee5a07b-fc79-42a7-806f-ccbbed213d42"
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				},
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				"0": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"expires": {
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 						"nanos": 890670600,
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 						"seconds": 1627903625
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					},
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 					"token": "TestToken"
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 				}
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 			}
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 		}
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] 	}
Aug  2 17:27:05.891 [DEBU] [id:1] [name:nsmgr-e6a9e599-4598-42c7-a86d-97611ff9bf09] }

@d-uzlov d-uzlov marked this pull request as ready for review August 2, 2021 10:28
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>
@glazychev-art glazychev-art self-requested a review August 4, 2021 11:39
@glazychev-art
Copy link
Contributor

glazychev-art commented Aug 4, 2021

Checked PR locally, I think we need additional discussion

@denis-tingaikin
Copy link
Member

@glazychev-art Thanks for checking. Moving this PR to draft.

@d-uzlov , @glazychev-art Let me know if you discuss it already.

@denis-tingaikin denis-tingaikin marked this pull request as draft August 8, 2021 12:01
@glazychev-art
Copy link
Contributor

I have the following suggestions for further improvement:

  1. Set some default logging level. In the previous version, it was enough to call log.EnableTracing(true) to see all of the traces. Because standard logrus logger uses InfoLevel - https://github.com/sirupsen/logrus/blob/master/logger.go#L92. But this patch uses DebugLevel. So, I think we need to set DebugLevel when we create logruslogger, otherwise the user will not get the desired result after log.EnableTracing(true) calling. And probably it will be better to use TraceLevel.
  2. Registry uses Debug to print the start string of the chain element, but Info to print other information. For example:
...
[DEBU] (3)   ⎆ sdk/pkg/registry/common/checkid/setIDServer.Find()
[INFO] (3.1)     find={"network_service_endpoint":{},"watch":true}
...

We need to use the same logic as in networkservice trace chain element.

@edwarnicke
Copy link
Member

Could we please break this up into smaller pieces.

Shifting from INFO to DEBUG is a pretty small move

Adding connection ids is a pretty small move

It looks like they are getting conflated with much larger moves around changing the diff being used, moving to multi-line representations etc... which should be considered separately.

@Bolodya1997
Copy link

@edwarnicke
We have discussed internally and so we want to make the following actions to improve existing logs:

  1. Add [id:Connection.Id] or [id:Endpoint.Name] fields for every logged line, including multiline diffs and error traces.
  2. Add [type:NetworkService], [type:NetworkServiceEndpointRegistry], [type:NetworkServiceRegistry] fields for every logged line.
  3. Change logs to write request, close, register, unregister, find + requestResponse, closeResponse, ... instead of just request + response for all type of events.
  4. Separate logs to the different log levels:
    1. Trace logs (e.g. (1) ⎆ sdk/pkg/networkservice/common/updatepath/updatePathServer.Request() - TRACE level.
    2. All other logs - DEBUG, INFO, ... levels.
  5. To make [4] well working we need to additionally add Config.LogLevel to all cmd- applications and set NSM_LOG_LEVEL=trace for our CI runs.
  6. Add README.md file to sdk/pkg/tools/log with some best practices used to understand logs.

As you mentioned, probably it would be better to split this PR in multiple ones. So we will probably do it in the following order:

  1. PR introducing enhanced formatter (for every line tagging).
  2. PR introducing [1, 2, 3].
  3. PR introducing enhanced diff.
  4. PR introducing [5] + bunch of PRs to all cmd- repositories.
  5. PR introducing [6].

@edwarnicke
Copy link
Member

Change logs to write request, close, register, unregister, find + requestResponse, closeResponse, ... instead of just request + response for all type of events.

What's the plan here for distinguishing between the call of 'request,close, register, unregister, find, etc' and the response?

Knowing you, you've though this all the way through... curious the result :)

@Bolodya1997
Copy link

What's the plan here for distinguishing between the call of 'request,close, register, unregister, find, etc' and the response?

It is actually very easy to change, because it is just a word printed in trace chain element for the corresponding actions. But it is one of the things that makes logs worse for the quick understanding.

@Mixaster995 Mixaster995 mentioned this pull request Oct 5, 2021
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make logs more grepable
6 participants