Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observability #133

Merged
merged 4 commits into from
Mar 6, 2024
Merged

Observability #133

merged 4 commits into from
Mar 6, 2024

Conversation

finn-block
Copy link
Collaborator

@finn-block finn-block commented Feb 22, 2024

This resolves the issues outlined in #132

  • Tracing:
    • configured from environment variable
    • spans have been added on all of the critical paths.
  • Metrics:
    • All of the tracing spans also generate timing and count metrics, which is sufficient for now. No additional metrics have been added
  • Logging:
    • All Debugf/Infof/Warnf logrus calls have been updated to use fields and a static log message

another addition not mentioned in that issue is a simple traffic generator in impl/integrationtest/main.go which generates a DID, PUTs it and GETs it every 30 seconds.

Sample PUT span:

image

@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 83.82353% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 73.76%. Comparing base (2241dd7) to head (42fa95e).

Files Patch % Lines
impl/pkg/service/pkarr.go 41.17% 10 Missing ⚠️
impl/pkg/server/pkarr.go 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   73.26%   73.76%   +0.49%     
==========================================
  Files          17       17              
  Lines        1343     1395      +52     
==========================================
+ Hits          984     1029      +45     
- Misses        266      273       +7     
  Partials       93       93              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@finn-block finn-block marked this pull request as ready for review March 5, 2024 00:08
)

func init() {
buildInfo, ok := debug.ReadBuildInfo()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting...how does this work?

we do/should have a version set in our config

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read about it here, right now it's just returning (devel) for the current module's version:

Screenshot 2024-03-05 at 10 27 05 AM

I thought that would become a useful name when we add tags, but I just tried adding a tag and it didn't help, so I might try to push that forward a little before merging this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so i thought i had successfully used this in the past but i guess it will never work. Looking at where I thought I had successfully used it, I appear to have some compile-time variable setting to inject a version externally. Will add the same here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

looks slightly better now. GitHub doesn't give us a lot of great options, and because the Dockerfile copies the impl/ not the git root, we can't use git describe or anything like that in the container. For tagged releases, the string telemetry in that screenshot should be the tag name.

metrics generated from traces are probably sufficient for our current needs, so may address metric goals as well
const scopeName = "github.com/TBD54566975/did-dht-method"

var (
Tracer trace.Tracer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, don't love the idea of having a global tracer

can we put this on a struct which SetupTelemetry returns?

Copy link
Collaborator Author

@finn-block finn-block Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally could. Is there a problem with having a global tracer? An alternative to having a tracer that gets passed around everywhere is to initialize a tracer in each package that we can use to instrument that package, which is what the gin middleware does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global state scares me generally because of assumptions it caries - may/not be initialized, different callers using it for different purposes in an unsafe manner. Of course this is negligible given how small the codebase is now, though I view it as a best practice to avoid globally mutable state

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. I'll figure out how I want to setup non-global tracer(s)

Copy link
Collaborator Author

@finn-block finn-block Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I refactored a bit to have a GetTracer() function that returns a tracer. There's still one instance that gets initialized on first call, which isnt great but I'm not sure what better options there are. I could make a function + tracer instance like this for each package (then there wouldnt be a global one technically) but I dont see how that's any better. Let me know if you this this is acceptable, if not let's discuss options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember going through a similar type of conversation with @andresuribe87 on something a while back and he convinced me to rip out all the (complex, ugly) singleton code and just make a struct that returns a unique thing, which worked for the 90% case.

I believe what you've done here is fine too, and we should be open to changing it should that assumption be false

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see doing that too. It would involve passing it around through a lot of different functions, which seems annoying. I'm going to merge this as is, but i'm not opposed to moving to that in the future.

Copy link
Member

@decentralgabe decentralgabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving pending global state change

nice work 🎉

@finn-block finn-block merged commit e9e3e1e into main Mar 6, 2024
4 of 5 checks passed
@finn-block finn-block deleted the telemetry branch March 6, 2024 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants