OTEL for monitoring learning process #238

Open
jlewi opened this issue Sep 13, 2024 · 1 comment

Comments

@jlewi
Owner

jlewi commented Sep 13, 2024

See: #215 (comment)

We have a user reporting a bug in learning, and we don't have the instrumentation in place to figure out where things are breaking.

There are two ways we could handle this.

  1. Use OTEL to create traces that track the learning process for blocks
  2. Use logging
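To make option 1 more concrete, here is a minimal, stdlib-only Python sketch of what a per-block trace could look like. It is a toy in-memory tracer, not the OpenTelemetry API. `ToyTracer`, `learn_block`, and the stage names `load_block`, `build_example`, and `write_example` are hypothetical and do not come from the project. Each block gets its own trace, and each stage is a child span. A failing stage marks itself and its parent with an error status, which is the "where things are breaking" signal that is missing today.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    parent: Optional[str]
    span_id: str
    attributes: Dict[str, str] = field(default_factory=dict)
    status: str = "OK"
    error: Optional[str] = None
    duration_s: float = 0.0


class ToyTracer:
    """Records finished spans in memory (a stand-in for a tracer plus exporter)."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str, **attributes: str):
        parent = self._stack[-1] if self._stack else None
        # A root span starts a new trace; child spans inherit the trace ID.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        s = Span(
            name=name,
            trace_id=trace_id,
            parent=parent.span_id if parent else None,
            span_id=uuid.uuid4().hex[:16],
            attributes=dict(attributes),
        )
        self._stack.append(s)
        start = time.perf_counter()
        try:
            yield s
        except Exception as exc:
            # Record the failure on this span, then let it propagate to the parent.
            s.status = "ERROR"
            s.error = repr(exc)
            raise
        finally:
            s.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.finished.append(s)


tracer = ToyTracer()


def learn_block(block_id: str, fail_at: Optional[str] = None) -> None:
    # Hypothetical stages; the real learning pipeline's steps may differ.
    with tracer.span("learn_block", block_id=block_id):
        for stage in ("load_block", "build_example", "write_example"):
            with tracer.span(stage, block_id=block_id):
                if stage == fail_at:
                    raise RuntimeError(f"{stage} failed")


learn_block("block-1")
try:
    learn_block("block-2", fail_at="build_example")
except RuntimeError:
    pass

# The innermost ERROR span shows which stage broke for which block.
for s in tracer.finished:
    if s.status == "ERROR":
        print(s.name, s.attributes["block_id"], s.error)
```

With a real OTEL SDK, the same structure would come from nested spans on a tracer, and an exporter would ship them to a backend instead of a Python list.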

I'm leaning towards OTEL. The reason is that we probably want aggregate statistics and downsampling; otherwise we will waste lots of storage. Our JSON logs are already quite big, and we don't have any sort of log rotation, etc.
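The downsampling argument can be illustrated with deterministic head sampling on the trace ID, modeled on the idea behind OpenTelemetry's trace-ID-ratio sampler. This stdlib-only Python version is a sketch, not SDK code. `keep_trace` is a hypothetical helper, and the exact bit selection is an assumption that may differ between SDKs. Because the decision depends only on the trace ID, every span in a trace is either kept or dropped together. This keeps the stored traces complete while cutting storage roughly in proportion to `ratio`.

```python
import random


def keep_trace(trace_id: int, ratio: float) -> bool:
    """Deterministically decide whether to keep a trace (hypothetical helper)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # Take the low 64 bits of the 128-bit trace ID and drop one bit so the
    # value lies in [0, 2**63); keep it if it falls below ratio * 2**63.
    low63 = (trace_id & ((1 << 64) - 1)) >> 1
    return low63 < int(ratio * (1 << 63))


rng = random.Random(238)  # seeded so the run is reproducible
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.1)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Raising the ratio only adds traces and never drops previously kept ones. Counts from the kept traces can be scaled by `1 / ratio` for rough aggregate estimates.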

I think it might be better to reserve JSONLogs for large structured payloads that OTEL doesn't handle well.

The downside of OTEL is that I'm not sure there's a good local option. Of course, people could use Honeycomb's free tier.

@sourishkrout any thoughts? Do you have an OTEL backend you're already using?

@sourishkrout

> @sourishkrout any thoughts? Do you have an OTEL backend you're already using?

I have used Jaeger successfully with Runme before. It worked well for local-only use and is very cost-effective before you even consider the usual suspects. It's also super easy to export and import traces for simple collaboration.
