
feat: add local langfuse tracing option #106

Merged: 37 commits into main, Oct 10, 2024

Conversation

@ahau-square (Collaborator) commented Oct 2, 2024

why
The purpose of this PR is to integrate local Langfuse tracing into the project to enhance debugging and monitoring capabilities. Tracing allows developers to observe the flow of execution and diagnose issues more effectively.

what

Exchange Package:
- Defined an observe_wrapper decorator that applies Langfuse's observe decorator only when the local Langfuse env variables are set (see the sketch after this list)
- Added the observe decorator to the tool-calling function and to the providers' completion functions

Goose:
- Modifications to set up Langfuse tracing upon CLI initialization.
- Updates to trace session-level information.
- Added the observe decorator to reply
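
A minimal sketch of the wrapper's shape (assumptions: the env variable names checked and the decoration-time check; the actual implementation in the exchange package may differ):

```python
import os
from functools import wraps


def observe_wrapper(*observe_args, **observe_kwargs):
    """Apply Langfuse's observe decorator only when credentials are configured."""

    def decorator(fn):
        # Assumed variable names; the actual check may read different keys.
        if os.environ.get("LANGFUSE_PUBLIC_KEY") and os.environ.get("LANGFUSE_SECRET_KEY"):
            from langfuse.decorators import observe

            return observe(*observe_args, **observe_kwargs)(fn)

        @wraps(fn)
        def passthrough(*args, **kwargs):
            return fn(*args, **kwargs)

        return passthrough

    return decorator
```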

usage

  • Developers can trace functions with locally hosted Langfuse by applying the observe_wrapper decorator (usage sketch below), which provides detailed execution insights. Docker is required.
  • To enable tracing, launch the CLI with the --tracing flag, which initializes Langfuse with the configured env variables.
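
Applying it might look like this (the import path and traced function are illustrative, not the actual code):

```python
from exchange.langfuse import observe_wrapper  # hypothetical import path


@observe_wrapper()
def complete(messages: list) -> str:
    # Traced in Langfuse when credentials are set; a plain call otherwise.
    ...
```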

setting up locally hosted Langfuse

  • Run the setup_langfuse.sh script (in packages/exchange/src/exchange/langfuse) to download and deploy the Langfuse docker container with the default initialization variables found in the .env.langfuse.local file.
  • Go to http://localhost:3000/ and log in with the default email/password output by the shell script (or found in the .env.langfuse.local file).
  • Run Goose with the --tracing flag enabled, i.e. goose session start --tracing
  • When Goose starts you should see the log INFO:exchange.langfuse:Langfuse credentials found. Find traces, if enabled, under http://localhost:3000.
  • Go to http://localhost:3000/ and you should be able to see your traces.

Sample trace viewing:

[Screenshot: sample trace in the local Langfuse UI, 2024-10-02]

@lifeizhou-ap (Collaborator) commented Oct 3, 2024

Hey @ahau-square,

The Langfuse UI looks good for users to view traces! Below is what I found about tracing with Langfuse.

Pros

  • UI is clear and easy to browse
  • API is clean and easy to use
  • built-in LLM-specific tracing

Cons

  • Manual steps to set up the Langfuse account
    I had a look at how to automate these steps. We could copy the docker compose file into our repo to start the docker container instead of asking the user to clone the langfuse repo. (The drawback is that the docker compose file could fall out of date, but it does not contain much customised logic; it mainly defines the image and basic configuration.)
    However, I found it hard to automate signing up as a user, creating a project, and getting the API keys. It is a one-off step for the user; if we could automate it, the user would have a smoother experience.

Alternative
OpenTelemetry can be integrated with a few tools that visualise tracing. For example, export to a local Zipkin, or use otel-tui as @codefromthecrypt suggested. These tools can be started as docker containers via scripts, without manual steps.

For your reference, this is the PR @codefromthecrypt created for tracing.

@lifeizhou-ap (Collaborator):

When the user passes --tracing but the Langfuse server is not up, it shows the error "ERROR:langfuse:Unexpected error occurred. Please check your request and contact support: https://langfuse.com/support."

Maybe we can validate whether the server is up when the user passes the --tracing option and remind them to start the server.

@codefromthecrypt (Collaborator):

Ironically, I just got off a call with @marcklingen. So Marc, "goose", as you can read in its lovely readme, is an inference-backed, SWE-capable buddy. It uses its own LLM abstraction called exchange. I've thought about instrumenting this to use the GenAI conventions as tagged above. I actually haven't gotten around to spiking that, either.

My thinking is that we can maybe do both (langfuse and otel) until langfuse handles OTLP? Then compare until the latter works as well. wdyt? Do you have an issue to follow up on that?

Also, it would be great if you could help @ahau-square meanwhile with any pointers or notes about the comments above. We've been using Python VCR tests at times, so we can probably test that things work even without accounts. Depending on how things develop, I don't mind contributing help on that, as it is a good way to learn.

Meanwhile, I think let's see where we get and keep a debug story going, even if it is a work in progress.

@marcklingen commented Oct 3, 2024

Thanks for tagging me @codefromthecrypt, nice to see you here again and happy to help. "goose" seems cool and I'll try to give it a proper spin later this week.

> My thinking is that we can maybe do both (langfuse and otel) until langfuse handles OTLP? Then compare until the latter works as well. wdyt? Do you have an issue to follow up on that?

This seems very reasonable

I am generally very excited about standardization on the instrumentation side via opentelemetry. We are tracking support for OTLP in Langfuse here, feel free to subscribe to the discussion or add your thoughts: https://github.com/orgs/langfuse/discussions/2509

From my point of view, short/mid-term Langfuse benefits from its own instrumentation in addition to the standard, but we are looking to be compatible for use cases that are standardized.

> However, I found it hard to automate signing up as a user, creating a project, and getting the API keys. It is a one-off step for the user; if we could automate it, the user would have a smoother experience.

> Also, it would be great if you could help @ahau-square meanwhile with any pointers or notes about the comments above. We've been using Python VCR tests at times, so we can probably test that things work even without accounts. Depending on how things develop, I don't mind contributing help on that, as it is a good way to learn.

Langfuse supports initialization via environment variables which would overcome the issue of having to create an account via the local langfuse server. I will add this to the default docker compose configuration as well and update this message once done.

> Maybe we can validate whether the server is up when the user passes the --tracing option and remind them to start the server.

Langfuse client SDKs support auth_check for a synchronous check of credentials and API availability. This could be used at startup here to render a more useful error message.
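
A minimal sketch of such a startup check, assuming the standard LANGFUSE_* env variables are already set (the CLI integration point is hypothetical):

```python
from langfuse import Langfuse


def langfuse_reachable() -> bool:
    """Synchronously verify credentials and API availability via auth_check."""
    try:
        return Langfuse().auth_check()
    except Exception:
        # Server down or unreachable; auth_check cannot complete.
        return False


# Hypothetical use at CLI startup:
# if tracing_enabled and not langfuse_reachable():
#     raise SystemExit("Langfuse server is not reachable; run setup_langfuse.sh first.")
```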

Side note: it would be interesting to print a deep link to the trace in the local Langfuse UI in the CLI output when tracing is enabled, to make it easier to jump to the trace from any invocation of goose for the debugging use case.

@marcklingen:

Added support for headless initialization via Docker Compose here: https://github.com/langfuse/langfuse/pull/3568/files.

This change required explicitly handling empty string values for these environment variables, as Docker Compose does not allow adding environment variables that may not exist without setting a default value. Therefore, this depends on the next Langfuse release, which will include this fix (scheduled for tomorrow).

@marcklingen:

We just released the change. Let me know if you have any questions, @ahau-square. You can now pass the init envs to the Langfuse docker compose deployment to initialize default credentials.

@ahau-square (Collaborator, Author):

> We just released the change. Let me know if you have any questions, @ahau-square. You can now pass the init envs to the Langfuse docker compose deployment to initialize default credentials.

Thanks @marcklingen! Got the default credentials working. Is there a way to bypass the login screen altogether after that? Or do users always need to enter their default credentials to log in for the first time?

@marcklingen:

There's no way to bypass it. They'd need to sign in with the default credentials to view the data in the UI.

@marcklingen commented Oct 7, 2024

A quick addition to the above, @ahau-square:

If this is local anyway, you could make all traces public. Users could then jump directly from a trace URL in their console output to viewing the trace in their local Langfuse instance, without having to sign in. This will limit the available features, though.

more on this here: https://langfuse.com/docs/tracing-features/url#share-trace-via-url
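
A sketch of both ideas combined inside a traced function, using the Python SDK's langfuse_context (the surrounding function and its downstream call are illustrative):

```python
from langfuse.decorators import langfuse_context, observe


@observe()
def reply(prompt: str) -> str:
    # Mark this trace public so the link works without signing in (local use only).
    langfuse_context.update_current_trace(public=True)
    # Print a deep link the user can open directly in the local Langfuse UI.
    print(f"View trace: {langfuse_context.get_current_trace_url()}")
    return generate_reply(prompt)  # hypothetical downstream call
```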

@codefromthecrypt (Collaborator):

nit: is this a chore or a feature? The PR title says chore, but I suspect the result is more a feature than, say, reformatting files. wdyt?

@lifeizhou-ap (Collaborator):

Nice one! @ahau-square

A couple of suggestions:

  • Instead of checking out the langfuse repository, how about the script copies the docker-compose.yaml file from the latest langfuse git repo and then starts the container? That way the user has one fewer step to set up the local Langfuse server, and we don't need to keep Langfuse code in our local goose project.

  • The langfuse folder has files with mixed purposes.

    • langfuse.py: used in the goose (with exchange) application.
    • setup_langfuse.sh: used to start the Langfuse server
    • env.langfuse.local: used for both goose and Langfuse server

Since the Langfuse server is a self-contained application, we could create a folder outside of src/exchange (it could be under goose, e.g. Langfuse-server). It would contain setup_langfuse.sh, an env file with the variables for starting Langfuse, and the docker-compose file.

@marcklingen:

Agree with @lifeizhou-ap, you could just copy/paste the docker compose file from langfuse to make it easier to get started and to set any kind of default environment variables permanently in the docker compose file.

This should make it easier to use and prevents langfuse-specific files, which are not necessary to run the software, from leaking into the local goose folder.

@lifeizhou-ap (Collaborator):

@ahau-square was asking what kinds of tests should be written for this PR.

In terms of testing, I feel we can keep it simple and just write unit tests (see the sketch after this list):

  • observe_wrapper function. Test cases:
    • the langfuse_context observe function is applied, in addition to executing the function, when the credential env variables exist
    • when the credentials do not exist, it executes the function only
  • maybe we can also write tests on openai.py… to make sure the decorator is applied when credentials are present.
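
A sketch of the second test case with pytest, assuming observe_wrapper checks the LANGFUSE_* env variables at decoration time (the variable names and import path are assumptions):

```python
def test_observe_wrapper_passthrough_without_credentials(monkeypatch):
    # With no Langfuse credentials, the wrapper should be a no-op passthrough.
    monkeypatch.delenv("LANGFUSE_PUBLIC_KEY", raising=False)
    monkeypatch.delenv("LANGFUSE_SECRET_KEY", raising=False)

    from exchange.langfuse import observe_wrapper  # hypothetical import path

    @observe_wrapper()
    def add(a: int, b: int) -> int:
        return a + b

    assert add(1, 2) == 3  # the function still executes, with no tracing applied
```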

I found it a bit hard to test at the integration level. Although we can run the tests and start the Langfuse server, we still have to manually log in to verify the traced data, unless there is an existing Langfuse API for us to verify with.

Another alternative is to test the integration with Langfuse, but it requires us to know the implementation details of langfuse_context.observe. I feel it is a bit overkill for us to test this way too. FYI, these look like the relevant implementation and tests in Langfuse Python.

It would be great if you could give us some advice @marcklingen :) Thank you!

@marcklingen
Copy link

Usually, testing this is overkill for most teams, as you mentioned.

If you do want to test it, I've seen some teams run an example and then fetch the same trace id (fetch_trace) to check that it includes all of the observations it should. This, however, requires that you (1) run Langfuse in CI via docker compose, (2) use flush to make sure the events are sent immediately at the end of the test, and (3) wait e.g. 2-5 seconds in CI, as Langfuse does not have read-after-write consistency on the APIs.
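
A sketch of that pattern with the Python SDK (the expected-observation count and trace-id plumbing are assumptions):

```python
import time

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* env vars pointing at the CI instance


def assert_trace_has_observations(trace_id: str, expected: int) -> None:
    langfuse.flush()  # (2) send buffered events immediately
    time.sleep(3)     # (3) wait, since the APIs are not read-after-write consistent
    trace = langfuse.fetch_trace(trace_id)
    assert len(trace.data.observations) == expected
```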

@lamchau (Collaborator) commented Oct 9, 2024

this is awesome! pretty excited to try this out :)

@ahau-square (Collaborator, Author):

> I think both ways work. I am just curious about it. :)

We need the wrapper in the block plugins repo also, since we want to wrap the block provider completions there as well.

@lamchau (Collaborator) left a review comment:

lgtm!

@baxen (Collaborator) left a review comment:

✨ Looks great! Working well for me locally.

Left some comments about verbosity of outputs and some log handling.

* main:
  feat: add groq provider (#134)
  feat: add a deep thinking reasoner model (o1-preview/mini) (#68)
  fix: use concrete SessionNotifier (#135)
  feat: add guards to session management (#101)
  fix: Set default model configuration for the Google provider. (#131)
  test: convert Google Gemini tests to VCR (#118)
  chore: Add goose providers list command (#116)
  docs: working ollama for desktop (#125)
  docs: format and clean up warnings/errors (#120)
  docs: update deploy workflow (#124)
  feat: Implement a goose run command (#121)
@ahau-square merged commit 56d88a8 into main on Oct 10, 2024
5 checks passed
@marcklingen commented Oct 10, 2024

Nice @ahau-square, does it make sense to add a quick get-started guide to the docs? Happy to help, as I really like the use case of running this fully locally.

salman1993 added a commit that referenced this pull request Oct 10, 2024
* origin/main:
  feat: add local langfuse tracing option  (#106)
@codefromthecrypt deleted the ahau/langfuse branch on October 13, 2024
lukealvoeiro added a commit that referenced this pull request Oct 17, 2024
* main: (23 commits)
  feat: Run with resume session (#153)
  refactor: move langfuse wrapper to a module in exchange instead of a package (#138)
  docs: add subheaders to the 'Other ways to run Goose' section (#155)
  fix: Remove tools from exchange when summarizing files (#157)
  chore: use primitives instead of typing imports and fixes completion … (#149)
  chore: make vcr tests pretty-print JSON (#146)
  chore(release): goose 0.9.5 (#159)
  chore(release): exchange 0.9.5 (#158)
  chore: updates ollama default model from mistral-nemo to qwen2.5 (#150)
  feat: add vision support for Google (#141)
  fix: session resume with arg handled incorrectly (#145)
  docs: add release instructions to CONTRIBUTING.md (#143)
  docs: add link to action, IDE words (#140)
  docs: goosehints doc fix only (#142)
  chore(release): release 0.9.4 (#136)
  revert: "feat: add local langfuse tracing option  (#106)" (#137)
  feat: add local langfuse tracing option  (#106)
  feat: add groq provider (#134)
  feat: add a deep thinking reasoner model (o1-preview/mini) (#68)
  fix: use concrete SessionNotifier (#135)
  ...