[MLOB-1555] LLM Observability writers #4699

Merged: 4 commits into sabrenner/llmobs-sdk-release from sabrenner/llmobs-writers on Sep 25, 2024

@sabrenner sabrenner commented Sep 18, 2024

What does this PR do?

Adds LLM Observability writers for span events (agentless and agent proxy) as well as evaluation metrics (which write directly to our public API).

Important Notes

  • These writers run on intervals separate from the main agent exporter. A future PR will initialize them in the appropriate places to start those intervals (as defined in the constructor of the base writer). Because of this, these writers won't interact with any of the tracer's internal exporters, writers, or encoders.
  • We need to make sure Unicode special characters are encoded in their \u form in payload strings. I wasn't sure if there was a cleaner way to do this, so any input is appreciated!
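
As a rough illustration of the first note, a writer that buffers events and flushes them on its own timer might look like the sketch below. This is not the code from this PR: the class name `IntervalWriter`, the `interval` and `limit` options, the `send` callback, and the choice to drop events once the buffer is full are all assumptions made for the example.

```javascript
'use strict'

// Hypothetical sketch only: names, options, and drop-when-full behavior are
// illustrative assumptions, not the writers added in this PR.
class IntervalWriter {
  constructor ({ interval = 1000, limit = 1000, send }) {
    this._buffer = []
    this._limit = limit
    this._send = send

    // The flush timer is started in the constructor and is independent of
    // any other exporter in the process.
    this._timer = setInterval(() => this.flush(), interval)
    // Do not keep the process alive just because of this timer.
    this._timer.unref()
  }

  // Queue one event; returns false if the buffer is full and the event is dropped.
  append (event) {
    if (this._buffer.length >= this._limit) return false
    this._buffer.push(event)
    return true
  }

  // Hand the buffered events to the transport and start a fresh buffer.
  flush () {
    if (this._buffer.length === 0) return
    const events = this._buffer
    this._buffer = []
    this._send(events)
  }

  // Stop the timer and send whatever is still buffered.
  destroy () {
    clearInterval(this._timer)
    this.flush()
  }
}
```

Because the timer lives inside the writer, nothing outside it needs to know when a flush happens, which is one way to keep such a writer decoupled from other exporters and encoders.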

Motivation

Merges the incremental LLMObs writers change into the LLM Observability SDK release branch.

The changes will be merged in the following order:

  • Config
  • Writers
  • Tagger
  • Span/Trace processor
  • SDK + initialization
  • Type definitions
  • OpenAI integration

.github/workflows/llmobs.yml

github-actions bot commented Sep 18, 2024

Overall package size

Self size: 7.17 MB
Deduped: 62.53 MB
No deduping: 62.81 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/native-appsec | 8.1.1 | 18.67 MB | 18.68 MB |
| @datadog/native-iast-taint-tracking | 3.1.0 | 12.27 MB | 12.28 MB |
| @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB |
| protobufjs | 7.2.5 | 2.77 MB | 5.16 MB |
| @datadog/native-iast-rewriter | 2.4.1 | 2.14 MB | 2.23 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| jsonpath-plus | 9.0.0 | 580.4 kB | 1.03 MB |
| import-in-the-middle | 1.8.1 | 71.67 kB | 785.15 kB |
| msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| semver | 7.6.3 | 95.82 kB | 95.82 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| lru-cache | 7.14.0 | 74.95 kB | 74.95 kB |
| ignore | 5.3.1 | 51.46 kB | 51.46 kB |
| int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| rfdc | 1.3.1 | 25.21 kB | 25.21 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| path-to-regexp | 0.1.10 | 6.38 kB | 6.38 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


pr-commenter bot commented Sep 18, 2024

Benchmarks

Benchmark execution time: 2024-09-19 18:46:59

Comparing candidate commit ce7e950 in PR branch sabrenner/llmobs-writers with baseline commit 54c8eec in branch sabrenner/llmobs-sdk-release.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 259 metrics, 7 unstable metrics.

@sabrenner sabrenner marked this pull request as ready for review September 19, 2024 14:21
@sabrenner sabrenner requested a review from a team as a code owner September 19, 2024 14:21
Contributor

@Yun-Kim Yun-Kim left a comment

LGTM from team mlobs, just some small suggestions / clarification questions

packages/dd-trace/src/llmobs/writers/evaluations.js (outdated)
packages/dd-trace/src/llmobs/writers/spans/agentless.js (outdated)
Comment on lines +100 to +103

    if (typeof value === 'string') {
      return encodeUnicode(value) // serialize unicode characters
    }
    return value

Contributor

Just for clarification, can you explain what exactly is happening here? Does JSON.stringify() get called first, and then we run the encodeUnicode() helper on the result afterwards?

Collaborator Author

It runs while JSON.stringify is serializing. When you pass a callback (replacer) function to JSON.stringify, it calls that function for every value in the object. Since our decoder on ingestion needs Unicode characters encoded (e.g. \u2013), this function makes sure those special characters are emitted with the correct Unicode escape. (I think json.dumps does this for us in the Python SDK, but JSON.stringify doesn't do it by default here.) There might be a better approach for this; I'll wait for input from the Node.js folks.
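
To make the behavior discussed in this thread concrete, here is a small self-contained sketch. The `encodeUnicode` below is a hypothetical stand-in written for this example; the PR's actual helper is not shown in the thread and may differ. The sketch compares default `JSON.stringify` output, a replacer that escapes strings, and escaping the serialized text afterwards. One thing worth watching with the replacer variant: strings returned by a replacer are still JSON-escaped, so the backslash itself gets doubled unless that is undone in a later step. For comparison, Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`).

```javascript
'use strict'

// Hypothetical helper for illustration; not the implementation from this PR.
// Replaces every UTF-16 code unit above 0x7f with the six characters \uXXXX.
function encodeUnicode (str) {
  return str.replace(/[^\x00-\x7f]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  )
}

const payload = { text: 'a\u2013b' } // the value contains an en dash (U+2013)

// 1. Default: JSON.stringify emits the en dash as a raw character.
const plain = JSON.stringify(payload)

// 2. Replacer: called for each value while serializing. The returned string
//    is escaped again by JSON.stringify, so its backslash is doubled.
const viaReplacer = JSON.stringify(payload, (key, value) =>
  typeof value === 'string' ? encodeUnicode(value) : value
)

// 3. Post-processing: escape the serialized text instead. Non-ASCII characters
//    only occur inside JSON strings, so the result is still valid JSON.
const asciiOnly = encodeUnicode(JSON.stringify(payload))

console.log(plain)       // {"text":"a–b"}
console.log(viaReplacer) // {"text":"a\\u2013b"}
console.log(asciiOnly)   // {"text":"a\u2013b"}
```

Parsing `asciiOnly` with `JSON.parse` gives back the original en dash, while parsing `viaReplacer` gives the literal six-character text `\u2013`, which is why the replacer variant needs an extra step to remove the doubled backslash.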


@sabrenner sabrenner changed the title from "[MLOB-1555] add LLMObs writers" to "[MLOB-1555] LLM Observability writers" Sep 24, 2024
@sabrenner sabrenner merged commit 5b215f6 into sabrenner/llmobs-sdk-release Sep 25, 2024
175 of 178 checks passed
@sabrenner sabrenner deleted the sabrenner/llmobs-writers branch September 25, 2024 17:17
sabrenner added a commit that referenced this pull request Oct 29, 2024
* [MLOB-1540] add llmobs configuration to global tracer config (#4696)

add llmobs config

* [MLOB-1555] LLM Observability writers (#4699)

LLM Observability writers

* [MLOB-1556] LLM Observability tagger (#4718)

LLM Observability tagger

* [MLOB-1560] LLMObs Span Processor (#4738)

* span processor

* tests

* remove agent exporter log and do not stringify tags

* remove llmobs from exporter tests

* add in default unserializable value

* review comments

* warning log for metric

* todo-ify

* remove some duplicate logic

* decouple llmobs span processing with a channel

* use a static weakmap to store llmobs tags/annotations instead of span tags

* do not register span in map if it does not have an llmobs span kind

* span is passed on an object from sp publisher

* re-clarify TODOs

* only send span in publish

* log multiple warnings and return conditional undefined

* update error logic

* [MLOB-1561] LLM Observability SDK API (#4773)

* wip

* type definitions

* active + try/catch eval metric writer append

* test ts

* use tagger map and processor as a channel subscriber

* change decorate and add in dev changes

* try some api changes

* add decorate to noop

* fix breaking proxy tests

* experimental decorators for TS docs

* api changes, fix unit + e2e tests

* try removing global log mocks

* add some util tests

* remove logger mocks

* add module tests + do not enable when not specified

* fix eval metric integration test

* wip

* memoize getFunctionArguments

* move any subscriber and global writer to the module enablement level instead of sdk

* should fix TS tests

* add ts integration test and fix decorator

* devex for ts versions

* add noop typescript test

* remove startSpan

* remove unneeded change

* dedup decorator code

* Update index.d.ts

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* map metrics names

* change validKind to validateKind and throw

* tagger for metrics follow-up

* review feedback

* add some tests for not auto-annotating in certain cases

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* hard fail instead of soft fail, except for `wrap` span name

* add ml-observability codeowners

* resolve ts test

* update auto-annotation check

* tagger can soft fail

* using custom ASL instance and scope activation

* fix test comments and remove

* address review comments

* remove llmobs.apiKey config, only rely on global

* fix evaulations test

* make llmobs storage accessible

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
rochdev pushed a commit that referenced this pull request Oct 31, 2024
rochdev pushed a commit that referenced this pull request Oct 31, 2024
rochdev pushed a commit that referenced this pull request Oct 31, 2024
rochdev pushed a commit that referenced this pull request Nov 6, 2024
rochdev pushed a commit that referenced this pull request Nov 6, 2024