API: move to api-version=2 for faster streaming LLM responses (#5446)
Previously, the Cody IDE client used `api-version=1` for LLM chat responses, which sent the full LLM response on every streaming change. This was problematic: it generated a lot of redundant traffic between the IDE client and the remote Sourcegraph instance, eventually resulting in a slower user experience. Now we use `api-version=2`, which only sends the delta between chunks, significantly reducing the number of characters we send over the wire. For example, for a chat response with 3k output tokens, we were previously processing up to 1.8m tokens(!!), while now we only process 3k tokens.

Don't merge until sourcegraph/sourcegraph#293 goes live.

## Test plan

Tested locally and confirmed that both `api-version=1` and `api-version=2` work as expected.

- [x] Update all HTTP recordings to reflect the new API. This should give us good test coverage.
- [x] Manually confirm the web extension is still working. Cody Web has no automated tests, but I ran the demo locally and took this screenshot of the delta encoding in action:

![CleanShot 2024-09-12 at 10 25 52@2x](https://github.com/user-attachments/assets/4189fad2-f23a-4d83-ac1c-96aa462099a2)

## Changelog

* Cody now uses a new LLM API that offers faster performance, especially for long chat responses. This improvement is currently enabled only for Claude models.