Fix local server regressions caused by Jinja PR #3256
Merged
+88
−66
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
1. Don't ignore content of assistant messages in history
This was originally fixed in #2929 (the first time it worked), but was broken again in #3147. We should write an automated test for this so we don't break it again.
This is a simple test of the message history using the local server:
After #2929, the model remembers its name, but on current main it forgets. This PR fixes that. See also: #2602
2. Don't report garbage token counts and incorrect stop reason
Additionally, this PR fixes an uninitialized struct that was causing the token counts to be unreasonably high garbage values and the stop reason to be incorrect. Now all of the local server tests pass (including the test that was previously marked as XFAIL).
3. Don't leave previous conversations in the LLM's context
If you send the my-name-is-Bob test above to the LLM in one request, and then in another request send only this:
The LLM should not mention the Bob test, since it's not part of this conversation. But there was some missing code in the Jinja PR to actually use the server's local list of messages instead of the entire chat view's contents (which reflects previous conversations).