
Use Spice #543

Merged — merged 32 commits into main on Apr 3, 2024

Conversation

biobootloader (Member) commented Mar 22, 2024

Current status: Mentat runs with both OpenAI and Anthropic, through Spice. LiteLLM proxies and AzureOpenAI should also work, although I haven't tested them yet.

Still to do before merging:

  • fix/update the warning that "Mentat has only been tested on GPT-4" for Claude 3
  • fix/update the warning that Mentat doesn't know how to calculate cost/context size for Claude 3
  • replace Mentat's response_logger_wrapper, perhaps by passing a callback to Spice
  • have Spice convert API errors
  • have Spice log even when the stream is interrupted
  • have Mentat catch SpiceErrors
  • have Spice do embeddings - must be in this PR because cost logging changed
  • Whisper support in Spice
  • update types / get Pydantic happy
  • make sure it works with images
  • test using a LiteLLM proxy
  • test using AzureOpenAI
  • fix tests
  • update docs on running Claude
  • put Spice on PyPI, set the version in requirements

Pull Request Checklist

  • Documentation has been updated, or this change doesn't require that

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It seems like the import for Spice and SpiceResponse is duplicated. It's already imported at the top of the file, so this additional import statement can be removed.

@@ -409,52 +392,34 @@ async def call_llm_api(
start_time = default_timer()
with sentry_sdk.start_span(description="LLM Call") as span:
span.set_tag("model", model)

Contributor:

The switch to using the Spice client for LLM API calls is a significant change. Ensure that the Spice client is correctly configured to handle all the features and requirements of the previous OpenAI client, including error handling, rate limiting, and response parsing.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

The add_newline function has been modified to work with a string directly instead of ChatCompletionChunk. This change simplifies the function but requires careful review to ensure it's correctly integrated with the rest of the system, especially where streaming responses are handled.
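For reference, a minimal sketch of what the simplified helper could look like (illustration only, assuming the wrapper just appends a trailing newline to the stream; the actual body in this PR may differ):

from typing import AsyncIterator

async def add_newline(iterator: AsyncIterator[str]) -> AsyncIterator[str]:
    # Pass each streamed text chunk through unchanged
    async for chunk in iterator:
        yield chunk
    # Terminate the stream with a trailing newline
    yield "\n"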

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

Adding spice as a dependency directly from a Git repository can introduce instability, as any changes to the main branch of the repository will immediately affect this project. Consider pinning the dependency to a specific commit or version tag to ensure consistent behavior.
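For example, either of the following forms in requirements.txt would pin the dependency (the commit hash below is a placeholder; spiceai==0.1.7 is the version this PR later settles on):

spice @ git+https://github.com/AbanteAI/spice@<commit-sha>
spiceai==0.1.7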

mentatbot bot (Contributor) commented Mar 22, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

This pull request introduces significant changes, notably the switch to using the Spice client for LLM API calls and modifications to utility functions to simplify their interfaces. While these changes can potentially improve the codebase's maintainability and performance, careful consideration is required to ensure that all features and error handling previously provided by the OpenAI client are adequately covered. Additionally, the direct dependency on the spice Git repository should be managed carefully to avoid potential instability.

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It seems the import for Spice and SpiceResponse is duplicated and already present at the top of the file. Consider removing this additional import to maintain code cleanliness.

temperature=config.temperature,
response_format=response_format,
)

Contributor:

Ensure comprehensive testing of the Spice client's integration, focusing on error handling, rate limiting, and response parsing to match or exceed the functionality previously provided by the OpenAI client.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

After modifying the add_newline function to work with strings directly, verify its integration, especially in scenarios involving streaming responses, to ensure it behaves as expected.

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

Consider pinning the spice dependency to a specific commit or version tag to ensure stability and predictable behavior across different environments.

mentatbot bot (Contributor) commented Mar 22, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

This pull request introduces significant architectural changes, particularly in integrating the Spice client for LLM API calls and simplifying utility function interfaces. While these changes have the potential to enhance the project's maintainability and efficiency, it's crucial to ensure comprehensive testing and consider the stability implications of directly depending on a Git repository for the spice package. Additionally, removing duplicated imports and verifying the correct integration of modified utility functions will help maintain code quality.

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It appears the import for Spice and SpiceResponse is indeed duplicated. Please remove the duplicate import to maintain code cleanliness.

stream=stream,
temperature=config.temperature,
response_format=response_format,
logging_callback=cost_tracker.log_api_call_stats,
Contributor:

Given the switch to using the Spice client, it's crucial to ensure that error handling, rate limiting, and response parsing are thoroughly tested and aligned with the previous functionality provided by the OpenAI client.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

After modifying the add_newline function to work directly with strings, please ensure its integration is thoroughly tested, especially in scenarios involving streaming responses.

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

To ensure stability and predictable behavior, it's recommended to pin the spice dependency to a specific commit or version tag.

granawkins (Member) commented:

The approach looks good to me, and this works as a starting point.

I had to install anthropic manually. Maybe it needs to be in Spice's pyproject?

Also getting these errors after each response:
[screenshot of the errors]

If Butler and Ragdaemon also use Spice directly (in addition to with Mentat), should they pass around an instance of Spice? Or use their own?

biobootloader (Member, Author) commented:

@granawkins

I had to install anthropic manually. Maybe it needs to be in Spice's pyproject?

It is in Spice's requirements.txt; it should install automatically once Spice comes from PyPI instead of GitHub.

Also getting these errors after each response

Ah yeah, OpenAI doesn't send token counts when you stream (Anthropic does, which is nice!), so those will need to be counted. I'm moving the token counting code from Mentat to Spice now.

If Butler and Ragdaemon also use Spice directly (in addition to with Mentat), should they pass around an instance of Spice? Or use their own?

It should always be fine to use their own instances. The only exception would be if we want Spice to handle rate limiting of calls.
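For context, counting streamed completion tokens client-side is usually done with tiktoken, roughly like this (a sketch only; the real counting logic is being moved into Spice, and the model/encoding choice here is an assumption):

import tiktoken

def count_streamed_tokens(chunks: list[str], model: str = "gpt-4") -> int:
    # Join the streamed text chunks and count tokens with the model's tokenizer
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode("".join(chunks)))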

@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
]
model = ctx.config.model
response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
content = response.choices[0].message.content or ""
content = response.text
Contributor:

The change to use response.text directly simplifies the code. Ensure that all instances where the LLM API response is processed are updated to this simpler approach.

embedding_model,
start_time - default_timer(),
)
# TODO: log cost for embedding models properly
Contributor:

The TODO comment about logging costs for embedding models properly is important for cost management. Prioritize this to ensure accurate cost tracking and reporting.

@@ -96,15 +96,16 @@ async def filter(
stream=False,
response_format=ResponseFormat(type="json_object"),
)
message = (llm_response.choices[0].message.content) or ""
message = llm_response.text
Contributor:

Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.

@@ -362,39 +363,21 @@ def initialize_client(self):
self.async_client = AsyncOpenAI(api_key=key, base_url=base_url)
self.sync_client = OpenAI(api_key=key, base_url=base_url)

self.spice_client = Spice()
Contributor:

The initialization of the Spice client is a key part of the integration. Ensure that the client is correctly configured, especially regarding error handling and rate limiting.

@@ -71,7 +71,7 @@ async def revise_edit(file_edit: FileEdit):
style="info",
)
response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
message = response.choices[0].message.content or ""
message = response.text
Contributor:

After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.

file=audio_file,
)
return transcript.text
return await self.spice_whisper_client.get_whisper_transcription(audio_path)
Contributor:


Ensure to handle the stream parameter correctly across all LLM API calls. This includes verifying that streaming responses are processed accurately and efficiently, without causing delays or data loss.

object=last_chunk.object,
system_fingerprint=last_chunk.system_fingerprint,
)
yield "\n"
Contributor:


Consider using a more robust method for determining if a file is text-encoded. The current approach might not be efficient for large files.
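One common approach is to try decoding only a small sample of the file, which stays cheap even for large files (a sketch under that assumption, not necessarily what Mentat does):

from pathlib import Path

def is_text_encoded(path: Path, sample_size: int = 4096) -> bool:
    # Decode only the first few KB as UTF-8 instead of reading the whole file.
    # Caveat: a multi-byte character split at the sample boundary can cause a false negative.
    try:
        with open(path, "rb") as f:
            f.read(sample_size).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False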

@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spiceai==0.1.7
Contributor:


Now that spiceai is pinned to a specific version, be sure to keep track of new releases for potential updates that could benefit Mentat. Test the application thoroughly with any new version to confirm compatibility and stability before updating the production environment.
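If manual version bumps become frequent, a compatible-release specifier is one option to weigh against an exact pin (an alternative, not what this PR uses):

spiceai~=0.1.7  # accepts 0.1.x patch releases, rejects 0.2.0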

@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
def response_format(self) -> ResponseFormat:
return ResponseFormat(type="text")

async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:


Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.

mentatbot bot (Contributor) commented Apr 3, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

The pull request introduces significant changes, particularly the integration of the Spice client for LLM API calls. It's crucial to ensure comprehensive testing, especially for error handling, rate limiting, and response parsing. Additionally, managing the direct dependency on the spice Git repository carefully is important to maintain stability. The updates to model compatibility warnings are a positive step towards guiding users more effectively.

@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
]
model = ctx.config.model
response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
content = response.choices[0].message.content or ""
content = response.text
Contributor:

Switching to response.text for processing the LLM response simplifies the code. Ensure that this change is consistently applied across all similar instances in the codebase.

@@ -89,22 +87,13 @@ async def filter(
)
)
selected_refs = list[Path]()
start_time = default_timer()
llm_response = await llm_api_handler.call_llm_api(
Contributor:

Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.

@@ -94,7 +93,7 @@ def line_number_starting_index(self) -> int:
return 0

@override
async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:

It's recommended to add a comment explaining why response_format is being checked and potentially transformed here for clarity and future maintainability.

@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
def response_format(self) -> ResponseFormat:
return ResponseFormat(type="text")

async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:

Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.

@@ -67,7 +67,7 @@ async def revise_edit(file_edit: FileEdit):
style="info",
)
response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
message = response.choices[0].message.content or ""
message = response.text
Contributor:

After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.

@@ -72,14 +72,14 @@ def check_version():
def check_model():
ctx = SESSION_CONTEXT.get()
model = ctx.config.model
if "gpt-4" not in model:
if "gpt-4" not in model and "opus" not in model:
Contributor:

The update to the model compatibility warning is important for user guidance. Ensure that this message is clear and accurately reflects the models supported by Mentat.

object=last_chunk.object,
system_fingerprint=last_chunk.system_fingerprint,
)
yield "\n"
Contributor:

The change in add_newline to work with strings directly is significant. Ensure that this function is tested in all contexts where it's used to verify its correct operation.

@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spiceai==0.1.7
Contributor:

Adding spice from a Git repository directly to requirements.txt introduces potential instability. Consider using a specific version or commit hash to ensure stability.

)
mock_spice_response = MagicMock()
mock_spice_response.text = value

Contributor:

Consider implementing a more detailed logging mechanism for API calls made through the Spice client, similar to the previous implementation with OpenAI. This could help with debugging and monitoring API usage.

@@ -22,6 +22,7 @@
"Mozilla Public License 2.0 (MPL 2.0)",
"Python Software Foundation License",
"Apache 2.0",
"Apache-2.0",
Contributor:

Ensure that the license_check.py script is updated to reflect any new dependencies or changes in licensing that may affect the project's compliance.

biobootloader marked this pull request as ready for review on April 3, 2024 at 23:24.
biobootloader merged commit f3cabe6 into main on Apr 3, 2024.
16 checks passed