Use Spice #543
Conversation
mentat/llm_api_handler.py
Outdated
@@ -44,6 +44,7 @@
 )
 from openai.types.chat.completion_create_params import ResponseFormat
 from PIL import Image
+from spice import Spice, SpiceResponse
It seems like the import for Spice and SpiceResponse is duplicated. It's already imported at the top of the file, so this additional import statement can be removed.
@@ -409,52 +392,34 @@ async def call_llm_api(
 start_time = default_timer()
 with sentry_sdk.start_span(description="LLM Call") as span:
     span.set_tag("model", model)
The switch to using the Spice client for LLM API calls is a significant change. Ensure that the Spice client is correctly configured to handle all the features and requirements of the previous OpenAI client, including error handling, rate limiting, and response parsing.
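As one concrete illustration of the kind of defensive handling this asks for, a generic retry wrapper could be placed around whatever coroutine performs the LLM call. This is a minimal sketch assuming the client raises ordinary exceptions on failure; it is not part of Spice's API:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def call_with_retry(call: Callable[[], Awaitable[T]], retries: int = 3) -> T:
    """Retry an async call with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            return await call()
        except Exception:
            if attempt == retries - 1:
                raise
            # Back off 1s, 2s, 4s, ... before retrying the call.
            await asyncio.sleep(2**attempt)
    raise AssertionError("unreachable")
```

A call site would then wrap its Spice invocation in a lambda and pass it to call_with_retry, keeping the rate-limit policy in one place.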
@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:

 async def add_newline(
-    iterator: AsyncIterator[ChatCompletionChunk],
+    iterator: AsyncIterator[str],
The add_newline function has been modified to work with strings directly instead of ChatCompletionChunk. This change simplifies the function, but requires careful review to ensure it's correctly integrated with the rest of the system, especially where streaming responses are handled.
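For reference, a minimal sketch of what the string-based version might look like, assuming the function's job is to guarantee the stream ends with a newline (Mentat's exact termination logic may differ):

```python
from typing import AsyncIterator

async def add_newline(iterator: AsyncIterator[str]) -> AsyncIterator[str]:
    # Pass every chunk through unchanged, then emit a final newline so
    # downstream consumers always see a terminated last line.
    async for chunk in iterator:
        yield chunk
    yield "\n"
```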
requirements.txt
Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
 tqdm==4.66.1
 webdriver_manager==4.0.1
 watchfiles==0.21.0
+spice @ git+https://github.com/AbanteAI/spice@main
Adding spice as a dependency directly from a Git repository can introduce instability, as any changes to the main branch of the repository will immediately affect this project. Consider pinning the dependency to a specific commit or version tag to ensure consistent behavior.
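For example, a pinned form might look like this (the commit hash is a placeholder, not a real revision; the release pin is the form this PR later adopts):

```
spice @ git+https://github.com/AbanteAI/spice@<commit-sha>
# or, once a release is published:
spiceai==0.1.7
```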
MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos. This pull request introduces significant changes, notably the switch to using the Spice client for LLM API calls.
mentat/llm_api_handler.py
Outdated
@@ -44,6 +44,7 @@
 )
 from openai.types.chat.completion_create_params import ResponseFormat
 from PIL import Image
+from spice import Spice, SpiceResponse
It seems the import for Spice and SpiceResponse is duplicated and already present at the top of the file. Consider removing this additional import to maintain code cleanliness.
mentat/llm_api_handler.py
Outdated
 temperature=config.temperature,
 response_format=response_format,
 )
Ensure comprehensive testing of the Spice client's integration, focusing on error handling, rate limiting, and response parsing to match or exceed the functionality previously provided by the OpenAI client.
@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:

 async def add_newline(
-    iterator: AsyncIterator[ChatCompletionChunk],
+    iterator: AsyncIterator[str],
After modifying the add_newline function to work with strings directly, verify its integration, especially in scenarios involving streaming responses, to ensure it behaves as expected.
requirements.txt
Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
 tqdm==4.66.1
 webdriver_manager==4.0.1
 watchfiles==0.21.0
+spice @ git+https://github.com/AbanteAI/spice@main
Consider pinning the spice dependency to a specific commit or version tag to ensure stability and predictable behavior across different environments.
MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos. This pull request introduces significant architectural changes, particularly in integrating the Spice client.
mentat/llm_api_handler.py
Outdated
@@ -44,6 +44,7 @@
 )
 from openai.types.chat.completion_create_params import ResponseFormat
 from PIL import Image
+from spice import Spice, SpiceResponse
It appears the import for Spice and SpiceResponse is indeed duplicated. Please remove the duplicate import to maintain code cleanliness.
 stream=stream,
 temperature=config.temperature,
 response_format=response_format,
+logging_callback=cost_tracker.log_api_call_stats,
Given the switch to using the Spice client, it's crucial to ensure that error handling, rate limiting, and response parsing are thoroughly tested and aligned with the previous functionality provided by the OpenAI client.
@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:

 async def add_newline(
-    iterator: AsyncIterator[ChatCompletionChunk],
+    iterator: AsyncIterator[str],
After modifying the add_newline function to work directly with strings, please ensure its integration is thoroughly tested, especially in scenarios involving streaming responses.
requirements.txt
Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
 tqdm==4.66.1
 webdriver_manager==4.0.1
 watchfiles==0.21.0
+spice @ git+https://github.com/AbanteAI/spice@main
To ensure stability and predictable behavior, it's recommended to pin the spice dependency to a specific commit or version tag.
it is in spice's
ah yeah OpenAI doesn't send token counts when you stream (anthropic does which is nice!). So those will need to be counted. I'm moving the token counting code from Mentat to Spice now.
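For OpenAI models that counting is typically done client-side with tiktoken; a minimal sketch of the idea (the model-to-encoding mapping is handled by the library):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Look up the tokenizer for the model and count tokens in the text;
    # needed because OpenAI's streaming API omits usage statistics.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```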
It should always be fine to use their own instances. The only exception would be if we want Spice to handle rate limiting of calls.
@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
 ]
 model = ctx.config.model
 response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
-content = response.choices[0].message.content or ""
+content = response.text
The change to use response.text directly simplifies the code. Ensure that all instances where the LLM API response is processed are updated to this simpler approach.
mentat/embeddings.py
Outdated
 embedding_model,
 start_time - default_timer(),
 )
 # TODO: log cost for embedding models properly
The TODO comment about logging costs for embedding models properly is important for cost management. Prioritize this to ensure accurate cost tracking and reporting.
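Until proper cost logging lands, a placeholder along these lines could keep reports roughly accurate; the per-token rate below is illustrative, not a real price:

```python
# Hypothetical cost estimate for an embeddings call. The rate is a
# placeholder; real pricing should come from a per-model price table.
PRICE_PER_1K_TOKENS = 0.0001

def embedding_cost(num_tokens: int) -> float:
    return num_tokens / 1000 * PRICE_PER_1K_TOKENS
```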
@@ -96,15 +96,16 @@ async def filter(
 stream=False,
 response_format=ResponseFormat(type="json_object"),
 )
-message = (llm_response.choices[0].message.content) or ""
+message = llm_response.text
Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.
@@ -362,39 +363,21 @@ def initialize_client(self):
 self.async_client = AsyncOpenAI(api_key=key, base_url=base_url)
 self.sync_client = OpenAI(api_key=key, base_url=base_url)

+self.spice_client = Spice()
The initialization of the Spice client is a key part of the integration. Ensure that the client is correctly configured, especially regarding error handling and rate limiting.
@@ -71,7 +71,7 @@ async def revise_edit(file_edit: FileEdit):
 style="info",
 )
 response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
-message = response.choices[0].message.content or ""
+message = response.text
After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.
-    file=audio_file,
-)
-return transcript.text
+return await self.spice_whisper_client.get_whisper_transcription(audio_path)
Ensure the stream parameter is handled correctly across all LLM API calls. This includes verifying that streaming responses are processed accurately and efficiently, without causing delays or data loss.
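One way to sanity-check the streaming path is a small helper that drains the iterator; this assumes call_llm_api yields an AsyncIterator[str] when stream=True, as the updated type hints suggest:

```python
from typing import AsyncIterator

async def collect_stream(chunks: AsyncIterator[str]) -> str:
    # Accumulate streamed chunks into the full response text; useful in
    # tests for comparing streamed output against the non-streamed path.
    parts: list[str] = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)
```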
-    object=last_chunk.object,
-    system_fingerprint=last_chunk.system_fingerprint,
-)
+yield "\n"
Consider using a more robust method for determining if a file is text-encoded. The current approach might not be efficient for large files.
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
 tqdm==4.66.1
 webdriver_manager==4.0.1
 watchfiles==0.21.0
+spiceai==0.1.7
Now that spiceai is pinned to a specific version, keep track of new releases for potential updates that could benefit Mentat. Testing the application thoroughly with any new version to confirm compatibility and stability before updating the production environment is crucial.
@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
 def response_format(self) -> ResponseFormat:
     return ResponseFormat(type="text")

-async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
+async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.
MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos. The pull request introduces significant changes, particularly the integration of the Spice client.
@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
 ]
 model = ctx.config.model
 response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
-content = response.choices[0].message.content or ""
+content = response.text
Switching to response.text for processing the LLM response simplifies the code. Ensure that this change is consistently applied across all similar instances in the codebase.
@@ -89,22 +87,13 @@ async def filter(
 )
 )
 selected_refs = list[Path]()
 start_time = default_timer()
 llm_response = await llm_api_handler.call_llm_api(
Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.
@@ -94,7 +93,7 @@ def line_number_starting_index(self) -> int:
     return 0

 @override
-async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
+async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
It's recommended to add a comment explaining why response_format is being checked and potentially transformed here, for clarity and future maintainability.
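For instance, the requested comment might look something like this; the surrounding condition and behavior are invented for illustration, not taken from Mentat:

```python
# Hypothetical illustration of the kind of comment being requested:
if response_format.get("type") == "json_object":
    # Not every provider reachable through Spice supports JSON mode, so
    # the format is checked here and downgraded to plain text when the
    # backing model cannot honor it.
    ...
```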
@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
 def response_format(self) -> ResponseFormat:
     return ResponseFormat(type="text")

-async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
+async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.
@@ -67,7 +67,7 @@ async def revise_edit(file_edit: FileEdit):
 style="info",
 )
 response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
-message = response.choices[0].message.content or ""
+message = response.text
After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.
@@ -72,14 +72,14 @@ def check_version():
 def check_model():
     ctx = SESSION_CONTEXT.get()
     model = ctx.config.model
-    if "gpt-4" not in model:
+    if "gpt-4" not in model and "opus" not in model:
The update to the model compatibility warning is important for user guidance. Ensure that this message is clear and accurately reflects the models supported by Mentat.
-    object=last_chunk.object,
-    system_fingerprint=last_chunk.system_fingerprint,
-)
+yield "\n"
The change in add_newline to work with strings directly is significant. Ensure that this function is tested in all contexts where it's used to verify its correct operation.
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
 tqdm==4.66.1
 webdriver_manager==4.0.1
 watchfiles==0.21.0
+spiceai==0.1.7
Adding spice from a Git repository directly to requirements.txt introduces potential instability. Consider using a specific version or commit hash to ensure stability.
 )
 mock_spice_response = MagicMock()
 mock_spice_response.text = value
Consider implementing a more detailed logging mechanism for API calls made through the Spice client, similar to the previous implementation with OpenAI. This could help with debugging and monitoring API usage.
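A callback in the spirit of the logging_callback=cost_tracker.log_api_call_stats hookup shown in the diff could look roughly like this; only the .text attribute of SpiceResponse is confirmed by this PR, so the logged fields are deliberately minimal:

```python
import logging

logger = logging.getLogger("mentat.llm_stats")

def log_api_call_stats(response) -> None:
    # Minimal sketch: record what we can observe on every Spice response.
    # Token and cost fields would be added once Spice exposes them.
    logger.info("LLM call returned %d characters", len(response.text))
```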
@@ -22,6 +22,7 @@
 "Mozilla Public License 2.0 (MPL 2.0)",
 "Python Software Foundation License",
 "Apache 2.0",
+"Apache-2.0",
Ensure that the license_check.py script is updated to reflect any new dependencies or changes in licensing that may affect the project's compliance.
Current status: Mentat runs with both OpenAI and Anthropic, through Spice. LiteLLM proxies and AzureOpenAI should also work, although I haven't tested yet.
Still to do before merging:
- response_logger_wrapper, perhaps by passing a callback to Spice

Pull Request Checklist