
Use Spice #543

Merged — merged 32 commits into main on Apr 3, 2024

Conversation

biobootloader (Member) commented Mar 22, 2024

Current status: Mentat runs with both OpenAI and Anthropic, through Spice. LiteLLM proxies and AzureOpenAI should also work, although I haven't tested them yet.

Still to do before merging:

  • fix/update the warning that "Mentat has only been tested on GPT-4" for Claude 3
  • fix/update the warning that Mentat doesn't know how to calculate cost/context size for Claude 3
  • replace Mentat's response_logger_wrapper, perhaps by passing a callback to Spice
  • have Spice convert API errors
  • have Spice log even when the stream is interrupted
  • have Mentat catch SpiceErrors
  • have Spice do embeddings - must be in this PR because cost logging changed
  • Whisper support in Spice
  • update types / get Pydantic happy
  • make sure it works with images
  • test using a LiteLLM proxy
  • test using AzureOpenAI
  • fix tests
  • update docs on running Claude
  • put Spice on PyPI, set the version in requirements

Pull Request Checklist

  • Documentation has been updated, or this change doesn't require that

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It seems like the import for Spice and SpiceResponse is duplicated. It's already imported at the top of the file, so this additional import statement can be removed.

@@ -409,52 +392,34 @@ async def call_llm_api(
start_time = default_timer()
with sentry_sdk.start_span(description="LLM Call") as span:
span.set_tag("model", model)

Contributor:

The switch to using the Spice client for LLM API calls is a significant change. Ensure that the Spice client is correctly configured to handle all the features and requirements of the previous OpenAI client, including error handling, rate limiting, and response parsing.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

The add_newline function has been modified to work with a string directly instead of ChatCompletionChunk. This change simplifies the function but requires careful review to ensure it's correctly integrated with the rest of the system, especially where streaming responses are handled.
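For reference, a minimal sketch of what the simplified helper could look like (illustration only, assuming the wrapper just appends a trailing newline to the stream; the actual body in this PR may differ):

from typing import AsyncIterator

async def add_newline(iterator: AsyncIterator[str]) -> AsyncIterator[str]:
    # Pass each streamed text chunk through unchanged
    async for chunk in iterator:
        yield chunk
    # Terminate the stream with a trailing newline
    yield "\n"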

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

Adding spice as a dependency directly from a Git repository can introduce instability, as any changes to the main branch of the repository will immediately affect this project. Consider pinning the dependency to a specific commit or version tag to ensure consistent behavior.
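For example, either of the following forms in requirements.txt would pin the dependency (the commit hash below is a placeholder; spiceai==0.1.7 is the version this PR later settles on):

spice @ git+https://github.com/AbanteAI/spice@<commit-sha>
spiceai==0.1.7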

mentatbot bot (Contributor) commented Mar 22, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

This pull request introduces significant changes, notably the switch to using the Spice client for LLM API calls and modifications to utility functions to simplify their interfaces. While these changes can potentially improve the codebase's maintainability and performance, careful consideration is required to ensure that all features and error handling previously provided by the OpenAI client are adequately covered. Additionally, the direct dependency on the spice Git repository should be managed carefully to avoid potential instability.

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It seems the import for Spice and SpiceResponse is duplicated and already present at the top of the file. Consider removing this additional import to maintain code cleanliness.

temperature=config.temperature,
response_format=response_format,
)

Contributor:

Ensure comprehensive testing of the Spice client's integration, focusing on error handling, rate limiting, and response parsing to match or exceed the functionality previously provided by the OpenAI client.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

After modifying the add_newline function to work with strings directly, verify its integration, especially in scenarios involving streaming responses, to ensure it behaves as expected.

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

Consider pinning the spice dependency to a specific commit or version tag to ensure stability and predictable behavior across different environments.

mentatbot bot (Contributor) commented Mar 22, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

This pull request introduces significant architectural changes, particularly in integrating the Spice client for LLM API calls and simplifying utility function interfaces. While these changes have the potential to enhance the project's maintainability and efficiency, it's crucial to ensure comprehensive testing and consider the stability implications of directly depending on a Git repository for the spice package. Additionally, removing duplicated imports and verifying the correct integration of modified utility functions will help maintain code quality.

@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor:

It appears the import for Spice and SpiceResponse is indeed duplicated. Please remove the duplicate import to maintain code cleanliness.

stream=stream,
temperature=config.temperature,
response_format=response_format,
logging_callback=cost_tracker.log_api_call_stats,
Contributor:

Given the switch to using the Spice client, it's crucial to ensure that error handling, rate limiting, and response parsing are thoroughly tested and aligned with the previous functionality provided by the OpenAI client.

@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor:

After modifying the add_newline function to work directly with strings, please ensure its integration is thoroughly tested, especially in scenarios involving streaming responses.

requirements.txt Outdated
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor:

To ensure stability and predictable behavior, it's recommended to pin the spice dependency to a specific commit or version tag.

granawkins (Member) commented:

The approach looks good to me, and this works as a starting point.

I had to install anthropic manually. Maybe it needs to be in Spice's pyproject?

Also getting these errors after each response:
[screenshot of the errors]

If Butler and Ragdaemon also use Spice directly (in addition to with Mentat), should they pass around an instance of Spice? Or use their own?

biobootloader (Member, Author) commented:

@granawkins

I had to install anthropic manually. Maybe it needs to be in Spice's pyproject?

It is in Spice's requirements.txt; it should install automatically once Spice comes from PyPI instead of GitHub.

Also getting these errors after each response

Ah yeah, OpenAI doesn't send token counts when you stream (Anthropic does, which is nice!), so those will need to be counted. I'm moving the token counting code from Mentat to Spice now.

If Butler and Ragdaemon also use Spice directly (in addition to with Mentat), should they pass around an instance of Spice? Or use their own?

It should always be fine to use their own instances. The only exception would be if we want Spice to handle rate limiting of calls.
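For context, counting streamed completion tokens client-side is usually done with tiktoken, roughly like this (a sketch only; the real counting logic is being moved into Spice, and the model/encoding choice here is an assumption):

import tiktoken

def count_streamed_tokens(chunks: list[str], model: str = "gpt-4") -> int:
    # Join the streamed text chunks and count tokens with the model's tokenizer
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode("".join(chunks)))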

@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
]
model = ctx.config.model
response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
content = response.choices[0].message.content or ""
content = response.text
Contributor:

The change to use response.text directly simplifies the code. Ensure that all instances where the LLM API response is processed are updated to this simpler approach.

embedding_model,
start_time - default_timer(),
)
# TODO: log cost for embedding models properly
Contributor:

The TODO comment about logging costs for embedding models properly is important for cost management. Prioritize this to ensure accurate cost tracking and reporting.

@@ -96,15 +96,16 @@ async def filter(
stream=False,
response_format=ResponseFormat(type="json_object"),
)
message = (llm_response.choices[0].message.content) or ""
message = llm_response.text
Contributor:

Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.

@@ -362,39 +363,21 @@ def initialize_client(self):
self.async_client = AsyncOpenAI(api_key=key, base_url=base_url)
self.sync_client = OpenAI(api_key=key, base_url=base_url)

self.spice_client = Spice()
Contributor:

The initialization of the Spice client is a key part of the integration. Ensure that the client is correctly configured, especially regarding error handling and rate limiting.

@@ -71,7 +71,7 @@ async def revise_edit(file_edit: FileEdit):
style="info",
)
response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
message = response.choices[0].message.content or ""
message = response.text
Contributor:

After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.

file=audio_file,
)
return transcript.text
return await self.spice_whisper_client.get_whisper_transcription(audio_path)
Contributor:


Ensure to handle the stream parameter correctly across all LLM API calls. This includes verifying that streaming responses are processed accurately and efficiently, without causing delays or data loss.

object=last_chunk.object,
system_fingerprint=last_chunk.system_fingerprint,
)
yield "\n"
Contributor:


Consider using a more robust method for determining if a file is text-encoded. The current approach might not be efficient for large files.
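One common approach is to try decoding only a small sample of the file, which stays cheap even for large files (a sketch under that assumption, not necessarily what Mentat does):

from pathlib import Path

def is_text_encoded(path: Path, sample_size: int = 4096) -> bool:
    # Decode only the first few KB as UTF-8 instead of reading the whole file.
    # Caveat: a multi-byte character split at the sample boundary can cause a false negative.
    try:
        with open(path, "rb") as f:
            f.read(sample_size).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False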

@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spiceai==0.1.7
Contributor:


Now that spiceai is pinned to a specific version, be sure to keep track of new releases for potential updates that could benefit Mentat. Test the application thoroughly with any new version to confirm compatibility and stability before updating the production environment.
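If manual version bumps become frequent, a compatible-release specifier is one option to weigh against an exact pin (an alternative, not what this PR uses):

spiceai~=0.1.7  # accepts 0.1.x patch releases, rejects 0.2.0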

@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
def response_format(self) -> ResponseFormat:
return ResponseFormat(type="text")

async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:


Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.

mentatbot bot (Contributor) commented Apr 3, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

The pull request introduces significant changes, particularly the integration of the Spice client for LLM API calls. It's crucial to ensure comprehensive testing, especially for error handling, rate limiting, and response parsing. Additionally, managing the direct dependency on the spice Git repository carefully is important to maintain stability. The updates to model compatibility warnings are a positive step towards guiding users more effectively.

@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
]
model = ctx.config.model
response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
content = response.choices[0].message.content or ""
content = response.text
Contributor:

Switching to response.text for processing the LLM response simplifies the code. Ensure that this change is consistently applied across all similar instances in the codebase.

@@ -89,22 +87,13 @@ async def filter(
)
)
selected_refs = list[Path]()
start_time = default_timer()
llm_response = await llm_api_handler.call_llm_api(
Contributor:

Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.

@@ -94,7 +93,7 @@ def line_number_starting_index(self) -> int:
return 0

@override
async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:

It's recommended to add a comment explaining why response_format is being checked and potentially transformed here for clarity and future maintainability.

@@ -48,7 +47,7 @@ def get_system_prompt(self) -> str:
def response_format(self) -> ResponseFormat:
return ResponseFormat(type="text")

async def stream_and_parse_llm_response(self, response: AsyncIterator[ChatCompletionChunk]) -> ParsedLLMResponse:
async def stream_and_parse_llm_response(self, response: AsyncIterator[str]) -> ParsedLLMResponse:
Contributor:

Given the transition to Spice, ensure that the call_llm_api method's integration with Spice correctly handles all scenarios previously managed with the OpenAI client, especially error handling and rate limiting.

@@ -67,7 +67,7 @@ async def revise_edit(file_edit: FileEdit):
style="info",
)
response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
message = response.choices[0].message.content or ""
message = response.text
Contributor:

After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.

@@ -72,14 +72,14 @@ def check_version():
def check_model():
ctx = SESSION_CONTEXT.get()
model = ctx.config.model
if "gpt-4" not in model:
if "gpt-4" not in model and "opus" not in model:
Contributor:

The update to the model compatibility warning is important for user guidance. Ensure that this message is clear and accurately reflects the models supported by Mentat.

object=last_chunk.object,
system_fingerprint=last_chunk.system_fingerprint,
)
yield "\n"
Contributor:

The change in add_newline to work with strings directly is significant. Ensure that this function is tested in all contexts where it's used to verify its correct operation.

@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spiceai==0.1.7
Contributor:

Adding spice from a Git repository directly to requirements.txt introduces potential instability. Consider using a specific version or commit hash to ensure stability.

)
mock_spice_response = MagicMock()
mock_spice_response.text = value

Contributor:

Consider implementing a more detailed logging mechanism for API calls made through the Spice client, similar to the previous implementation with OpenAI. This could help with debugging and monitoring API usage.

@@ -22,6 +22,7 @@
"Mozilla Public License 2.0 (MPL 2.0)",
"Python Software Foundation License",
"Apache 2.0",
"Apache-2.0",
Contributor:

Ensure that the license_check.py script is updated to reflect any new dependencies or changes in licensing that may affect the project's compliance.

biobootloader marked this pull request as ready for review on April 3, 2024 at 23:24.
biobootloader merged commit f3cabe6 into main on Apr 3, 2024.
16 checks passed