Use Spice #543

Merged: 32 commits, Apr 3, 2024

Commits

9eade0e: add spice to reqs (biobootloader, Mar 19, 2024)
8f229d6: anthropic working on mentat (biobootloader, Mar 22, 2024)
18079f9: update check model (biobootloader, Mar 22, 2024)
51014d6: logging callback (biobootloader, Mar 22, 2024)
f5588f1: update (biobootloader, Mar 23, 2024)
2d111cf: interrupted logging (biobootloader, Mar 27, 2024)
d474ce2: spice errors (biobootloader, Mar 30, 2024)
d0c46c6: Merge branch 'main' into integrate-spice (biobootloader, Mar 30, 2024)
443fba1: spice embeddings (biobootloader, Mar 30, 2024)
d739320: spicewhisper (biobootloader, Mar 31, 2024)
f649ad7: remove (biobootloader, Mar 31, 2024)
e7d7844: comment (biobootloader, Mar 31, 2024)
5f1fe25: embedding cost logging (biobootloader, Mar 31, 2024)
ae3d9e1: cleanup (biobootloader, Mar 31, 2024)
f766d80: fixed (biobootloader, Mar 31, 2024)
528e2b9: ruff (biobootloader, Mar 31, 2024)
eeae387: convert spice error (biobootloader, Apr 1, 2024)
f9446b9: typing (biobootloader, Apr 1, 2024)
ebc9abf: get spice from pypi (biobootloader, Apr 1, 2024)
6026d0d: spice version upgrade (biobootloader, Apr 2, 2024)
da22b60: hmm (biobootloader, Apr 2, 2024)
91bc6eb: update spice (biobootloader, Apr 2, 2024)
93f192d: spice (biobootloader, Apr 2, 2024)
3f9a9e7: apache-2.0 (biobootloader, Apr 2, 2024)
36c2a5a: mocking spice response (biobootloader, Apr 3, 2024)
d727903: Merge branch 'main' into integrate-spice (biobootloader, Apr 3, 2024)
52fd9ec: fix tests (biobootloader, Apr 3, 2024)
c2bd2fa: anotha one (biobootloader, Apr 3, 2024)
4bf16e1: ruff ruff (biobootloader, Apr 3, 2024)
798e42c: format (biobootloader, Apr 3, 2024)
acb114e: docs (biobootloader, Apr 3, 2024)
4e40c30: oops (biobootloader, Apr 3, 2024)

4 changes: 2 additions & 2 deletions mentat/agent_handler.py
@@ -48,7 +48,7 @@ async def enable_agent_mode(self):
]
model = ctx.config.model
response = await ctx.llm_api_handler.call_llm_api(messages, model, False)
content = response.choices[0].message.content or ""
content = response.text
Contributor: The change to use response.text directly simplifies the code. Ensure that all instances where the LLM API response is processed are updated to this simpler approach.

Contributor: Switching to response.text for processing the LLM response simplifies the code. Ensure that this change is consistently applied across all similar instances in the codebase.

paths = [Path(path) for path in content.strip().split("\n") if Path(path).exists()]
self.agent_file_message = ""
@@ -87,7 +87,7 @@ async def _determine_commands(self) -> List[str]:
ctx.stream.send(f"Error accessing OpenAI API: {e.message}", style="error")
return []

content = response.choices[0].message.content or ""
content = response.text

messages.append(ChatCompletionAssistantMessageParam(role="assistant", content=content))
parsed_llm_response = await ctx.config.parser.parse_llm_response(content)
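
For reference, a minimal sketch of how the two response shapes in this PR are consumed, assuming SpiceResponse exposes .text for non-streaming calls and .stream() as an async iterator of text chunks, as the diffs here and in conversation.py below suggest. FakeSpiceResponse is a hypothetical stand-in for illustration only, not part of Spice or this PR.

import asyncio
from typing import AsyncIterator


class FakeSpiceResponse:
    """Hypothetical stand-in for spice.SpiceResponse, used only for illustration."""

    def __init__(self, chunks: list[str]):
        self._chunks = chunks
        # Non-streaming callers read the full text in one go.
        self.text = "".join(chunks)

    async def stream(self) -> AsyncIterator[str]:
        # Streaming callers iterate text chunks as they arrive.
        for chunk in self._chunks:
            yield chunk


async def main() -> None:
    response = FakeSpiceResponse(["first chunk\n", "second chunk"])
    # agent_handler.py, llm_feature_filter.py, revisor.py pattern (stream=False):
    print(response.text)
    # conversation.py pattern (stream=True), fed into add_newline() and the parser:
    async for chunk in response.stream():
        print(repr(chunk))


asyncio.run(main())
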
2 changes: 1 addition & 1 deletion mentat/conversation.py
@@ -227,7 +227,7 @@ async def _stream_model_response(
# TODO: control-c doesn't make sense for VSCode; send information in client agnostic way
stream.send("Streaming... use control-c to interrupt the model at any point\n")
async with stream.interrupt_catcher(parser.shutdown):
parsed_llm_response = await parser.stream_and_parse_llm_response(add_newline(response))
parsed_llm_response = await parser.stream_and_parse_llm_response(add_newline(response.stream()))
granawkins marked this conversation as resolved.
# Sampler and History require previous_file_lines
for file_edit in parsed_llm_response.file_edits:
file_edit.previous_file_lines = code_file_manager.file_lines.get(file_edit.file_path, [])
2 changes: 1 addition & 1 deletion mentat/feature_filters/llm_feature_filter.py
@@ -96,7 +96,7 @@ async def filter(
stream=False,
response_format=ResponseFormat(type="json_object"),
)
message = (llm_response.choices[0].message.content) or ""
message = llm_response.text
Contributor: Given the removal of detailed LLM API call logging, consider implementing a new mechanism to log or monitor these calls for debugging and performance analysis.

tokens = prompt_tokens(messages, model)
response_tokens = count_tokens(message, model, full_message=True)
cost_tracker.log_api_call_stats(
83 changes: 24 additions & 59 deletions mentat/llm_api_handler.py
@@ -44,6 +44,7 @@
)
from openai.types.chat.completion_create_params import ResponseFormat
from PIL import Image
from spice import Spice, SpiceResponse
Contributor: It seems like the import for Spice and SpiceResponse is duplicated. It's already imported at the top of the file, so this additional import statement can be removed.

from mentat.errors import MentatError, ReturnToUser, UserError
from mentat.session_context import SESSION_CONTEXT
@@ -97,11 +98,8 @@ def sync_wrapper(*args: Any, **kwargs: Any) -> Any:


# Ensures that each chunk will have at most one newline character
def chunk_to_lines(chunk: ChatCompletionChunk) -> list[str]:
content = None
if len(chunk.choices) > 0:
content = chunk.choices[0].delta.content
return ("" if content is None else content).splitlines(keepends=True)
def chunk_to_lines(content: str) -> list[str]:
granawkins marked this conversation as resolved.
return content.splitlines(keepends=True)
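
A quick illustration (not from the PR) of why the simplified one-liner still satisfies the "at most one newline character" comment above: splitlines(keepends=True) keeps each trailing newline attached to its own piece.

# Illustration only:
assert "foo\nbar\nbaz".splitlines(keepends=True) == ["foo\n", "bar\n", "baz"]
assert "".splitlines(keepends=True) == []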


def get_encoding_for_model(model: str) -> tiktoken.Encoding:
@@ -264,6 +262,9 @@ def __contains__(self, key: object) -> bool:
"gpt-3.5-turbo-16k-0613": Model("gpt-3.5-turbo-16k-0613", 16385, 0.003, 0.004),
"gpt-3.5-turbo-0301": Model("gpt-3.5-turbo-0301", 4096, 0.0015, 0.002),
"text-embedding-ada-002": Model("text-embedding-ada-002", 8191, 0.0001, 0, embedding_model=True),
"claude-3-opus-20240229": Model("claude-3-opus-20240229", 200000, 0.015, 0.075),
"claude-3-sonnet-20240229": Model("claude-3-sonnet-20240229", 200000, 0.003, 0.015),
"claude-3-haiku-20240307": Model("claude-3-haiku-20240307", 200000, 0.00025, 0.00125),
}
)
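
The new Claude rows reuse the shape of the existing entries, which appear to be Model(name, context size, input price, output price). Assuming, as the neighbouring GPT rows imply, that prices are per 1,000 tokens, a rough estimate looks like this; the helper name is illustrative and not from the codebase.

def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_per_1k: float, output_per_1k: float) -> float:
    # e.g. claude-3-opus-20240229: input_per_1k=0.015, output_per_1k=0.075 (assumed per 1k tokens)
    return prompt_tokens / 1000 * input_per_1k + completion_tokens / 1000 * output_per_1k

# 10,000 prompt tokens + 1,000 completion tokens on Opus: 0.15 + 0.075 = 0.225 USD
print(estimate_cost_usd(10_000, 1_000, 0.015, 0.075))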

@@ -362,39 +363,21 @@ def initialize_client(self):
self.async_client = AsyncOpenAI(api_key=key, base_url=base_url)
self.sync_client = OpenAI(api_key=key, base_url=base_url)

self.spice_client = Spice(provider="anthropic")

try:
self.async_client.models.list() # Test the key
except AuthenticationError as e:
raise UserError(f"API gave an Authentication Error:\n{e}")

@overload
async def call_llm_api(
self,
messages: list[ChatCompletionMessageParam],
model: str,
stream: Literal[True],
response_format: ResponseFormat = ResponseFormat(type="text"),
) -> AsyncIterator[ChatCompletionChunk]:
...

@overload
async def call_llm_api(
self,
messages: list[ChatCompletionMessageParam],
model: str,
stream: Literal[False],
response_format: ResponseFormat = ResponseFormat(type="text"),
) -> ChatCompletion:
...

@api_guard
async def call_llm_api(
self,
messages: list[ChatCompletionMessageParam],
model: str,
stream: bool,
response_format: ResponseFormat = ResponseFormat(type="text"),
) -> ChatCompletion | AsyncIterator[ChatCompletionChunk]:
) -> SpiceResponse:
Contributor: It's recommended to add a comment explaining why response_format is being checked and potentially transformed here for clarity and future maintainability.

session_context = SESSION_CONTEXT.get()
config = session_context.config
cost_tracker = session_context.cost_tracker
@@ -409,52 +392,34 @@ async def call_llm_api(
start_time = default_timer()
with sentry_sdk.start_span(description="LLM Call") as span:
span.set_tag("model", model)

Contributor: The switch to using the Spice client for LLM API calls is a significant change. Ensure that the Spice client is correctly configured to handle all the features and requirements of the previous OpenAI client, including error handling, rate limiting, and response parsing.

# TODO: handle this for gpt-4-vision-preview in spice?
# OpenAI's API is bugged; when gpt-4-vision-preview is used, including the response format
# at all returns a 400 error. Additionally, gpt-4-vision-preview has a max response of 30 tokens by default.
# Until this is fixed, we have to use this workaround.
if model == "gpt-4-vision-preview":
response = await self.async_client.chat.completions.create(
model=model,
messages=messages,
temperature=config.temperature,
stream=stream,
max_tokens=4096,
)
else:
# This makes it slightly easier when using the litellm proxy or models outside of OpenAI
if response_format["type"] == "text":
response = await self.async_client.chat.completions.create(
model=model,
messages=messages,
temperature=config.temperature,
stream=stream,
max_tokens=4096,
)
else:
response = await self.async_client.chat.completions.create(
model=model,
messages=messages,
temperature=config.temperature,
stream=stream,
response_format=response_format,
max_tokens=4096,
)

# We have to cast response since pyright isn't smart enough to connect
# the dots between stream and the overloaded create function
response = await self.spice_client.call_llm(
Contributor: Given the removal of the detailed OpenAI client configuration and the switch to Spice, ensure that all necessary Spice client configurations are properly set, especially those related to API keys and endpoints.

Contributor: Consider adding a detailed comment explaining the workaround for the gpt-4-vision-preview model and its specific issues, especially if this workaround is still relevant with the Spice client.

model=model,
messages=messages,
stream=stream,
temperature=config.temperature,
response_format=response_format,
)

Contributor: Ensure comprehensive testing of the Spice client's integration, focusing on error handling, rate limiting, and response parsing to match or exceed the functionality previously provided by the OpenAI client.

if not stream:
time_elapsed = default_timer() - start_time
response_tokens = count_tokens(
cast(ChatCompletion, response).choices[0].message.content or "",
response.text,
model,
full_message=False,
)
cost_tracker.log_api_call_stats(tokens, response_tokens, model, time_elapsed)
else:
cost_tracker.last_api_call = ""
response = cost_tracker.response_logger_wrapper(
tokens, cast(AsyncStream[ChatCompletionChunk], response), model
)
# TODO: replace this tracking for stream
# response = cost_tracker.response_logger_wrapper(
# tokens, cast(AsyncStream[ChatCompletionChunk], response), model
# )

return response

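One of the commits above is "mocking spice response"; a hedged sketch of how code that now expects a SpiceResponse-like return value from call_llm_api could be tested with a plain stub exposing .text, without touching the real Spice client. The stub, helper, and test names are illustrative, not from this PR.

import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def determine_content(llm_api_handler, messages, model):
    # Hypothetical caller mirroring the non-streaming pattern used in agent_handler/revisor.
    response = await llm_api_handler.call_llm_api(messages, model, False)
    return response.text


def test_determine_content_uses_response_text():
    fake_response = SimpleNamespace(text="ok")  # stands in for a SpiceResponse
    handler = SimpleNamespace(call_llm_api=AsyncMock(return_value=fake_response))
    result = asyncio.run(determine_content(handler, [], "claude-3-opus-20240229"))
    assert result == "ok"
    handler.call_llm_api.assert_awaited_once()


test_determine_content_uses_response_text()
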
2 changes: 1 addition & 1 deletion mentat/revisor/revisor.py
@@ -71,7 +71,7 @@ async def revise_edit(file_edit: FileEdit):
style="info",
)
response = await ctx.llm_api_handler.call_llm_api(messages, model=ctx.config.model, stream=False)
message = response.choices[0].message.content or ""
message = response.text
Contributor: After the change to use response.text, verify that the revision logic for edits still functions as intended, particularly in terms of parsing and applying the suggested revisions.

messages.append(ChatCompletionAssistantMessageParam(content=message, role="assistant"))
ctx.conversation.add_transcript_message(
ModelMessage(message=message, prior_messages=messages, message_type="revisor")
10 changes: 5 additions & 5 deletions mentat/splash_messages.py
@@ -72,14 +72,14 @@ def check_version():
def check_model():
ctx = SESSION_CONTEXT.get()
model = ctx.config.model
if "gpt-4" not in model:
if "gpt-4" not in model and "opus" not in model:
Contributor: The update to the model compatibility warning is important for user guidance. Ensure that this message is clear and accurately reflects the models supported by Mentat.

Contributor: The updated warnings for model compatibility are crucial for guiding users effectively. Ensure that these messages are displayed prominently in the UI and consider adding a link to documentation or a FAQ section for users who wish to learn more about model compatibility.

Contributor: Given the updated model compatibility warnings, consider adding more detailed guidance or links to documentation within the warning messages to help users understand the implications of using unsupported models.

ctx.stream.send(
"Warning: Mentat has only been tested on GPT-4. You may experience"
" issues with quality. This model may not be able to respond in"
" mentat's edit format.",
"Warning: The only recommended models are GPT-4 and Claude 3 Opus. "
"You may experience issues with quality. This model may not be able to "
"respond in mentat's edit format.",
style="warning",
)
if "gpt-3.5" not in model:
if "gpt-3.5" not in model and "claude-3" not in model:
ctx.stream.send(
"Warning: Mentat does not know how to calculate costs or context" " size for this model.",
style="warning",
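
For reference, the two conditions in check_model after this change, written out as standalone predicates that simply mirror the expressions in the diff (illustrative only, not code from the PR):

def quality_warning_needed(model: str) -> bool:
    # Mirrors: if "gpt-4" not in model and "opus" not in model
    return "gpt-4" not in model and "opus" not in model


def cost_warning_needed(model: str) -> bool:
    # Mirrors: if "gpt-3.5" not in model and "claude-3" not in model
    return "gpt-3.5" not in model and "claude-3" not in model


assert not quality_warning_needed("claude-3-opus-20240229")  # recommended model, no quality warning
assert quality_warning_needed("claude-3-haiku-20240307")     # quality warning still shown
assert not cost_warning_needed("claude-3-haiku-20240307")    # costs/context now known for claude-3
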
19 changes: 3 additions & 16 deletions mentat/utils.py
@@ -89,9 +89,9 @@ def create_viewer(transcripts: list[Transcript]) -> Path:


async def add_newline(
iterator: AsyncIterator[ChatCompletionChunk],
iterator: AsyncIterator[str],
Contributor: The add_newline function has been modified to work with a string directly instead of ChatCompletionChunk. This change simplifies the function but requires careful review to ensure it's correctly integrated with the rest of the system, especially where streaming responses are handled.

Contributor: After modifying the add_newline function to work with strings directly, verify its integration, especially in scenarios involving streaming responses, to ensure it behaves as expected.

Contributor: The change in add_newline to work with strings directly is significant. Ensure that this function is tested in all contexts where it's used to verify its correct operation.

role: Optional[Literal["system", "user", "assistant", "tool"]] = "assistant",
) -> AsyncIterator[ChatCompletionChunk]:
) -> AsyncIterator[str]:
"""
The model often doesn't end its responses in a newline;
adding a newline makes it significantly easier for us to parse.
Expand All @@ -101,20 +101,7 @@ async def add_newline(
last_chunk = chunk
yield chunk
if last_chunk is not None:
yield ChatCompletionChunk(
id=last_chunk.id,
choices=[
Choice(
delta=ChoiceDelta(content="\n", role=role),
finish_reason=last_chunk.choices[0].finish_reason,
index=0,
)
],
created=last_chunk.created,
model=last_chunk.model,
object=last_chunk.object,
system_fingerprint=last_chunk.system_fingerprint,
)
yield "\n"
Contributor: Consider using a more robust method for determining if a file is text-encoded. The current approach might not be efficient for large files.
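
A minimal sketch of the kind of test the comments above ask for, assuming the string-based add_newline shown in this diff; the helper names are illustrative and not from the codebase.

import asyncio
from typing import AsyncIterator

from mentat.utils import add_newline  # the string-based version shown in this diff


async def _collect(it: AsyncIterator[str]) -> list[str]:
    return [chunk async for chunk in it]


async def _fake_stream() -> AsyncIterator[str]:
    yield "line one\n"
    yield "line two"  # models often omit the trailing newline


def test_add_newline_appends_trailing_newline():
    chunks = asyncio.run(_collect(add_newline(_fake_stream())))
    assert chunks == ["line one\n", "line two", "\n"]


test_add_newline_appends_trailing_newline()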



def get_relative_path(path: Path, target: Path) -> Path:
1 change: 1 addition & 0 deletions requirements.txt
@@ -29,3 +29,4 @@ typing_extensions==4.8.0
tqdm==4.66.1
webdriver_manager==4.0.1
watchfiles==0.21.0
spice @ git+https://github.com/AbanteAI/spice@main
Contributor: Adding spice as a dependency directly from a Git repository can introduce instability, as any changes to the main branch of the repository will immediately affect this project. Consider pinning the dependency to a specific commit or version tag to ensure consistent behavior.

Contributor: Once spice is available on PyPI, update this dependency to use a stable version from PyPI instead of the Git repository to ensure more predictable builds and deployments.
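
For reference, the pinned forms suggested in these comments would look roughly like this in requirements.txt; the commit reference, package name, and version number are placeholders, not actual releases.

# Pin the Git dependency to a specific commit or tag instead of the moving main branch:
spice @ git+https://github.com/AbanteAI/spice@<commit-sha-or-tag>

# Or, once a release is published on PyPI, pin a specific version:
spice==X.Y.Z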