running into message/content length impedance over 8K tokens #192
Comments
@qaptoR What you're probably hitting are the rate limits, which are not directly configurable, but related to your API token/account tiers.
You might be right, but it doesn't seem like I should be hitting a limit yet for what I'm using. I ran the test mostly with gemini-flash today, and then just once with gpt4o-mini (in the image), which shows it's actually far less than even 8K tokens of input (it's a lot of markdown tabular data with empty cells). It's also annoying because it looks like I had to pay for the query, but I received no response.
@qaptoR could you turn on the …
The content I'm working with contains information that I don't want public yet (proprietary RPG rules I'm working on). So I converted the text and tested it both at full length and with the same amount of data removed that makes it work. Hence it's about the number of bytes, not the formatting of the content.
Also, what's your number for … ?
I guess I should have mentioned sooner: I'm on Windows 10, using PowerShell. It looks like it might have a line length limit of 8191 characters; at least Bing says it's the same for PowerShell and cmd, but I'll double-check that. My string is 32K+ characters long, so even if PowerShell's limit is closer to 32K rather than 8191, the command-line length could still be the problem.

Also, the requests are definitely leaving my machine, because my token usage has gone up each time I run the command, for all of the models I've tested with.

I'm also using a fork of the WIP PR so that I can import files using the @text: annotation, but I've made sure to keep it updated on top of the main branch. I can check out the main branch right now, but for the moment I'll say that including the text directly in the chat on this branch does not solve the problem.

UPDATE: …
@qaptoR I've just checked on my poorly set up Windows virtual machine that, as expected, passing the payload as a command-line argument to curl runs into a length limit there (unlike on Linux and macOS).
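For what it's worth, the byte counts reported in this thread straddle the commonly cited Windows command-line limits. This is only a rough plausibility check; the exact limit that applies depends on how curl is launched:

```lua
-- Rough plausibility check, not part of the plugin: the commonly cited
-- Windows limits are 8191 characters per cmd.exe line and roughly 32767
-- characters for a whole process command line.
local win_cmdline_limit = 32767
local cmd_exe_line_limit = 8191

print(37085 > win_cmdline_limit)   --> true  (the failing file alone is over the limit)
print(32546 < win_cmdline_limit)   --> true  (the trimmed file alone still fits, barely)
print(32546 > cmd_exe_line_limit)  --> true  (so the 8191 cmd.exe limit alone doesn't explain the cutoff)
```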
Thank you for figuring this out so swiftly! I was driving myself mad reading your code, because for the life of me I could not find anything wrong with it (because there isn't). It makes so much sense that it's a platform limitation. I suppose it's much easier to use the API with Python when trying to pass a 128K-token string through! Working with curl, because of the limitation of Lua not having an API library to import, leads to a lot more learning. I will remember this if I'm building an app in a language where I also must work with curl.
The fix is storing payloads on disk and using `curl -d @file`. @qaptoR please pull main into your branch and try it out.
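For anyone landing here later, the workaround boils down to something like this (a minimal plain-Lua sketch, not gp.nvim's actual implementation; the function name, headers, and temp-file handling are illustrative assumptions):

```lua
-- Sketch of the -d @file workaround: write the JSON payload to a temp file
-- and let curl read the request body from disk, so the payload never has to
-- fit inside the command line itself.
local function curl_with_payload_file(url, api_key, payload_json)
  local path = os.tmpname()
  local f = assert(io.open(path, "w"))
  f:write(payload_json)
  f:close()

  local cmd = string.format(
    'curl -s %s -H "Content-Type: application/json" -H "Authorization: Bearer %s" -d @%s',
    url, api_key, path
  )
  local handle = assert(io.popen(cmd))
  local response = handle:read("*a")
  handle:close()
  os.remove(path)

  return response
end
```

The important part is `-d @path`: curl substitutes the file's contents as the request body, so only the short file path has to fit on the command line.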
Bam! Wow, that worked immediately! And it feels so good too, just writing one line to include a file and asking it to summarize the key points! I'm not sure if you saw, but I added a way to recursively define import files, so that you can write a file that itself contains import commands for the most commonly used reference files. That way you can easily define a bunch of reference templates, and when you want to ask a question it just includes everything. So you just helped me a tonne, because I plan on racking up towards that 128K context window all the time!
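The recursive import idea described above could look roughly like the following (a hypothetical sketch: the `@text:` annotation is the one mentioned earlier in the thread, but the exact syntax and this resolution logic are assumptions, not the actual PR):

```lua
-- Hypothetical recursive expansion of @text: imports: any line of the form
-- "@text: path/to/file" is replaced by that file's contents, and included
-- files may themselves contain further @text: lines.
local function expand_imports(path, seen)
  seen = seen or {}
  if seen[path] then return "" end  -- guard against include cycles
  seen[path] = true

  local parts = {}
  for line in io.lines(path) do
    local import = line:match("^@text:%s*(.+)$")
    if import then
      table.insert(parts, expand_imports(import, seen))
    else
      table.insert(parts, line)
    end
  end
  return table.concat(parts, "\n")
end
```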
@qaptoR I'd guess the next stop will be those rate limits 🙂
Not yet. I'm interested, but first I need to build in a generic …
The problem appears when trying to write queries that reference very long files, either with a visual selection or as part of chat messages; I think the issue is anything greater than 8K tokens. The file I'm trying to include is 37085 bytes, but if I only include 32546 bytes of it, it works (not counting the prompt itself, which is fairly short). Since a token is roughly 4 bytes, that seems to line up.
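As a back-of-the-envelope check of the "roughly 4 bytes per token" estimate used above (purely illustrative arithmetic on the sizes quoted here):

```lua
-- Rough byte -> token estimate, mirroring the reasoning in the report.
local function rough_tokens(bytes)
  return math.floor(bytes / 4)
end

print(rough_tokens(37085))  --> 9271 (the failing file, just over 8K tokens)
print(rough_tokens(32546))  --> 8136 (the trimmed version that still works)
```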
I get the error:

`<model> response is empty: ""`
How can we take advantage of a 128K-token context window or greater if it can't handle more than 8K tokens?
I see that for the googleai and anthropic providers the payload uses `model.max_tokens` or sets a default value, but I cannot find anywhere else that it is set or used in the code, especially in the config (although I'm just using grep and surface-level manual searching).