
Count seems to be off by 5-8 tokens #5

Open
drorm opened this issue Mar 16, 2023 · 6 comments

Comments

@drorm

drorm commented Mar 16, 2023

Came here from openai/openai-node#18, and I'm using this to count tokens when streaming.
It seems to be off by 5-8 tokens in either direction compared to the count the official API returns when used without streaming.
I haven't done a very deep analysis, just gave it a few prompts:

  • tell me a joke
  • tell me a n sentence story. (where n is between 5 and 12)
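One plausible source of small drift (not confirmed in this thread) is that encoding each streamed chunk separately and summing the counts can disagree with encoding the full text once, because a BPE encoder may merge characters across a chunk boundary. A toy sketch with a made-up single merge rule, not the real GPT tokenizer:

```javascript
// Toy tokenizer (NOT the real BPE): the only merge rule is that the pair
// "ab" becomes one token; every other character is its own token.
function toyEncode(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    if (text.slice(i, i + 2) === "ab") { tokens.push("ab"); i += 2; }
    else { tokens.push(text[i]); i += 1; }
  }
  return tokens;
}

// Encoding each streamed chunk separately can split a merge across the
// boundary, so the summed per-chunk count exceeds the whole-text count.
const chunks = ["xa", "bx"];
const perChunk = chunks.reduce((n, c) => n + toyEncode(c).length, 0);
const whole = toyEncode(chunks.join("")).length;
console.log(perChunk, whole); // 4 3 -- per-chunk sum is larger
```

Buffering the full response and encoding it once avoids this particular source of drift.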
@syonfox
Owner

syonfox commented Mar 17, 2023

That info is mostly in the demo app. I would probably make a new repository for it.

But I will add those files to npm so that it is possible, if not ideal.

// Node 18+ is required for fetch :(
// npm update gptoken  (needs 0.1.0; edit package.json if you pin a version)

// After editing .env and installing dotenv, use this code to get started:

require('dotenv').config();
// process.env.OPENAI_API_KEY = "..."

if (!process.env.OPENAI_API_KEY) console.error("OPENAI_API_KEY is required; npm install dotenv; nano .env; require('dotenv').config()");

let streamOne = require("gptoken/demo_app/streamOne");

// streamOne.test()
// streamOne(model, prompt/messages, onToken, onData, onError, options)
// see code

function onData(token, msg, parsed, response) {
    console.log(msg.content);
}

let msgs = [{role: "system", content: "whats the best cat?"}];
streamOne("gpt-3.5-turbo", msgs, onData);






This is my current code; feel free to make some edits and improve it here in the demo app.

TODO: add a check for missing options; right now it requires the callback functions to exist.

@syonfox
Owner

syonfox commented Mar 17, 2023

https://github.com/syonfox/GPT-3-Encoder/blob/GPToken/demo_app/streamOne.js

Also, what token estimator are you using? Do you have an example of the tokenized output?

We could write a test to compare them to what is expected.

@drorm
Author

drorm commented Mar 18, 2023

I compare it to the result given by the API when not streaming.
When I make the request, I get both counts and show the results. It's typically between -2% and +2%, and the main purpose is to get an idea of costs. Since 3.5 is so cheap these days, I really don't think it's worth spending a lot of time on this.
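For cost estimation, an approximate count really is enough. A minimal sketch; the $0.002-per-1K-tokens rate was gpt-3.5-turbo's published price around the time of this thread and is an assumption here, so check current pricing:

```javascript
// Rough dollar cost from a token count. The rate is an assumption:
// gpt-3.5-turbo was priced at $0.002 per 1K tokens in early 2023.
const USD_PER_1K_TOKENS = 0.002;

function estimateCostUSD(tokenCount, ratePer1K = USD_PER_1K_TOKENS) {
  return (tokenCount / 1000) * ratePer1K;
}

// Being off by 5-8 tokens moves the estimate by about a thousandth of a cent:
console.log(estimateCostUSD(500)); // 0.001
console.log(estimateCostUSD(508));
```

At that scale, a 5-8 token error in either direction is negligible for billing purposes, which matches the conclusion above.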

@syonfox
Owner

syonfox commented Mar 20, 2023

Si, I agree that the most useful and original function is just to get a quick estimate. It would be useful if it was compatible with some of the guts.

Anyway, yeah, I think the request can give accurate token counts, but a few cents here and there never hurt much.

@syonfox
Owner

syonfox commented Mar 20, 2023

This is probably due to issue #6
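If issue #6 concerns the chat message framing, that would explain a roughly constant offset: the chat API wraps every message in role and separator tokens that a plain encode of the content never sees. A hedged sketch of the widely circulated accounting for gpt-3.5-turbo (about 4 extra tokens per message plus 3 to prime the reply; the exact constants are model-dependent and not guaranteed), with the tokenizer injected as a parameter so the sketch stays self-contained:

```javascript
// Estimate chat-completion prompt tokens from per-message content counts.
// countTokens is any function mapping a string to a token count (e.g. the
// encoder from this repo). The overhead constants are approximations.
const TOKENS_PER_MESSAGE = 4; // role/separator framing around each message
const REPLY_PRIMING = 3;      // tokens that prime the assistant's reply

function estimateChatTokens(messages, countTokens) {
  let total = REPLY_PRIMING;
  for (const m of messages) {
    total += TOKENS_PER_MESSAGE + countTokens(m.role) + countTokens(m.content);
  }
  return total;
}

// With a dummy one-token-per-word counter, a single short message already
// carries ~7 tokens of framing overhead -- the same order of magnitude as
// the 5-8 token discrepancy reported above.
const words = (s) => s.trim().split(/\s+/).length;
const msgs = [{ role: "user", content: "tell me a joke" }];
console.log(estimateChatTokens(msgs, words)); // 12 = 3 + 4 + 1 + 4
```

Counting only the content tokens would therefore undercount each request by a handful of tokens, independent of prompt length.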

@drorm
Author

drorm commented Mar 27, 2023

Just published https://github.com/drorm/gish, and you can see in the screencast the token count displayed while streaming. Thank you.
