
Some Updates RFC #20

Open
syonfox opened this issue Jan 8, 2023 · 6 comments

Comments


syonfox commented Jan 8, 2023

Hi, I have spent a little bit of time adding to this library in my fork. We simply need to change the URLs back and pull to update these repos.

If anyone is looking for some newer stuff, take a look and feel free to open an issue. I think I'll basically leave it here; the goal is to make a 1.4.2 release and have that be final as far as what is needed for this component.

The major things added are:

  • countTokens: a faster way to get the token count if you don't care about the token contents (see the sketch after this list)

  • tokenStats: some interesting insights into encoded strings, such as frequency and position maps

  • jsdocs: implementation interface docs

  • browserify: browser support
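
A minimal usage sketch of the two new helpers, assuming the exports shown later in this thread; the exact shape of the tokenStats result (and whether it takes raw text or token ids) is an assumption here:

 const { encode, countTokens, tokenStats } = require("@syonfox/gpt-3-encoder");

 const text = "The quick brown fox jumps over the lazy dog.";

 // countTokens returns the number of BPE tokens without needing the full token array.
 console.log("count:", countTokens(text));

 // encode still gives the full array of token ids if you need the contents.
 console.log("tokens:", encode(text));

 // tokenStats reports aggregate information (frequency and position maps, per the
 // description above); passing the raw text is an assumption.
 console.log("stats:", tokenStats(text));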

The major thing to test is whether we can get a good working version of the original Python implementation. I added it to the npm package in case it just works, but it would be good to have the package support Node, the browser, and Python.

Check out the JSDocs.

Also check out the browser demo.


Compatible with Node >= 12

Overall, just let me know what you think.


syonfox commented Jan 18, 2023

#30

Check out the pull request here if you're interested in a diff.

@rossanodr

> #30
> Check out the pull request here if you're interested in a diff.

Hey man, I'm just a newbie, but I'm trying to learn with your project.
After I run npm install @syonfox/gpt-3-encoder, I can't import your project:
import {encode, decode, countTokens, tokenStats} from "gpt-3-encoder"
I am getting the error "Cannot find module 'gpt-3-encoder' or its corresponding type declarations."
:(
I'm using Next.js.


syonfox commented Feb 9, 2023

I may be wrong, but you will need "@syonfox/gpt-3-encoder".

I believe this is linked to the package.json name field.

Let me know if that works; otherwise, I will have to do more research.

So...

 import { encode, decode, countTokens, tokenStats } from "@syonfox/gpt-3-encoder";

 // or

 const { encode, decode, countTokens, tokenStats } = require("@syonfox/gpt-3-encoder");
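
For context, the import specifier comes from the name field in the fork's package.json, which presumably contains (trimmed to the relevant field):

 {
   "name": "@syonfox/gpt-3-encoder"
 }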


syonfox commented Feb 10, 2023

#31 (comment)


niieani commented Apr 16, 2023

Hey @syonfox, I've also spent some time fixing up this package in #38 (published as gpt-tokenizer on npm). I don't have tokenStats implemented in my fork, but I've added some other optimizations and features. See gpt-tokenizer for more info. Open to collaborating on further work/features.
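
A hedged sketch of trying the fork, assuming gpt-tokenizer keeps the same encode/decode surface as this package (check its README for the exact API):

 // Assumed drop-in usage of gpt-tokenizer; exact exports may differ.
 import { encode, decode } from "gpt-tokenizer";

 const tokens = encode("hello world");
 console.log(tokens.length, decode(tokens));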

@BrentFarese

It seems we are running into memory issues in our app using GPT-3-Encoder. I'm going to try gpt-tokenizer, @niieani, and will report back if it solves our issue as identified in #38. It seems there are some improvements in your package that might be good to bring into GPT-3-Encoder.
