bump semi major version for browser support and loading change
syonfox committed Jan 8, 2023
1 parent 3f2c339 commit 36361c0
Showing 3 changed files with 15 additions and 2 deletions.
13 changes: 13 additions & 0 deletions build_bpe_merges.js
@@ -0,0 +1,13 @@
const fs = require('fs');
const path = require('path');

const bpe_file = fs.readFileSync(path.join(__dirname, './vocab.bpe'), 'utf-8');

const lines = bpe_file.split('\n');

// bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split("\n")[1:-1]]
const bpe_merges = lines.slice(1, lines.length - 1).map(x => {
  // Split each merge rule on whitespace and drop empty fragments.
  return x.split(/\s+/).filter(e => e.length > 0);
});

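// Write next to this script (like the read above) instead of relative to the current working directory.
fs.writeFileSync(path.join(__dirname, './bpe_merges.js'), `module.exports = ${JSON.stringify(bpe_merges)};`);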
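
The parsing step in this script can be tried without the real `vocab.bpe`. The sketch below is illustrative only: the three-rule `sample` string and the `parseMerges` helper are hypothetical and not part of this commit. It assumes the file layout the script implies (a header line, one merge rule per line, and a trailing newline), and it shows one way the resulting pairs might be turned into a rank lookup, which is how BPE encoders typically consume a merge list.

```javascript
// Hypothetical sample in the assumed vocab.bpe layout:
// a header line, one merge rule per line, and a trailing newline.
const sample = '#version: 0.2\nĠ t\nh e\nĠt he\n';

// Hypothetical helper mirroring the script's parsing step.
function parseMerges(text) {
  const lines = text.split('\n');
  // Skip the header (first line) and the empty string left by the final newline.
  return lines.slice(1, -1).map(line => line.split(/\s+/).filter(s => s.length > 0));
}

const merges = parseMerges(sample);

// Rank is the rule's position in the file; a lower rank means the pair is merged earlier.
const ranks = new Map(merges.map((pair, i) => [pair.join(' '), i]));

console.log(merges);
console.log(ranks.get('h e'));
```

Keying the map on the space-joined pair keeps lookups simple because neither half of a merge rule contains whitespace.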
2 changes: 1 addition & 1 deletion encoder.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@syonfox/gpt-3-encoder",
- "version": "1.3.3",
+ "version": "1.4.0rc",
"description": "Javascript BPE Encoder Decoder for GPT-2 / GPT-3. The \"gpt-3-encoder\" module provides functions for encoding and decoding text using the Byte Pair Encoding (BPE) algorithm. It can be used to process text data for input into machine learning models, or to convert tokenized text back into human-readable format. It also includes functions for counting tokens in a given text and generating statistics about the tokens in a string or array.",

"main": "index.js",