You can review all the steps in the notebook, or run the notebook on Kaggle if you want to try it out yourself.
This tokenizer was trained on more than 3.2 million rows of data.
- To learn more about the training data, check out the dataset page on 🤗 Datasets.
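As a rough sketch, training a tokenizer like this with the 🤗 `tokenizers` library could look like the snippet below. The corpus, vocabulary size, and special tokens here are placeholders for illustration, not the actual values used in the notebook:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Build a BPE tokenizer from scratch (placeholder configuration)
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=30_000,  # placeholder; tune for your corpus
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]"],
)

# Stand-in for the real 3.2M-row dataset: any iterator of strings works,
# so you can stream rows from a 🤗 dataset instead of loading them all.
corpus = [
    "this is a tiny example corpus",
    "a real run would iterate over millions of rows",
]
tokenizer.train_from_iterator(corpus, trainer=trainer)

encoding = tokenizer.encode("tiny example")
print(encoding.tokens)
```

In practice you would replace `corpus` with a generator over the dataset's text column and then save the result with `tokenizer.save("tokenizer.json")`.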