Ali619/train-bloom-1b7-tokenizer

Open In Kaggle Model on HF

Create a tokenizer for the bigScience-bloom-1B7 model with a Persian-English corpus

You can check all the steps in the notebook, or run the notebook on Kaggle if you want to test it out.

This tokenizer is trained on more than 3.2 million rows of data.

  • To get more info about the data I used, check out the dataset page on 🤗 Datasets.
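
For readers who want the gist without opening the notebook, here is a minimal sketch of how a tokenizer could be retrained from the existing BLOOM tokenizer on a corpus of rows. It is not copied from the notebook. The Hub ID `bigscience/bloom-1b7`, the `text` column name, the batch size, and the helper names `batch_iterator` and `train_tokenizer` are all assumptions made for illustration. The only library call relied on is `train_new_from_iterator`, which 🤗 Transformers provides on fast tokenizers.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def batch_iterator(
    rows: Iterable[Dict[str, str]],
    batch_size: int = 1000,          # assumed value, tune for your memory budget
    text_column: str = "text",       # assumed column name, check the dataset page
) -> Iterator[List[str]]:
    """Yield lists of strings so the whole corpus never sits in memory at once."""
    batch: List[str] = []
    for row in rows:
        text = row.get(text_column)
        if not text:                 # skip rows with missing or empty text
            continue
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                        # flush the final partial batch
        yield batch


def train_tokenizer(
    rows: Iterable[Dict[str, str]],
    vocab_size: Optional[int] = None,
    base_model: str = "bigscience/bloom-1b7",  # assumed Hub ID of the base model
):
    """Hypothetical helper: retrain the base tokenizer's algorithm on new text."""
    # Imported lazily so batch_iterator can be used without transformers installed.
    from transformers import AutoTokenizer

    base = AutoTokenizer.from_pretrained(base_model)
    # Reuses the base tokenizer's pipeline (normalizer, pre-tokenizer, model type)
    # and learns a fresh vocabulary from the supplied batches.
    return base.train_new_from_iterator(
        batch_iterator(rows),
        vocab_size=vocab_size or base.vocab_size,
    )
```

A possible usage, assuming `rows` is any iterable of dictionaries such as a loaded 🤗 Datasets split: call `train_tokenizer(rows)` and then `save_pretrained("my-tokenizer")` on the result. Keeping the vocabulary size equal to the base tokenizer's is only a default chosen here; the notebook may use a different value.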
