pre-training corpus #80

Humorloos · 2022-06-02T10:29:40Z

Hello @autoliuweijie, thank you for your amazing and inspiring work!

I would like to pre-train a K-BERT model on an English-language corpus. To do so, I am trying to get train_and_validate() to run with args.target set to "bert". With this setting, BertDataLoader is used to load the data, but I am not sure what exact format the dataset file at dataset_path must have. From the code, I can see that it has to be a pickle file, but I am having trouble reconstructing one that works with the data loader.

It would be very helpful to have access to the data file originally used for pre-training. Could you provide a link or instructions on how to construct it myself?
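To make the question concrete, here is a minimal sketch of writing and reading back a file of sequentially pickled records. The helper names (`write_records`, `read_records`), the field names (`src`, `tgt`, `seg`), the token ids, and the one-record-per-`pickle.dump` layout are all assumptions for illustration; they are not the format confirmed by the K-BERT code, which is exactly what is being asked about.

```python
import pickle
import tempfile
from pathlib import Path


def write_records(path, records):
    """Write each record to the file as its own pickle object."""
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f)


def read_records(path):
    """Yield pickled objects one by one until the file is exhausted."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


# Hypothetical record layout: token ids, masked-LM targets, segment ids.
# This is a guess for illustration, not the layout BertDataLoader expects.
example_records = [
    {"src": [101, 7592, 102], "tgt": [0, 7592, 0], "seg": [1, 1, 1]},
    {"src": [101, 2088, 102], "tgt": [0, 2088, 0], "seg": [1, 1, 1]},
]

dataset_path = Path(tempfile.mkdtemp()) / "dataset.pt"
write_records(dataset_path, example_records)

# Reading the file back shows how many objects it holds and their structure.
loaded = list(read_records(dataset_path))
print(len(loaded), type(loaded[0]).__name__, sorted(loaded[0]))
```

Running `read_records` on an existing dataset file would be one way to inspect its structure, provided the file was written as a sequence of pickled objects and comes from a trusted source, since unpickling can execute arbitrary code.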
