Out-of-memory when finetuning large datasets with graphs #146
YouCanNotKnow announced in Announcements · Replies: 1 comment
Hi CHGNet devs, I am trying to finetune a model on the Open Catalyst Project dataset (https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md). I've run into memory problems when converting the dataset into graphs.

I have been following fine_tuning.ipynb and make_graphs.py in examples. I can convert the structures into graphs, but due to the scale of the dataset, memory runs out before I can write a labels.json file.

I can create labels for each individual graph, or for smaller batches of the full dataset, but it looks like GraphData in data/dataset.py can only load a single labels.json. Is there a way to batch-load labels into a single dataset, or to merge smaller datasets together? Some way to get around the memory problem and train on the full dataset?

Reply:

How large is the OC20 labels file? This will require a modified implementation of …
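One possible workaround, given that labels can already be written for smaller batches: merge the per-batch label files into the single labels.json that GraphData expects. The sketch below is hypothetical and assumes each shard (e.g. labels_000.json, labels_001.json, ...) is a JSON object mapping graph file names to their label dicts, with keys disjoint across shards; the shard naming and helper name are made up for illustration.

```python
import json
from glob import glob

def merge_label_shards(shard_paths, out_path="labels.json"):
    """Merge per-batch label JSON files into one labels.json.

    Hypothetical sketch: assumes each shard maps graph file names to
    label dicts and that keys are disjoint across shards, matching the
    single-file format GraphData loads.
    """
    merged = {}
    for path in shard_paths:
        with open(path) as f:
            # Disjoint keys, so a plain dict.update is a safe union.
            merged.update(json.load(f))
    with open(out_path, "w") as f:
        json.dump(merged, f)
    return merged

if __name__ == "__main__":
    # Assumed shard naming convention; adjust the pattern to your files.
    merge_label_shards(sorted(glob("labels_*.json")))
```

Only the merged labels.json needs to fit in memory at once, not the intermediate structures used while building each batch, which may be enough to get past the OOM depending on where the memory is actually spent.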