Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update hdf5 dataset #292

Merged
merged 1 commit into from
Feb 23, 2024
Merged

update hdf5 dataset #292

merged 1 commit into from
Feb 23, 2024

Conversation

guillemsimeon
Copy link
Collaborator

Update hdf5 dataset such that now charges, spin and partial charges can be included at the same time. Also, renamed partial charges to 'pq' to be consistent with other datasets such as Ace.

update hdf5 dataset such that now charges, spin and partial charges can be included at the same time. also, renamed partial charges to 'pq' to be consistent with other datasets such as Ace
@RaulPPelaez
Copy link
Collaborator

LGTM thanks.

@RaulPPelaez RaulPPelaez merged commit ea8664f into torchmd:main Feb 23, 2024
2 checks passed
@peastman
Copy link
Collaborator

This PR just broke the Coulomb prior.

The Dataset used with this class must include a `partial_charges` field for each sample, and provide
`distance_scale` and `energy_scale` attributes if they are not explicitly passed as arguments.

@peastman
Copy link
Collaborator

Also, as much as possible I'd like to avoid renaming fields from the hdf5 file. One of the goals of #289 is that you can add arbitrary extra fields to the dataset and then specify ones to pass on to the model. Supporting that is one of the next things I'll do once it's merged. Having special fields that unexpectedly get renamed makes it more difficult for the user. Whenever possible the field in the dataset should have the same name as the entry in the file. It's probably too late to change that for energy and forces, but let's at least not add any new fields with magic names.

@guillemsimeon
Copy link
Collaborator Author

guillemsimeon commented Feb 23, 2024 via email

@peastman
Copy link
Collaborator

Thanks, I'll make a PR. I'd actually prefer to get away from abbreviations like that. I'm currently thinking about training models where I feed in multiple types of charges (for example, formal charges and Gasteiger partial charges). Add in total charge which you've said you want, and now you have three different types of charge. Which one is q? Using explicit names like partial_charge, formal_charge, and total_charge seems clearer.

peastman added a commit to peastman/torchmd-net that referenced this pull request Feb 23, 2024
@guillemsimeon
Copy link
Collaborator Author

guillemsimeon commented Feb 23, 2024 via email

@RaulPPelaez
Copy link
Collaborator

I agree completely with Peter, also good luck looking for "s" in a file.

RaulPPelaez added a commit that referenced this pull request Feb 25, 2024
RaulPPelaez pushed a commit that referenced this pull request Feb 26, 2024
Thanks Peter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants