Molecular input #234
Hi Simon,
thank you very much. Overall it looks good to me.
Besides the inline comments, I have one main point:
If I remember correctly, we agreed to keep the SMILES out of MolecularInput and put them into CategoricalMolecularInput, together with things like get_bounds etc., so that MolecularInput can be used just for predictions and not for optimization. I think the molfeatures attribute can also be removed from MolecularInput.
In addition, we could think about replacing the MolecularEncodingEnum with Molfeatures.
We can set up a call to discuss this in more detail.
Best and thanks,
Johannes
Hi Johannes. Adjustments have been made based on your suggestions above and on our private conversations. The following are the notable points:
Hi Simon, thank you very much. I will have a look in the next few days ;)
Can you just close (resolve) the comments that you addressed? This makes it easier to follow ;)
Hi Simon, thank you very much! It is almost done. In addition to the inline comments, can you also create tests for the functionality in molfeatures, and then include it in specs and in the respective serialization and deserialization tests?
Furthermore, I think some merge conflicts also need to be resolved.
]
# next check that only Categoricalwithdescriptor have the value DESCRIPTOR or are of type MolFeatures
descriptor_keys = []
for key, value in specs.items():
I am not sure: does this also raise an error if one assigns a MolFeatures transform to a CategoricalDescriptorInput?
Can you then also write tests for the additions in this method?
Normally the validation of input_processing_specs would have caught what you describe, I think. Nonetheless, _validate_transform_specs has been improved to make sure it catches these kinds of issues too. Compared with my earlier version, the checks for CategoricalEncodingEnum.DESCRIPTOR and for MolFeatures are now separate. This avoids a bug that could occur through user error when there are multiple categorical inputs with different mistakes in their transform types. Furthermore, MolecularInputs now require a MolFeatures in the transform specs. Hopefully that's fine with you too.
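For readers following the thread, the separated checks described above could look roughly like the sketch below. It is not the code from this PR: every class here is a minimal stand-in that only borrows the names used in the discussion, and the function name validate_transform_specs, the enum members, and the error messages are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class CategoricalEncodingEnum(Enum):
    # Stand-in enum; only DESCRIPTOR is named in the thread, ONE_HOT is assumed.
    ONE_HOT = "ONE_HOT"
    DESCRIPTOR = "DESCRIPTOR"


@dataclass
class MolFeatures:
    # Hypothetical placeholder for a molecular featurizer spec.
    name: str = "fingerprints"


@dataclass
class CategoricalInput:
    key: str


@dataclass
class CategoricalDescriptorInput(CategoricalInput):
    pass


@dataclass
class MolecularInput:
    key: str


Spec = Union[CategoricalEncodingEnum, MolFeatures]


def validate_transform_specs(features: List[object], specs: Dict[str, Spec]) -> None:
    """Raise ValueError if a transform spec does not fit its feature type."""
    by_key = {feat.key: feat for feat in features}

    unknown = [key for key in specs if key not in by_key]
    if unknown:
        raise ValueError(f"Unknown feature keys in specs: {unknown}")

    # Check 1: DESCRIPTOR is only allowed for CategoricalDescriptorInput.
    for key, value in specs.items():
        if value == CategoricalEncodingEnum.DESCRIPTOR and not isinstance(
            by_key[key], CategoricalDescriptorInput
        ):
            raise ValueError(f"DESCRIPTOR is not allowed for feature '{key}'")

    # Check 2 (kept separate from check 1): MolFeatures only for MolecularInput.
    for key, value in specs.items():
        if isinstance(value, MolFeatures) and not isinstance(
            by_key[key], MolecularInput
        ):
            raise ValueError(f"MolFeatures is not allowed for feature '{key}'")

    # Check 3: every MolecularInput needs a MolFeatures entry in the specs.
    for feat in features:
        if isinstance(feat, MolecularInput) and not isinstance(
            specs.get(feat.key), MolFeatures
        ):
            raise ValueError(f"MolecularInput '{feat.key}' requires MolFeatures")
```

In this sketch, assigning a MolFeatures transform to a CategoricalDescriptorInput is rejected by the second check, which is the case asked about earlier in this thread.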
Can you also create a new specs thingy for the Molfeatures data models and put it into the serialization and deserialization tests as well?
Done
@@ -348,6 +350,16 @@ def _get_transform_info(
[f"{feat.key}{_CAT_SEP}{d}" for d in feat.descriptors]
)
counter += len(feat.descriptors)
elif isinstance(specs[feat.key], MolFeatures):
Can you also include this in the tests?
Added MolecularInput to test_inputs_get_transform_info
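As a rough illustration of the bookkeeping visible in the hunk above (a running counter plus column names built as key, separator, descriptor name), here is a minimal sketch. It is not the PR's _get_transform_info: the function name get_transform_info, the plain-dict input, the value chosen for _CAT_SEP, and the example descriptor names are all assumptions.

```python
from typing import Dict, List, Tuple

_CAT_SEP = "_"  # assumed value; the thread only shows the name of this constant


def get_transform_info(
    descriptor_names: Dict[str, List[str]],
) -> Tuple[Dict[str, Tuple[int, ...]], List[str]]:
    """Map each feature key to the column indices it occupies after encoding.

    `descriptor_names` maps a feature key to the names of the columns its
    transform produces (descriptors for a categorical input, or feature names
    from a molecular featurizer).
    """
    features2idx: Dict[str, Tuple[int, ...]] = {}
    columns: List[str] = []
    counter = 0
    for key, names in descriptor_names.items():
        # The feature occupies a contiguous block starting at the counter.
        features2idx[key] = tuple(range(counter, counter + len(names)))
        columns.extend(f"{key}{_CAT_SEP}{d}" for d in names)
        counter += len(names)
    return features2idx, columns


idx, cols = get_transform_info(
    {"solvent": ["polarity", "bp"], "mol": ["fp_0", "fp_1", "fp_2"]}
)
```

A test for a molecular branch could then assert that the index block for the molecular feature has one entry per generated descriptor column.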
@@ -383,6 +395,9 @@ def transform(
elif specs[feat.key] == CategoricalEncodingEnum.DESCRIPTOR:
assert isinstance(feat, CategoricalDescriptorInput)
transformed.append(feat.to_descriptor_encoding(s))
elif isinstance(specs[feat.key], MolFeatures):
Can you also include this in the tests?
Testing for this can be found in test_inputs_transform_molecular. This only tests the transform in the forward direction. It is kept separate from test_inputs_transform for now because the inverse transform for molecular inputs is not implemented yet.
Apologies for those merge conflicts. I probably should have worked more quickly on this... Mind if we resolve them together in a call?
Hi Simon, nothing to be sorry for. It is not your fault; it is because it always took me so long to work on it. I am currently on vacation, but I will try to resolve them some evening. Or is it super urgent?
Hi Johannes. Not urgent. Enjoy your vacation ;)
@simonsung06: should be ok now from my perspective. Just have a look ;)
Thanks! Looks good 👍
MolecularInput feature with the ability to define molecular feature types. Introduced a new kernel type, TanimotoKernel, too. The implementations here are designed to work with SingleTaskGP. Analogous to a previous PR #194, except that this one is a more robust implementation for just the MolecularInput feature. Categorical molecular inputs are planned to be implemented in a separate PR #210.
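For context on the kernel mentioned above: Tanimoto similarity between two fingerprint vectors x and y is usually defined as <x, y> / (|x|^2 + |y|^2 - <x, y>). The pure-Python sketch below only illustrates that formula. It is not the TanimotoKernel added in this PR; the function names are made up for the example, and returning 1.0 for two all-zero vectors is an assumed convention.

```python
from typing import List, Sequence


def tanimoto_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Tanimoto similarity of two equal-length fingerprint vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero fingerprints count as identical.
    return 1.0 if denom == 0 else dot / denom


def tanimoto_kernel_matrix(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Pairwise similarity matrix with one row per vector in X."""
    return [[tanimoto_similarity(x, y) for y in Y] for x in X]


# Toy binary fingerprints, purely illustrative.
fingerprints = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
K = tanimoto_kernel_matrix(fingerprints, fingerprints)
```

For binary fingerprints this reduces to the number of shared bits divided by the number of bits set in either vector, so identical molecules score 1 and molecules with no shared bits score 0.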