-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT Add light type annotations to skops.io #219
MNT Add light type annotations to skops.io #219
Conversation
This is in line with the rest of the code base. On top of adding those type annotations, these changes were made: 1. Moved all code from _dispatch.py to _audit.py. _dispatch.py didn't make much sense after the refactor and there were also problems with circular imports. 2. Small refactor in FunctionNode: get_unsafe_set used to call is_safe, which is the inverse of how all other Nodes do it. Changed it to be consistent. 3. Fixed a bug in ReduceNode: Exception would access self.constructor, which does not exist. 4. Add a few comments to Node.__init__ about what subclasses should implement. 5. Add _construct method to Node, raise NotImplementedError 6. Add TODO comment to CachedNode as it doesn't deal with unknown id yet. 7. gettype now raises error when type could not be determined, cannot return None anymore 8. get_untrusted_types now deals with neither data nor file being passed
@skops-dev/maintainers ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, minor todo and quesiton I would like to ask before approving
skops/io/_general.py
Outdated
@@ -280,7 +318,7 @@ def __init__(self, state, load_context: LoadContext, trusted=False): | |||
"step": state["content"]["step"], | |||
} | |||
|
|||
def _construct(self): | |||
def _construct(self) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to basically all of the _construct
functions:
This always return a slice
, so shouldn't we change the type to -> slice:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually considered that, but decided against it.
The reason is that those types would not help us in our code. That's because everywhere we call _construct
, we cannot statically know which of the dozens of types it would return. And this is because we work with generic state
s. If we had type-specific states, like DictState
or NdArrayState
, we may be able to add overloads that say what type is returned by _construct
based on the state, but we don't have that (and it would be difficult to implement, maybe even impossible with the current typing situation).
So if we added return type annotations here, the only thing that mypy could infer is that _construct
returns a Union[str, int, float, dict, list, NDArray, ...]
, which is not more useful than Any
in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, why do we then need to have the type annotation in the first place? can't we just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BenjaminBossan
|
||
|
||
@contextmanager | ||
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Generator[None, None, None]
mean? typehints should helps us understand the code better, not to make them more confusing :D
Also, if everything is Any
here, then why do we have typehints?
This is an example where I wouldn't add any typehints here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meaning for Generator
is Generator[YieldType, SendType, ReturnType]
, which happens to be None, None, None
here. I agree that types don't add much here, though **kwargs: Any
is a shorthand for dict[str, Any]
. The difference for me between a function annotated with Any
vs no annotation is that with the latter, I don't know if it takes any or if it was just not annotated yet, so I guess not completely useless.
I would thus keep the annotations here, but if you insist, I can remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think typehints should be useful, if they're not, they take extra space and make things less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Adrin
imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.
skops/io/_sklearn.py
Outdated
self, | ||
state: dict[str, Any], | ||
load_context: LoadContext, | ||
constructor: Type[Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also me a function.
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@E-Aho @adrinjalali I made some suggested changes and replied to the other comments, please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with most annotations here, but I don't think we should annotate everything for the sake of having everything annotated. If it doesn't add much, we shouldn't add it then.
|
||
|
||
@contextmanager | ||
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think typehints should be useful, if they're not, they take extra space and make things less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really good, thanks!
I do think we can probably remove the -> Any:
s and the Nones
inside the Generator
type hint, but I'd also be ok with including them if you think they're more helpful to have than not to have here.
|
||
|
||
@contextmanager | ||
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Adrin
imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.
Replying here since the comments are a bit distributed right now @E-Aho @adrinjalali
Okay, I'll remove the I would not remove the with temp_setattr(...) as foo:
foo.bar() mypy knows that it doesn't work because we specified that the yield type is Speaking more generally, Regarding the
as I argued, it provides information, namely that this works on any object and that the dict values can also be any objects. Without types, I wouldn't know if I can pass anything or if the function was just not typed but does have restrictions. Therefore, I'd argue that this doesn't apply:
So I'd leave the types here. Agree? |
It is not adding value.
I think for that we need documentation (docstrings), not typehints. |
We can have both, no :) And generally, type hints are better supported in editors than docstrings. |
I definitely fully agree with Benjamin on type hints being handy inside IDEs. I am happy to add the rest of the type hints. I think though it could maybe be useful to also document what that generator return actually will behave like, as it does look a little odd at first glance to see something returning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to approve this, optionally add a small comment about the format of the Generator[None,...]
output type as it's a bit confusing at first glance if you aren't familiar with that syntax, but LGTM in any case
I added a comment to explain the types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BenjaminBossan , nice work.
One point: I kinda said from the beginning when we started doing typehints, that I'm fine as long as we don't end up insisting on having hints everywhere. We're somewhat going that way now and that doesn't make me happy. It should be fine for methods to have no typehints.
Yes, I'm not married to type annotations and there are sure situations where they can be quite annoying or where mypy is just not powerful enough to understand what's going on. An example we discussed was a type for sklearn estimators, which would be too much of a hassle to implement. For this particular PR, I should note, however, that types really did help. They uncovered issue 2, 3, 6, 7, and 8 in my first comment. Was it necessary to add all the types to find those? No. But we don't know beforehand where types can help, so when in doubt, I would err towards adding them. |
This is in line with the rest of the code base.
On top of adding those type annotations, these changes were made:
_dispatch.py
to_audit.py
. The name_dispatch.py
didn't make much sense after the refactor,_audit.py
only had 2 functions, and there were also problems with circular imports.FunctionNode
:get_unsafe_set
used to callis_safe
, which is the reverse of how all otherNodes
do it. Changed it to be consistent.ReduceNode
: The raised exception would accessself.constructor
, which does not exist.Node.__init__
about what subclasses should implement._construct
method toNode
, which raisesNotImplementedError
CachedNode
as it doesn't deal with unknownid
yet.gettype
now raises an error when the type could not be determined, it cannot returnNone
anymore.get_untrusted_types
now deals with neitherdata
norfile
being passed.Unfortunately, due 1., this PR looks much bigger than it is, since most of the changes are just moving from one module to the other. I would recommend diffing the new
_audit.py
against the old_dispatch.py
to see the actual changes.