Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Add light type annotations to skops.io #219

Merged
merged 8 commits into from
Dec 1, 2022

Conversation

BenjaminBossan
Copy link
Collaborator

This is in line with the rest of the code base.

On top of adding those type annotations, these changes were made:

  1. Moved all code from _dispatch.py to _audit.py. The name _dispatch.py didn't make much sense after the refactor, _audit.py only had 2 functions, and there were also problems with circular imports.
  2. Small refactor in FunctionNode: get_unsafe_set used to call is_safe, which is the reverse of how all other Nodes do it. Changed it to be consistent.
  3. Fixed a bug in ReduceNode: The raised exception would access self.constructor, which does not exist.
  4. Add a few comments to Node.__init__ about what subclasses should implement.
  5. Add _construct method to Node, which raises NotImplementedError
  6. Add TODO comment to CachedNode as it doesn't deal with unknown id yet.
  7. gettype now raises an error when the type could not be determined, it cannot return None anymore.
  8. get_untrusted_types now deals with neither data nor file being passed.

Unfortunately, due 1., this PR looks much bigger than it is, since most of the changes are just moving from one module to the other. I would recommend diffing the new _audit.py against the old _dispatch.py to see the actual changes.

This is in line with the rest of the code base.

On top of adding those type annotations, these changes were made:

1. Moved all code from _dispatch.py to _audit.py. _dispatch.py didn't make much
   sense after the refactor and there were also problems with circular imports.
2. Small refactor in FunctionNode: get_unsafe_set used to call is_safe, which is
   the inverse of how all other Nodes do it. Changed it to be consistent.
3. Fixed a bug in ReduceNode: Exception would access self.constructor, which does
   not exist.
4. Add a few comments to Node.__init__ about what subclasses should implement.
5. Add _construct method to Node, raise NotImplementedError
6. Add TODO comment to CachedNode as it doesn't deal with unknown id yet.
7. gettype now raises error when type could not be determined, cannot
   return None anymore
8. get_untrusted_types now deals with neither data nor file being passed
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers ready for review

Copy link
Collaborator

@E-Aho E-Aho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, minor todo and quesiton I would like to ask before approving

skops/io/_audit.py Show resolved Hide resolved
@@ -280,7 +318,7 @@ def __init__(self, state, load_context: LoadContext, trusted=False):
"step": state["content"]["step"],
}

def _construct(self):
def _construct(self) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to basically all of the _construct functions:

This always return a slice, so shouldn't we change the type to -> slice:?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually considered that, but decided against it.

The reason is that those types would not help us in our code. That's because everywhere we call _construct, we cannot statically know which of the dozens of types it would return. And this is because we work with generic states. If we had type-specific states, like DictState or NdArrayState, we may be able to add overloads that say what type is returned by _construct based on the state, but we don't have that (and it would be difficult to implement, maybe even impossible with the current typing situation).

So if we added return type annotations here, the only thing that mypy could infer is that _construct returns a Union[str, int, float, dict, list, NDArray, ...], which is not more useful than Any in practice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, why do we then need to have the type annotation in the first place? can't we just remove it?

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



@contextmanager
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Generator[None, None, None] mean? typehints should helps us understand the code better, not to make them more confusing :D

Also, if everything is Any here, then why do we have typehints?

This is an example where I wouldn't add any typehints here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning for Generator is Generator[YieldType, SendType, ReturnType], which happens to be None, None, None here. I agree that types don't add much here, though **kwargs: Any is a shorthand for dict[str, Any]. The difference for me between a function annotated with Any vs no annotation is that with the latter, I don't know if it takes any or if it was just not annotated yet, so I guess not completely useless.

I would thus keep the annotations here, but if you insist, I can remove them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typehints should be useful, if they're not, they take extra space and make things less readable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Adrin

imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.

skops/io/_audit.py Outdated Show resolved Hide resolved
self,
state: dict[str, Any],
load_context: LoadContext,
constructor: Type[Any],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can also me a function.

skops/io/_sklearn.py Show resolved Hide resolved
@BenjaminBossan
Copy link
Collaborator Author

@E-Aho @adrinjalali I made some suggested changes and replied to the other comments, please take another look.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with most annotations here, but I don't think we should annotate everything for the sake of having everything annotated. If it doesn't add much, we shouldn't add it then.



@contextmanager
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typehints should be useful, if they're not, they take extra space and make things less readable.

Copy link
Collaborator

@E-Aho E-Aho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks really good, thanks!
I do think we can probably remove the -> Any:s and the Nones inside the Generator type hint, but I'd also be ok with including them if you think they're more helpful to have than not to have here.



@contextmanager
def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Adrin

imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.

@BenjaminBossan
Copy link
Collaborator Author

Replying here since the comments are a bit distributed right now @E-Aho @adrinjalali

I do think we can probably remove the -> Any:s and the Nones inside the Generator type hint

Okay, I'll remove the -> Anys from _construct, as those indeed add no further information.

I would not remove the Nones from Generator, since they do indicate information, e.g. if the context manager returns something. So if someone wrote code like

with temp_setattr(...) as foo:
    foo.bar()

mypy knows that it doesn't work because we specified that the yield type is None.

Speaking more generally, Generator, like List, is a generic type and should be further specified if possible (though I admit that it looks strange at first).

Regarding the Anys in cases like this:

def temp_setattr(obj: Any, **kwargs: Any)

as I argued, it provides information, namely that this works on any object and that the dict values can also be any objects. Without types, I wouldn't know if I can pass anything or if the function was just not typed but does have restrictions. Therefore, I'd argue that this doesn't apply:

I think typehints should be useful, if they're not, they take extra space and make things less readable.

So I'd leave the types here. Agree?

@adrinjalali
Copy link
Member

Without types, I wouldn't know if I can pass anything or if the function was just not typed but does have restrictions.

I think for that we need documentation (docstrings), not typehints.

@BenjaminBossan
Copy link
Collaborator Author

I think for that we need documentation (docstrings), not typehints.

We can have both, no :) And generally, type hints are better supported in editors than docstrings.

@E-Aho
Copy link
Collaborator

E-Aho commented Nov 30, 2022

I definitely fully agree with Benjamin on type hints being handy inside IDEs.

I am happy to add the rest of the type hints.

I think though it could maybe be useful to also document what that generator return actually will behave like, as it does look a little odd at first glance to see something returning Generator[None, None, None] if you aren't expressly familiar with that typing syntax

Copy link
Collaborator

@E-Aho E-Aho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to approve this, optionally add a small comment about the format of the Generator[None,...] output type as it's a bit confusing at first glance if you aren't familiar with that syntax, but LGTM in any case

@BenjaminBossan
Copy link
Collaborator Author

I added a comment to explain the types.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BenjaminBossan , nice work.

One point: I kinda said from the beginning when we started doing typehints, that I'm fine as long as we don't end up insisting on having hints everywhere. We're somewhat going that way now and that doesn't make me happy. It should be fine for methods to have no typehints.

@adrinjalali adrinjalali changed the title Add light type annotations to skops.io MNT Add light type annotations to skops.io Dec 1, 2022
@adrinjalali adrinjalali merged commit 69763fe into skops-dev:main Dec 1, 2022
@BenjaminBossan
Copy link
Collaborator Author

Yes, I'm not married to type annotations and there are sure situations where they can be quite annoying or where mypy is just not powerful enough to understand what's going on. An example we discussed was a type for sklearn estimators, which would be too much of a hassle to implement.

For this particular PR, I should note, however, that types really did help. They uncovered issue 2, 3, 6, 7, and 8 in my first comment. Was it necessary to add all the types to find those? No. But we don't know beforehand where types can help, so when in doubt, I would err towards adding them.

@BenjaminBossan BenjaminBossan deleted the audit-type-annotations branch December 1, 2022 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants