Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REFACTOR-#4631: Add mypy checks for modin.distributed #5109

Merged
merged 15 commits into from
Oct 25, 2022

Conversation

rosdyana
Copy link
Contributor

@rosdyana rosdyana commented Oct 9, 2022

Signed-off-by: Rosdyana Kusuma rosdyana.kusuma@gmail.com

What do these changes do?

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Add mypy checks for modin.distributed #4631
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@rosdyana rosdyana requested a review from a team as a code owner October 9, 2022 15:42
Copy link
Collaborator

@mvashishtha mvashishtha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosdyana thank you so much for making this PR. I have left some suggestions inline.

There will likely be some errors once you accept my suggestion to edit the mypy.ini file so it includes modin/distributed/dataframe/pandas/partitions.py. Please try to address them and check your work locally by running mypy --config-file mypy.ini modin/distributed. If you have any questions or you get stock, please comment here.

mypy.ini Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@mvashishtha
Copy link
Collaborator

@rosdyana in order to pass the commit lint, you will have to amend your first commit so that it says something like, "FIX-#4631: Add mypy checks for modin.distributed."

You should be able to amend your first commit with:

  1. git rebase -i d5bd6376d5b7735c4c676e74bd73ef36edcdfdff~1
  2. You should then get an editable list of actions for each commit. Choose "r" for reword for your first commit, then "p" for each of the other commits to keep them on top of the first one.
    Screen Shot 2022-10-10 at 10 42 11 AM
  3. Save and exit the editable list of actions.
  4. You should get an editor again asking you to reword your initial commit. Do so.
    Screen Shot 2022-10-10 at 10 51 14 AM
  5. Resolve merge conflicts

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@mvashishtha
Copy link
Collaborator

@rosdyana it looks like you've added commit e904736 with the correct formatting, but your first commit still has the incorrect formatting. Your commits as they are will still fail the CI. Could you please try rewording your first commit?

Meanwhile, I'll respond to the open inline comments. Please ping this thread if I haven't responded within 24 hours from now.

@mvashishtha
Copy link
Collaborator

@rosdyana just FYI, next time you want to accept someone's suggestion on a GitHub PR, I highly recommend using the "commit suggestion" if you want to accept one suggestion, or the "add suggestion to batch" button if you have multiple suggestions. That usually reduces the amount of work you have to do and makes it easier for everyone to track what's happening with the PR.

Screen Shot 2022-10-11 at 12 10 05 AM

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@rosdyana
Copy link
Contributor Author

@rosdyana in order to pass the commit lint, you will have to amend your first commit so that it says something like, "FIX-#4631: Add mypy checks for modin.distributed."

You should be able to amend your first commit with:

  1. git rebase -i d5bd6376d5b7735c4c676e74bd73ef36edcdfdff~1
  2. You should then get an editable list of actions for each commit. Choose "r" for reword for your first commit, then "p" for each of the other commits to keep them on top of the first one.
    Screen Shot 2022-10-10 at 10 42 11 AM
  3. Save and exit the editable list of actions.
  4. You should get an editor again asking you to reword your initial commit. Do so.
    Screen Shot 2022-10-10 at 10 51 14 AM
  5. Resolve merge conflicts

I tried to amend that commit, but it doesn't work, I have no idea. Do you mind if I re-create the PR?

@mvashishtha
Copy link
Collaborator

@rosdyana what's the error that you got? Did you try to "reword" the first commit and "squash" or "pick" the other commits? Rebasing can be hard, but it's better for history keeping and easier for everyone involved with the PR if we can keep all the comment history in a single PR.

@rosdyana
Copy link
Contributor Author

@rosdyana what's the error that you got? Did you try to "reword" the first commit and "squash" or "pick" the other commits? Rebasing can be hard, but it's better for history keeping and easier for everyone involved with the PR if we can keep all the comment history in a single PR.

Yes, I did "reword" for the first commit, and then choose "pick" for the other commits. Should I "squash" the other commits?

@mvashishtha
Copy link
Collaborator

@rosdyana I'm going to remove the commit requirement in #4982. Please don't worry about the commit message or rebasing at all for now.

modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mvashishtha mvashishtha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosdyana I left some more comments. Please respond here or in the comment threads if you have any questions or concerns. Also, please be sure to run mypy --config-file=mypy.ini modin/distributed/ locally from the modin root to check whether mypy is passing. If it fails and you can't fix the failures, just say so here and we'll try to come to a resolution.

Once again, there's no need to rebase or amend your first commit.

modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
modin/distributed/dataframe/pandas/partitions.py Outdated Show resolved Hide resolved
Using Axes instead of List for index and columns

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@rosdyana
Copy link
Contributor Author

@rosdyana I left some more comments. Please respond here or in the comment threads if you have any questions or concerns. Also, please be sure to run mypy --config-file=mypy.ini modin/distributed/ locally from the modin root to check whether mypy is passing. If it fails and you can't fix the failures, just say so here and we'll try to come to a resolution.

Once again, there's no need to rebase or amend your first commit.

Hi @mvashishtha the mypy returns this error

modin/distributed/dataframe/pandas/partitions.py:168:5: error: Statement is unreachable  [unreachable]
Found 1 error in 1 file (checked 4 source files)

assert factory.io_cls.frame_cls._partition_mgr_cls is not None

Should we ignore it? or we need to solve it?

@codecov
Copy link

codecov bot commented Oct 14, 2022

Codecov Report

Merging #5109 (88ef64c) into master (4bcce6e) will decrease coverage by 28.05%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master    #5109       +/-   ##
===========================================
- Coverage   84.56%   56.50%   -28.06%     
===========================================
  Files         256      256               
  Lines       19347    18709      -638     
===========================================
- Hits        16361    10572     -5789     
- Misses       2986     8137     +5151     
Impacted Files Coverage Δ
modin/distributed/dataframe/pandas/partitions.py 0.00% <0.00%> (-89.48%) ⬇️
modin/distributed/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/core/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/_compat/core/latest/base_io.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/pandas/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/sklearn/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/xgboost/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/distributed/dataframe/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/_compat/pandas_api/latest/utils.py 0.00% <0.00%> (-100.00%) ⬇️
modin/_compat/pandas_api/latest/window.py 0.00% <0.00%> (-100.00%) ⬇️
... and 152 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mvashishtha
Copy link
Collaborator

@rosdyana I think we need some more mypy annotations before we can convince mypy that all those classes are not None-- right now I think it thinks that some of them are None, so the assertions will fail and we'll never reach the code below. Can you add # type: ignore[unreachable] to the end of line 168 (the last assertion)?

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Copy link
Collaborator

@pyrito pyrito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosdyana could you rebase on the latest master? There were some changes that broke CI earlier but it should be fixed now.

@mvashishtha
Copy link
Collaborator

@rosdyana @pyrito no need to rebase on master. I don't think the breaking changes are in the branch for this PR. @rosdyana you should be able to just push an empty commit to get testing working again:

git commit -sam "Empty commit to trigger CI." --no-verify --allow-empty && git push

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
@rosdyana
Copy link
Contributor Author

Hey, @mvashishtha @pyrito let me know if I need to change anything else. Thanks!

@vnlitvinov
Copy link
Collaborator

vnlitvinov commented Oct 21, 2022

@rosdyana please take a look at CI failure here: https://github.com/modin-project/modin/actions/runs/3289605374/jobs/5432602430

Maybe we should define that Union under if TYPE_CHECKING, and leave it as Any otherwise, smth like

if TYPE_CHECKING:
    from modin.core.execution.ray.implementations.pandas_on_ray.partitioning import (
        PandasOnRayDataframePartition,
    )
    from modin.core.execution.dask.implementations.pandas_on_dask.partitioning.partition import (
        PandasOnDaskDataframePartition,
    )
    PartitionUnionType =  Union[PandasOnRayDataframePartition, PandasOnDaskDataframePartition]
else:
    from typing import Any
    PartitionUnionType =  Any

# skip many lines of code

    def _unwrap_partitions() -> list:
        [p.drain_call_queue() for p in modin_frame._partitions.flatten()]
    
        def get_block(partition: PartitionUnionType) -> np.ndarray:

(untested)

Signed-off-by: Rosdyana Kusuma <rosdyana.kusuma@gmail.com>
Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot @rosdyana!

@vnlitvinov vnlitvinov merged commit ae465be into modin-project:master Oct 25, 2022
@pyrito
Copy link
Collaborator

pyrito commented Oct 25, 2022

Thanks for your contributions here @rosdyana !!

@mvashishtha
Copy link
Collaborator

+1 @rosdyana thank you for your contribution. Please continue reach out on GitHub, slack, or discuss, if you would like to contribute more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mypy checks for modin.distributed
6 participants