Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Consistent imports in tests: pandas.util.testing as tm #29272

Closed
SaturnFromTitan opened this issue Oct 29, 2019 · 2 comments · Fixed by #29318
Closed

CLN: Consistent imports in tests: pandas.util.testing as tm #29272

SaturnFromTitan opened this issue Oct 29, 2019 · 2 comments · Fixed by #29318
Labels
Clean Testing pandas testing functions or related to the test suite
Milestone

Comments

@SaturnFromTitan
Copy link
Contributor

SaturnFromTitan commented Oct 29, 2019

This is a follow-up issue of #23018 with the goal of enforcing a consistent approach of importing things from pandas.util.testing inside the test suite.

Currently, there are multiple import styles used:

  1. import pandas.util.testing as tm
  2. from pandas.util.testing import ...
  3. from pandas.util import testing as tm

As it is already the dominant import style used and also less prone to circular imports 1. is the prefered option. After migrating all current occurrences of 2. and 3. we'll add a linting rule that enforces this convention.

These are the current files that use 2. (generated with grep -R -l --include="*.py" "from pandas.util.testing import " pandas/tests:

@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Oct 29, 2019

After investigating a few files manually, I think this should be (mostly) scriptable:

  1. Understand which items are imported directly from pandas.util.testing. Then do the following for all of them (for x in ...):
    1. Replace x with tm.x in the whole file
    2. Replace tm.tm.x with tm.x in the whole file (needed because the step before broke some code)
  2. Afterwards, remove the (now broken) imports
  3. If necessary, add import pandas.util.testing as tm

@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Oct 30, 2019

I came up with the following script which should solve the problem pretty well (+should be rather straight forward to adjust to other use cases):

import ast
import copy
import pathlib

TM_MODULE = "pandas.util.testing"
FROM_TM_IMPORT_STATEMENT = f"from {TM_MODULE} import"
TM_IMPORT_STATEMENT = f"import {TM_MODULE} as tm"


def read_file(path):
    with open(path, "r") as f:
        print(f"\nReading {path}...")
        return f.read()


def get_ast(content):
    return ast.parse(content)


def has_from_tm_imports(content):
    return FROM_TM_IMPORT_STATEMENT in content


def add_tm_import_if_needed(content):
    if TM_IMPORT_STATEMENT in content:
        # statement is already available
        return content

    if "import" not in content:
        # if nothing is imported, then there's nothing to fix
        return content

    lines = content.split("\n")
    new_lines = copy.deepcopy(lines)
    for i, line in enumerate(lines):
        if "import" in line:
            # add TM_IMPORT_STATEMENT as first import.
            # isort will later resolve the order
            new_lines.insert(i, TM_IMPORT_STATEMENT)
            break
    return "\n".join(new_lines)


def get_direct_imports_to_replace(tree):
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            if node.module == TM_MODULE:
                return [n.name for n in node.names]
    raise Exception(f"Couldn't find any direct imports from {TM_MODULE}")


def determine_direct_tm_import_lines(tree):
    start_line = -1
    end_line = -1

    last_was_tm_import = False
    for node in tree.body:
        if last_was_tm_import:
            end_line = node.lineno
            last_was_tm_import = False

        if isinstance(node, ast.ImportFrom) and node.module == TM_MODULE:
            start_line = node.lineno
            last_was_tm_import = True

    assert start_line != -1, "Can't determine start line of from tm import"
    assert end_line != -1, "Can't determine end line of from tm import"

    return start_line, end_line


def remove_direct_tm_import_statements(content):
    """ Get the line number of the unwanted import and the line number of the next statement.

    Everything in between will be removed. This might yield too many empty lines
    removed, but black can fix that later on.
    """
    start_line, end_line = determine_direct_tm_import_lines(tree)
    start_idx = start_line - 1
    end_idx = end_line - 1

    lines = content.split("\n")
    end_idx = check_end_idx(lines, start_idx, end_idx)

    new_lines = lines[:start_idx] + lines[end_idx:]
    return "\n".join(new_lines)


def check_end_idx(lines, start_idx, end_idx):
    """
    end_idx is by default the index of the next statement. This may result in the
    deletion of comments or empty lines which isn't intended. In this case we
    can be "sharper" with end_idx.
    """
    assert start_idx < end_idx, "start_idx must be smaller than end_idx"
    for idx in range(start_idx + 1, end_idx):
        line = lines[idx]
        is_empty_line = line == ""
        is_single_comment = line.startswith("#")
        is_multi_comment = line.startswith('"""') or line.startswith("'''")
        if any([is_empty_line, is_single_comment, is_multi_comment]):
            return idx
    return end_idx


def replace_direct_imports(content, direct_imports):
    """
    Say we have a direct import x. Then in the given file it might be imported as
    'x' or 'tm.x'

    So first we replace 'x' with 'tm.x' which yields
        x -> tm.x
        tm.x ->tm.tm.x

    Now we have to take care of the 'tm.tm.x' issues, so we simply replace 'tm.tm.x' with 'x'
        tm.x -> tm.x
        tm.tm.x -> tm.x

    Aaaand we're done!

    EDIT: Nearly...
    Some modules use helpers like _assert_series_equal which will be transformed to _tm.assert_series_equal which is
    not a valid function name anymore. Therefore we have to add one final transformation '_.tm' to '_'
    """
    for name in direct_imports:
        content = content.replace(name, f"tm.{name}")

    for name in direct_imports:
        content = content.replace(f"tm.tm.{name}", f"tm.{name}")

    content = content.replace("_tm.", "_")
    return content


def overwrite_file(path, content):
    with open(path, "w") as f:
        print(f"Writing {path}...")
        f.write(content)


if __name__ == "__main__":
    directory = "pandas/tests"

    pathlist = pathlib.Path(directory).glob("**/*.py")
    for path in pathlist:
        path_str = str(path)

        content = read_file(path_str)
        if not has_from_tm_imports(content):
            continue

        content = add_tm_import_if_needed(content)
        tree = get_ast(content)
        direct_imports = get_direct_imports_to_replace(tree)
        content = remove_direct_tm_import_statements(content)
        content = replace_direct_imports(content, direct_imports)

        overwrite_file(path_str, content)

I'll open a few PRs for the remaining files :)

@gfyoung gfyoung added the Testing pandas testing functions or related to the test suite label Oct 31, 2019
@jreback jreback added this to the 1.0 milestone Nov 1, 2019
Reksbril pushed a commit to Reksbril/pandas that referenced this issue Nov 18, 2019
Reksbril pushed a commit to Reksbril/pandas that referenced this issue Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants